Scale AI is developing a test-and-evaluation (T&E) framework for the Pentagon’s large language models (LLMs). The project aims to ensure that AI models are safe and reliable for military use.
The Pentagon’s Chief Digital and Artificial Intelligence Office (CDAO) needs a way to test and evaluate AI models for military use. The CDAO wants to use LLMs to support and improve military planning and decision-making. However, unvetted LLMs could also disrupt those same processes.
The Pentagon has long relied on T&E processes to ensure its systems, platforms, and technologies perform as intended. AI safety standards and policies, however, are not yet established, and the complexity and uncertainty of LLMs make T&E even harder for generative AI.
How will it work?
Scale AI will create a framework for the CDAO to test and evaluate LLMs. The T&E process will include building “holdout datasets,” in which DoD insiders write prompt-response pairs and review them in layers. The experts will ensure that each response is at least as good as what a military expert would produce.
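The article does not spell out how these holdout datasets will be structured, so the following is only a minimal illustrative sketch in Python: it assumes a hypothetical HoldoutExample record with made-up field names and a two-layer sign-off rule, to show how expert-written prompt-response pairs with layered review might be represented.

```python
from dataclasses import dataclass, field

@dataclass
class HoldoutExample:
    """One prompt-response pair written by a DoD subject-matter expert (hypothetical schema)."""
    prompt: str
    reference_response: str
    domain: str                                              # e.g. "logistics planning"
    review_layers: list[str] = field(default_factory=list)   # reviewer sign-offs so far

    def approved(self, required_layers: int = 2) -> bool:
        """An example enters the holdout set only after layered review (assumed rule)."""
        return len(self.review_layers) >= required_layers


# Example: an expert-written pair moving through two review layers.
example = HoldoutExample(
    prompt="Summarize the supply constraints in the attached exercise scenario.",
    reference_response="A concise, expert-written summary serving as the human baseline.",
    domain="logistics planning",
)
example.review_layers += ["domain expert", "senior reviewer"]
print(example.approved())  # True once both review layers have signed off
```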
The process will be iterative: once the datasets are ready, the experts will evaluate existing LLMs against them. Eventually, the models will be able to signal CDAO officials if they start to drift from the domains they were tested against.
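Again purely as illustration, here is a minimal sketch of how such a drift “signal” could look in practice, assuming hypothetical per-domain scores and baselines; the function name, tolerance value, and scoring scheme are not from the source.

```python
def domain_drift_signal(scores_by_domain: dict[str, float],
                        baselines: dict[str, float],
                        tolerance: float = 0.05) -> list[str]:
    """Return the domains where a model's holdout score has slipped below
    its previously established baseline by more than the tolerance."""
    return [
        domain
        for domain, baseline in baselines.items()
        if scores_by_domain.get(domain, 0.0) < baseline - tolerance
    ]


# Example: the model has drifted on one of the two evaluated domains.
baselines = {"logistics planning": 0.90, "intelligence summaries": 0.85}
latest = {"logistics planning": 0.91, "intelligence summaries": 0.70}
print(domain_drift_signal(latest, baselines))  # ['intelligence summaries']
```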
The Pentagon’s goal
The goal is to enhance the robustness and resilience of AI systems in classified environments, enabling the adoption of LLM technology in secure settings. The company plans to automate as much of the process as possible, so that as new models come in, there is a baseline understanding of how they will perform, where they will perform best, and where they will likely start to fail.
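The real CDAO pipeline is not public, so the sketch below is only an assumption of what such an automated baseline check might look like: a hypothetical baseline_report helper runs a candidate model over the holdout set and averages a score per domain, which is one simple way to see where a new model performs well and where it falls short.

```python
from statistics import mean
from typing import Callable

def baseline_report(model: Callable[[str], str],
                    holdout: list[tuple[str, str, str]],
                    score: Callable[[str, str], float]) -> dict[str, float]:
    """Run a candidate model over (prompt, reference, domain) triples and
    report its mean score per domain as a first performance baseline."""
    by_domain: dict[str, list[float]] = {}
    for prompt, reference, domain in holdout:
        by_domain.setdefault(domain, []).append(score(model(prompt), reference))
    return {domain: mean(scores) for domain, scores in by_domain.items()}


# Example with stand-in model and scorer (real scoring would involve expert review).
holdout = [("Draft a convoy schedule.", "expert answer", "logistics planning")]
report = baseline_report(lambda p: "model answer",
                         holdout,
                         lambda out, ref: float(out == ref))
print(report)  # {'logistics planning': 0.0}
```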
Benefits of the partnership
The partnership between Scale AI and the DoD is a significant step towards ensuring the safe and responsible deployment of LLMs and generative AI within the military. The T&E framework will help the DoD understand the strengths and limitations of the technology. It will also ensure that the models are reliable, safe, and effective for military applications.
Scale AI’s CEO, Alexandr Wang, said, “Testing and evaluating generative AI will help the DoD understand the strengths and limitations of the technology, so it can be deployed responsibly. Scale is honored to partner with the DoD on this framework.”
Apart from the CDAO, Scale AI has partnered with Meta, Microsoft, the U.S. Army, the Defense Innovation Unit, OpenAI, General Motors, Toyota Research Institute, Nvidia, and others. These partnerships show Scale AI’s commitment to ensuring the safe and responsible deployment of AI technology.
In short, the partnership between Scale AI and the Pentagon is a major step toward the safe use of LLMs and generative AI in the military. The T&E framework will help the DoD understand the technology’s strengths and limits and make sure the models are reliable, safe, and effective for military use. With Scale AI’s expertise and the Pentagon’s need for rigorous T&E, the partnership is a win-win for both parties.