
Scale AI to set the Pentagon’s path for testing and evaluating large language models

Feb 21, 2024 - defensescoop.com
The Pentagon’s Chief Digital and Artificial Intelligence Office (CDAO) has contracted Scale AI to develop a reliable method for testing and evaluating large language models that could support military planning and decision-making. The one-year contract aims to provide the CDAO with a framework to deploy AI safely, offering real-time feedback for warfighters and creating specialized public sector evaluation sets to test AI models for military support applications.

Scale AI will adopt a similar approach to test and evaluate large language models, but due to their generative nature and the complexities of the English language, the process will be iterative and involve creating "holdout datasets". The ultimate goal is for models to send signals to CDAO officials if they start to deviate from the domains they have been tested against. The company has previously partnered with Meta, Microsoft, the U.S. Army, the Defense Innovation Unit, OpenAI, General Motors, Toyota Research Institute, Nvidia, and others.

Key takeaways:

  • The Pentagon’s Chief Digital and Artificial Intelligence Office (CDAO) has contracted Scale AI to develop a reliable method for testing and evaluating large language models that can support military planning and decision-making.
  • The outcomes of this one-year contract will provide the CDAO with a framework to deploy AI safely, offering real-time feedback for warfighters and creating specialized public sector evaluation sets to test AI models for military support applications.
  • Scale AI will adopt a similar approach for testing and evaluating large language models, but due to their generative nature and the complexity of the English language, the process will be more challenging and will involve creating "holdout datasets".
  • Scale AI's work will help the DoD mature its test-and-evaluation policies to address generative AI, identifying models that are ready to support military applications with accurate and relevant results using DoD terminology and knowledge bases.
