Patronus AI's research has revealed significant deficiencies in leading LLMs' ability to accurately answer fact-based questions. The company's "FinanceBench" benchmark found that the best-performing model answered only 19% of financial queries correctly, even after reading an entire annual report. Another experiment, using the company's "CopyrightCatcher" API, found that open-source LLMs reproduced copyrighted text verbatim in 44% of outputs. With the new funding, Patronus AI plans to expand its research, engineering, and sales teams and develop additional industry benchmarks.
Key takeaways:
- Patronus AI, a San Francisco startup, has raised $17 million in Series A funding to develop an automated evaluation platform that can detect errors in large language models (LLMs).
- The platform uses proprietary AI to identify issues such as hallucinations, copyright infringement, and safety violations in LLM outputs; a rough illustrative sketch of what such automated checks can look like follows these takeaways.
- Patronus AI's research has revealed significant deficiencies in leading models' ability to accurately answer questions grounded in fact, with the best-performing model answering only 19% of financial queries correctly.
- With the new funding, Patronus plans to expand its research, engineering, and sales teams and develop additional industry benchmarks, aiming to make automated evaluation of LLMs a standard requirement for enterprises deploying the technology.
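To make the idea of automated evaluation concrete, here is a minimal, hypothetical sketch of two such checks in Python: one flags verbatim reproduction of reference text (the kind of issue CopyrightCatcher targets), and one flags numeric figures that do not appear in the source document (a crude proxy for the grounding failures FinanceBench measures). The function names, thresholds, and sample text are invented for illustration and do not reflect Patronus AI's proprietary methods or API.

```python
import re

# Hypothetical helpers -- not Patronus AI's actual API or methodology.

def verbatim_overlap(output: str, reference: str, min_words: int = 8) -> bool:
    """Flag an output that copies a run of `min_words` consecutive words from the reference text."""
    ref_words = reference.split()
    normalized_output = " ".join(output.split())
    for i in range(len(ref_words) - min_words + 1):
        window = " ".join(ref_words[i:i + min_words])
        if window in normalized_output:
            return True
    return False

def unsupported_numbers(output: str, source: str) -> list[str]:
    """Return numeric figures in the answer that never appear in the source document
    (a crude proxy for hallucinated financial facts)."""
    pattern = r"\d[\d,]*(?:\.\d+)?"
    source_numbers = set(re.findall(pattern, source))
    return [n for n in re.findall(pattern, output) if n not in source_numbers]

# Hypothetical example: a model's answer checked against the filing it was asked about.
answer = "Net revenue grew 23% to $4.7 billion in fiscal 2023."
filing = "Net revenue grew 12% to $4.1 billion in fiscal 2023, driven by subscription sales."

print(unsupported_numbers(answer, filing))  # ['23', '4.7'] -> figures not found in the filing
print(verbatim_overlap(answer, filing))     # False -- no 8-word run copied verbatim here
```

Production systems of this kind typically go well beyond string matching (using retrieval, entailment models, or LLM judges), but the sketch shows the basic shape of an automated check: compare a model's output against trusted reference material and flag discrepancies.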