GPT and other AI models can't analyze an SEC filing, researchers find

Patronus AI, a startup founded by Anand Kannappan and Rebecca Qian, has found that large language models (LLMs) like OpenAI's GPT-4-Turbo often fail to accurately answer questions derived from Securities and Exchange Commission (SEC) filings. Even when given nearly an entire filing to read alongside the question, GPT-4-Turbo answered correctly only 79% of the time in Patronus AI's test. The company's founders argue that this level of accuracy is unacceptable for automated, production-ready applications, especially in regulated industries like finance.

The startup has developed a set of over 10,000 questions and answers drawn from SEC filings, called FinanceBench, to test the performance of language models in the financial sector. In a "closed book" test, in which it had no access to any SEC source document, GPT-4-Turbo failed to answer 88% of the 150 questions it was asked; its results improved significantly when it was given access to the underlying filings. Other models tested, including Meta's Llama 2 and Anthropic's Claude 2, also struggled with accuracy. Despite these challenges, the founders of Patronus AI believe there is huge potential for LLMs in the finance industry if the models continue to improve.

Key takeaways:

Patronus AI, a startup founded by Anand Kannappan and Rebecca Qian, found that large language models, including OpenAI's GPT-4-Turbo, often fail to accurately answer questions derived from SEC filings.
The company developed a test called FinanceBench, consisting of over 10,000 questions and answers drawn from SEC filings, to evaluate the performance of AI models in the financial sector.
When tested, the AI models often failed to answer or produced incorrect answers. For instance, GPT-4-Turbo failed to answer 88% of the 150 questions it was asked in a "closed book" test, and Llama 2, an AI model developed by Meta, produced wrong answers 70% of the time.
Despite the current shortcomings, the co-founders of Patronus AI believe that there is huge potential for AI models to assist in the finance industry, provided they continue to improve.
