GPT and Other AI Models Can't Analyze an SEC Filing, Researchers Find - Slashdot

Dec 21, 2023 - slashdot.org
Researchers from Patronus AI have found that large language models, including OpenAI's GPT-4-Turbo, often fail to accurately answer questions derived from Securities and Exchange Commission (SEC) filings. Even when given access to nearly an entire filing, GPT-4-Turbo answered only 79% of questions correctly on Patronus AI's new test, FinanceBench. The models sometimes refused to answer or provided incorrect information that did not appear in the filings. The company's co-founder, Anand Kannappan, said that this level of performance is unacceptable for automated or production-ready applications.

Patronus AI tested four language models on a subset of 150 questions from its FinanceBench dataset, which includes over 10,000 questions and answers drawn from the SEC filings of major publicly traded companies. Without access to any SEC source document, GPT-4-Turbo failed to answer 88% of the questions, but it improved significantly when given access to the filings. Even in "Oracle" mode, where it was pointed to the exact text containing the answer, it still produced an incorrect answer 15% of the time. Other models, such as Meta's Llama 2 and Anthropic's Claude 2, also struggled with accuracy, with Llama 2 producing incorrect answers 70% of the time.

Key takeaways:

  • Chatbots relying on large language models often fail to accurately answer questions derived from SEC filings, according to researchers from Patronus AI.
  • Patronus AI created a test called FinanceBench with over 10,000 questions and answers from SEC filings of major publicly traded companies to evaluate the performance of these AI models.
  • OpenAI's GPT-4-Turbo, even when given nearly an entire filing to read, answered only 79% of the questions correctly, while Meta's Llama 2 produced incorrect answers 70% of the time.
  • Anthropic's Claude 2 performed well when given "long context", answering 75% of the questions correctly, while GPT-4-Turbo improved significantly when given access to the underlying filings.