Exclusive: Gemini's data-analyzing abilities aren't as good as Google claims

Google's flagship generative AI models, Gemini 1.5 Pro and 1.5 Flash, have been touted for their ability to process and analyze large amounts of data, but two recent studies suggest they struggle with tasks requiring long context. The models were tested on their ability to answer questions about large datasets, such as summarizing multiple hundred-page documents or searching across scenes in film footage. In one series of document-based tests, they gave the correct answer only 40% to 50% of the time. The studies also found that the models had difficulty verifying claims that required considering larger portions of a book, as well as claims about information that is implied but not explicitly stated in the text.

The studies have not been peer-reviewed and did not test the latest releases of the Gemini models, but they add to concerns that Google has been overpromising the capabilities of its AI models. Other models tested, including OpenAI’s GPT-4o and Anthropic’s Claude 3.5 Sonnet, also performed poorly. The researchers suggest that better benchmarks and more third-party critique are needed to counter hyped-up claims about generative AI.

Key takeaways:

Google's flagship generative AI models, Gemini 1.5 Pro and 1.5 Flash, have been found to struggle with processing and analyzing large amounts of data, contrary to the company's claims.
Two separate studies found that the models frequently failed to answer questions about large datasets correctly, with one series of document-based tests showing the models gave the right answer only 40% to 50% of the time.
Researchers found that the models had difficulty verifying claims that required considering larger portions of a book or the entire book, and struggled with verifying claims about information that is implied but not explicitly stated in the text.
Despite these findings, Google continues to advertise the models' context window as a key selling point, leading to accusations that the company is overpromising and under-delivering with Gemini.
