The study lends support to the widespread belief that GPT-4's performance has declined over the past few months, a claim OpenAI has consistently denied. Some experts argue that the findings do not conclusively prove a decline and could instead reflect fine-tuning adjustments made by OpenAI. For instance, the study was criticized for checking whether generated code was immediately executable rather than whether it was correct.
Key takeaways:
- A research paper from Stanford University and the University of California, Berkeley suggests that the AI language model GPT-4 has become less effective at coding and compositional tasks over time.
- The researchers tested the March and June 2023 versions of GPT-4 and GPT-3.5 on tasks like math problem-solving, answering sensitive questions, code generation, and visual reasoning. They found that GPT-4's ability to identify prime numbers dropped from 97.6 percent accuracy in March to just 2.4 percent in June.
- OpenAI has denied claims that GPT-4's capabilities have decreased, with VP of Product Peter Welinder stating that each new version is smarter than the previous one.
- Princeton computer science professor Arvind Narayanan criticized the study's methodology, arguing that it did not evaluate the correctness of the code GPT-4 generated, only whether the raw output could be executed as-is (a sketch of this distinction appears below).
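
Narayanan's point is easier to see with a concrete example. The sketch below is a minimal illustration in Python; the function names, the toy `add` task, and the fence-stripping logic are assumptions for illustration, not the paper's actual evaluation harness. It contrasts a strict "is the raw output directly executable?" check with a looser "is the code it contains correct?" check: a response that wraps a perfectly working function in markdown code fences fails the first check even though it passes the second.

```python
# Hypothetical scoring helpers contrasting two ways to grade a model's code answer.

def is_directly_executable(response: str) -> bool:
    """Strict check: treat the raw model output as a script and try to run it."""
    try:
        exec(compile(response, "<model-output>", "exec"), {})
        return True
    except Exception:
        return False

def extract_code(response: str) -> str:
    """Strip markdown fences such as ```python ... ``` before executing."""
    text = response.strip()
    if text.startswith("```"):
        lines = text.splitlines()
        # Drop the opening fence (with optional language tag) and any closing fence.
        body = [ln for ln in lines[1:] if not ln.strip().startswith("```")]
        return "\n".join(body)
    return text

def is_correct(response: str) -> bool:
    """Looser check: extract the code, run it, and verify its behavior on a toy task."""
    namespace = {}
    try:
        exec(compile(extract_code(response), "<model-output>", "exec"), namespace)
        return namespace.get("add")(2, 3) == 5  # toy functional test for an 'add' task
    except Exception:
        return False

# A correct answer wrapped in markdown fails the strict check but passes the loose one.
fenced_answer = "```python\ndef add(a, b):\n    return a + b\n```"
print(is_directly_executable(fenced_answer))  # False: the backticks are a SyntaxError
print(is_correct(fenced_answer))              # True: the code inside works
```

Under the strict criterion, a formatting change of this kind registers as a failure regardless of whether the code itself is right, which is the gap the criticism points to.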