Study claims ChatGPT is losing capability, but some experts aren’t convinced

Jul 20, 2023 - arstechnica.com
Researchers from Stanford University and the University of California, Berkeley have published a study suggesting that the performance of OpenAI's large language models, specifically GPT-3.5 and GPT-4, has been inconsistent over time. The study tested the March and June 2023 versions of these models on tasks like math problem-solving and code generation. It found that GPT-4's accuracy at identifying prime numbers dropped from 97.6% in March to just 2.4% in June, while GPT-3.5's accuracy on the same task improved over the same period.

The study lends support to claims that GPT-4's performance has declined over the past few months, which OpenAI has consistently denied. Some experts argue, however, that the findings do not conclusively prove a decline in GPT-4's capabilities and could instead reflect fine-tuning adjustments made by OpenAI. For instance, critics point out that the study checked whether the generated code could be executed directly, not whether it was correct.

Key takeaways:

  • A research paper from Stanford University and the University of California, Berkeley suggests that the AI language model GPT-4 has become less effective at coding and compositional tasks over time.
  • The researchers tested the March and June 2023 versions of GPT-4 and GPT-3.5 on tasks like math problem-solving, answering sensitive questions, code generation, and visual reasoning. They found that GPT-4's ability to identify prime numbers dropped from 97.6 percent accuracy in March to just 2.4 percent in June.
  • OpenAI has denied claims that GPT-4's capabilities have decreased, with VP of Product Peter Welinder stating that each new version is smarter than the previous one.
  • Princeton computer science professor Arvind Narayanan criticized the study's methodology, arguing that it did not evaluate whether the code generated by GPT-4 was correct, only whether it could be executed directly.