Why has LLM progress seemingly stalled around the GPT-4 level? Or has it?

Jun 15, 2024 - news.ycombinator.com
The article discusses the vast amount of data that models like GPT-4 are trained on, suggesting the training set could plausibly encompass the entire World Wide Web. The author speculates that OpenAI employs many people to generate new data, including solving questions whose answers are fed back into the model, and mentions the practice of adding extra training examples to patch content that was poorly received.

However, the author points out that significant gaps remain in the data, citing the consistent mistranslation of the word "you" into Indonesian by various AI models as an example. The author suggests these models may have been trained largely on government documents or advertisements, leading to such errors, and emphasizes the need for better benchmarks: current ones do not catch these issues, and code benchmarks focus almost exclusively on Python.
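
To illustrate the kind of check the author argues benchmarks should include, below is a minimal sketch in Python of a register-aware translation test. Everything in it is assumed rather than taken from the thread: the `translate` callable stands in for whatever model is under test, the case list is hypothetical, and the expected pronouns ("kamu" for informal "you", "Anda" for formal "you") reflect the standard Indonesian distinction that the mistranslation example hinges on.

    import re
    from typing import Callable

    # (English source, conversational register, expected Indonesian pronoun)
    # Hypothetical cases; a real benchmark would need far broader coverage.
    CASES = [
        ("How are you today?", "informal", "kamu"),
        ("Could you please review the attached report?", "formal", "anda"),
    ]

    def pronoun_register_score(translate: Callable[[str], str]) -> float:
        """Return the fraction of cases whose Indonesian translation
        uses the register-appropriate second-person pronoun."""
        passed = 0
        for source, register, expected in CASES:
            # Prepend the register as plain context; a real harness would
            # carry it through a structured prompt instead.
            hypothesis = translate(f"[{register} conversation] {source}")
            tokens = re.findall(r"\w+", hypothesis.lower())
            if expected in tokens:
                passed += 1
        return passed / len(CASES)

    # A stub that always answers formally, mimicking the bias the author
    # attributes to training on government documents and advertisements:
    if __name__ == "__main__":
        always_formal = lambda _: "Apa kabar Anda hari ini?"
        print(pronoun_register_score(always_formal))  # 0.5: informal case fails

A targeted check like this catches the failure mode directly; aggregate metrics such as BLEU can still score a translation highly when only the pronoun register is wrong, which is one way such errors slip past current benchmarks.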

Key takeaways:

  • OpenAI likely employs a large number of people to generate new data for GPT-4, including solving questions to be fed into the system.
  • Data quality is crucial; feeding the system low-quality data, such as private conversations, may not yield the best results.
  • There are significant gaps in the data, as evidenced by consistent mistranslations of certain words in languages like Indonesian.
  • Current benchmarks are not sufficient to catch all such issues, indicating a need for better benchmarks in the future.