A new, challenging AGI test stumps most AI models

The Arc Prize Foundation, co-founded by AI researcher François Chollet, has introduced a new test called ARC-AGI-2 to evaluate the general intelligence of AI models. The test consists of puzzle-like problems that require AI systems to identify visual patterns and generate the correct answer without relying on brute-force computing power. Reasoning models such as OpenAI’s o1-pro and DeepSeek’s R1 score between 1% and 1.3%, and non-reasoning models such as GPT-4.5 score around 1%. Panels of human test-takers, by contrast, average 60% accuracy. ARC-AGI-2 emphasizes efficiency in acquiring and deploying new skills. This addresses a flaw in its predecessor, ARC-AGI-1, which allowed models to lean heavily on computing power to find solutions.

ARC-AGI-2 arrives amid calls for new benchmarks to measure AI progress, particularly in traits like creativity. Alongside the test, the Arc Prize Foundation has launched a contest that challenges developers to reach 85% accuracy on ARC-AGI-2 while spending no more than $0.42 per task. The contest reflects the view that measuring AI capability means looking beyond whether a model can solve a problem to how efficiently, and how cheaply, it acquires the skill to do so.

Key takeaways:

The Arc Prize Foundation has introduced ARC-AGI-2, a new test of AI models' general intelligence that most models have so far struggled with.
ARC-AGI-2 focuses on efficiency and the ability to interpret patterns on the fly, addressing flaws in the previous ARC-AGI-1 test.
Human panels averaged 60% accuracy on ARC-AGI-2, far ahead of the AI models tested, which scored in the low single digits (roughly 1% to 4%).
The Arc Prize Foundation announced a new contest, challenging developers to achieve 85% accuracy on ARC-AGI-2 with a cost constraint of $0.42 per task.

Source: TechCrunch