TechCrunch highlighted a long-standing benchmarking problem in AI: current benchmarks do not accurately reflect how the average person actually uses these systems. Anthropic's funding program is pitched as a direct response to this gap.
Key takeaways:
- AI company Anthropic has initiated a funding program to develop new benchmarks for evaluating AI models, including its chatbot Claude.
- The program will pay third-party organizations to create metrics for assessing advanced AI capabilities.
- Anthropic's goal with this investment is to improve the entire field of AI safety.
- The company proposes challenging benchmarks focused on AI security and societal implications, built with new tools, infrastructure, and methods.