The newsletter also covers other AI news, including OpenAI’s shift toward “intellectual freedom” in its development approach, former OpenAI CTO Mira Murati’s new startup, and Meta’s upcoming LlamaCon conference. In addition, OpenAI researchers developed a new benchmark, SWE-Lancer, to evaluate AI coding capabilities, and Chinese AI company Stepfun released Step-Audio, a multilingual speech model. Nous Research introduced DeepHermes-3 Preview, a model that combines reasoning with language capabilities; similar models are expected from Anthropic and OpenAI.
Key takeaways:
- Elon Musk's AI startup, xAI, released its latest AI model, Grok 3, which outperforms other leading models on benchmarks for mathematics and programming.
- There is ongoing debate about the effectiveness and relevance of AI benchmarks, with calls for better testing methods and independent authorities.
- OpenAI researchers developed a new benchmark, SWE-Lancer, to evaluate AI coding capabilities, revealing that AI models still have room for improvement.
- Chinese AI company Stepfun released Step-Audio, an AI model capable of understanding and generating speech in multiple languages, with adjustable emotions and dialects.