There has been some controversy over the benchmarking of these models. Meta claimed that its Maverick model outperformed OpenAI's GPT-4o, but it emerged that the version submitted for testing was a customized variant optimized for conversationality rather than the publicly released model. This prompted LMArena, the benchmarking platform, to criticize Meta for a lack of transparency. Even so, the experimental Maverick model currently ranks second on LMArena, tied with GPT-4o and Grok 3, while Google's Gemini 2.5 Pro holds the top spot. More details about the Llama 4 family, including additional models, are expected at LlamaCon, Meta's upcoming AI developer conference.
Key takeaways:
- Meta has released two new AI models, Maverick and Scout, as part of its Llama 4 family, which are open-weights and multimodal.
- There is controversy over Meta's benchmarking practices, as the model submitted to LMArena was optimized for conversationality, potentially skewing results.
- Scout is the smaller model, with 17 billion active parameters and 16 experts, while Maverick is a midsized model with 17 billion active parameters and 128 experts; both use a mixture-of-experts design (see the sketch after this list).
- Meta's experimental Llama 4 Maverick model currently ranks second on LMArena, tied with GPT-4o and Grok 3, while Google's Gemini 2.5 Pro holds the top spot.
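
The "active parameters vs. experts" distinction reflects a mixture-of-experts (MoE) design: the model stores many expert sub-networks, but each token is routed through only a few of them, so the parameters active per token are a fraction of the total stored. The PyTorch layer below is a minimal sketch of that idea only; the class name, dimensions, and top-1 routing are illustrative assumptions and do not reflect Llama 4's actual implementation.

```python
import torch
import torch.nn as nn

class ToyMoELayer(nn.Module):
    """Toy mixture-of-experts layer (illustrative only, not Meta's code).

    A router scores the experts for each token, and only the top-k experts
    process that token, so "active" parameters per token stay small even
    though the layer stores parameters for every expert.
    """

    def __init__(self, d_model=64, n_experts=16, top_k=1):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)   # per-token expert scores
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model),
                          nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )
        self.top_k = top_k

    def forward(self, x):                              # x: (tokens, d_model)
        weights = self.router(x).softmax(dim=-1)       # routing probabilities
        top_w, top_idx = weights.topk(self.top_k, dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = top_idx[:, k] == e              # tokens routed to expert e
                if mask.any():
                    out[mask] += top_w[mask, k:k+1] * expert(x[mask])
        return out

layer = ToyMoELayer(n_experts=16)
y = layer(torch.randn(8, 64))                          # 8 tokens flow through the layer

# Total parameters count every expert; active parameters count the router
# plus the single expert each token actually uses (top_k=1 here).
total = sum(p.numel() for p in layer.parameters())
active = sum(p.numel() for p in layer.router.parameters()) + \
         sum(p.numel() for p in layer.experts[0].parameters())
print(f"total params: {total:,}  active per token: {active:,}")
```

At a larger scale, the same accounting is why Scout and Maverick can share the same 17-billion active-parameter figure while differing greatly in total size: adding experts grows the stored parameter count without changing how many parameters any single token touches.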