LLM Leaderboard 2024

The article compares commercial and open-source Large Language Models (LLMs) on capabilities, price, and context window. The models are evaluated on benchmarks covering math, science, reasoning, and coding. The models with the largest context windows are Claude 2.1, GPT-4 Turbo, and Gemini 1.5 Pro, while those with the lowest input cost per 1M tokens are Gemini Pro, Mistral Tiny, and GPT-3.5 Turbo.

The article also presents a separate leaderboard for code generation, with Claude 3 Opus, Claude 3 Haiku, and Claude 3 Sonnet leading. A comparison of proprietary models' cost and context window is also provided, with Claude 3 Haiku having the lowest cost per 1M tokens. The data for these comparisons is sourced from the models' technical reports and public product/pricing pages.

Key takeaways:

The leaderboard compares commercial and open-source Large Language Models (LLMs) on capabilities, price, and context window.
The models with the largest context windows are Claude 2.1, GPT-4 Turbo, and Gemini 1.5 Pro. The models with the lowest input cost per 1M tokens are Gemini Pro, Mistral Tiny, and GPT-3.5 Turbo.
The comparison includes various benchmarks such as math, science, reasoning, and coding. Claude 3 Opus, Gemini 1.5 Pro, and GPT-4 are among the top performers in these benchmarks.
The data used for the leaderboard is sourced from the technical reports of the models and public product/pricing pages.
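
To make the "input cost per 1M tokens" and "context window" columns concrete, here is a minimal Python sketch of how the two figures combine when choosing a model for a given prompt size. The model names, prices, and context sizes are placeholders invented for illustration, not values from the leaderboard, and the helper names (ModelEntry, input_cost_usd, cheapest_that_fits) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelEntry:
    """One leaderboard row. All values used below are placeholders."""
    name: str
    input_usd_per_1m: float  # input price in USD per 1M tokens
    context_window: int      # maximum context size in tokens


def input_cost_usd(entry: ModelEntry, prompt_tokens: int) -> float:
    """Input cost of one prompt: tokens / 1,000,000 * price per 1M tokens."""
    return prompt_tokens / 1_000_000 * entry.input_usd_per_1m


def cheapest_that_fits(entries: list[ModelEntry], prompt_tokens: int) -> list[ModelEntry]:
    """Keep models whose context window holds the prompt, cheapest input price first."""
    fitting = [e for e in entries if e.context_window >= prompt_tokens]
    return sorted(fitting, key=lambda e: e.input_usd_per_1m)


# Placeholder rows (assumed values, NOT real pricing or context sizes).
PLACEHOLDER_ROWS = [
    ModelEntry("model-a", 0.50, 16_000),
    ModelEntry("model-b", 0.25, 200_000),
    ModelEntry("model-c", 10.00, 128_000),
]

if __name__ == "__main__":
    for entry in cheapest_that_fits(PLACEHOLDER_ROWS, prompt_tokens=100_000):
        cost = input_cost_usd(entry, 100_000)
        print(f"{entry.name}: ${cost:.4f} for 100,000 input tokens")
```

The sketch covers input tokens only. Output tokens are typically priced separately, so a fuller comparison would add a second price field taken from each provider's pricing page.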
