Real AI model benchmarks from authoritative sources
New AI models are released every week, often with exaggerated performance claims, which makes it hard to find objective, trustworthy data. AgentLeaderboards aggregates real benchmark scores from authoritative sources to help you make informed decisions about which AI model to use.
The gold standard for open-source LLM evaluation. We reference their benchmarks including:
Curated by AI Explained, featuring benchmarks from Epoch AI and Scale AI:
We also track community benchmarks and user ratings from:
Our "Overall Score" is a weighted average of performance across multiple benchmarks, normalized to a 0-100 scale. We prioritize:
We update benchmarks daily from public leaderboards. A newly released model typically appears here within 24-48 hours of being evaluated by those sources.
All pricing information is sourced from official provider documentation and updated regularly. Prices are shown as input/output cost per million tokens (USD).
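With per-million-token pricing, the cost of a request is simply the token counts scaled by the listed rates. Here is a quick sketch of that arithmetic; the rates used are made-up placeholders, not real provider prices.

```python
# Sketch: estimating the cost of a single request from per-million-token prices.
# The prices below are hypothetical placeholders, not real provider rates.

INPUT_PRICE_PER_MTOK = 3.00    # USD per 1M input tokens (hypothetical)
OUTPUT_PRICE_PER_MTOK = 15.00  # USD per 1M output tokens (hypothetical)


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one request, given token counts and per-1M-token prices."""
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_MTOK \
        + (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_MTOK


# Example: 12,000 input tokens and 1,500 output tokens.
# (12_000 / 1e6) * 3.00 + (1_500 / 1e6) * 15.00 = 0.036 + 0.0225 = 0.0585 USD
print(f"${request_cost(12_000, 1_500):.4f}")
```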
What we do: Aggregate and present real benchmark data from trusted sources.
What we don't do: Run our own benchmarks or modify scores.
Limitations: Benchmarks are proxies for real-world performance and may not reflect your specific use case. Always test models on your own data before making critical decisions.
Bias acknowledgment: Benchmark datasets may contain biases. We display multiple benchmarks to provide a more complete picture.
AgentLeaderboards is an EngineeredEverything project.
Found an error? Have a suggestion? Want to submit a benchmark?
Contact us at: [email protected]