benchmark/README.md at main · getomni-ai/benchmark

Apr 01, 2025 - github.com
The Omni OCR Benchmark is a tool designed to evaluate and compare the OCR and data extraction capabilities of large multimodal models, such as gpt-4o, against traditional OCR providers, focusing on both text and JSON extraction accuracy. The benchmark uses open-source evaluation datasets and methodologies and encourages expansion to include additional providers. Its key metrics are JSON accuracy, measured with a modified json-diff, and text similarity, assessed with Levenshtein distance.
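As a rough illustration of the text-similarity metric, the sketch below computes a Levenshtein edit distance and normalizes it into a similarity score. This is a generic reconstruction of the scoring shape, not the benchmark's actual code; the function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert/delete/substitute)."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def text_similarity(predicted: str, truth: str) -> float:
    """Normalized similarity in [0, 1]; 1.0 means an exact match."""
    if not predicted and not truth:
        return 1.0
    return 1.0 - levenshtein(predicted, truth) / max(len(predicted), len(truth))
```

For example, `text_similarity("kitten", "sitting")` gives 1 - 3/7 ≈ 0.571, since the two strings are three edits apart.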

The benchmark process involves cloning the repository, preparing test data, setting up API keys for the models to be tested, and running the benchmark to generate results. Supported models include both closed-source and open-source LLMs, as well as cloud OCR providers, each requiring specific environment variables for operation. The project also offers a benchmark dashboard for visualizing test results and is licensed under the MIT License.

Key takeaways:

  • The Omni OCR Benchmark is a tool for comparing OCR and data extraction capabilities of various large multimodal models, focusing on text and JSON extraction accuracy.
  • The benchmark uses open-source evaluation datasets and methodologies, encouraging expansion to include additional providers.
  • JSON accuracy is measured using a modified json-diff, while text similarity is evaluated using Levenshtein distance.
  • The benchmark supports both open-source and closed-source LLMs, as well as cloud OCR providers, with specific models and required environment variables listed for each.
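To make the JSON-accuracy idea concrete, here is a simplified stand-in for the modified json-diff: flatten the predicted and ground-truth JSON objects into leaf paths, then score the fraction of ground-truth fields the prediction reproduces exactly. The benchmark's real diff logic differs in its details; the names below are illustrative.

```python
def flatten(obj, prefix=""):
    """Flatten nested dicts/lists into {"path.to.leaf": value} pairs."""
    items = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            items.update(flatten(value, f"{prefix}{key}."))
    elif isinstance(obj, list):
        for i, value in enumerate(obj):
            items.update(flatten(value, f"{prefix}{i}."))
    else:
        items[prefix.rstrip(".")] = obj
    return items

def json_accuracy(predicted: dict, truth: dict) -> float:
    """Fraction of ground-truth leaf fields matched exactly by the prediction."""
    truth_leaves = flatten(truth)
    pred_leaves = flatten(predicted)
    if not truth_leaves:
        return 1.0
    matches = sum(pred_leaves.get(path) == value
                  for path, value in truth_leaves.items())
    return matches / len(truth_leaves)
```

With this scoring, a prediction that gets one of two extracted invoice fields right scores 0.5, which matches the intuition of field-level JSON accuracy.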
