Running the benchmark involves cloning the repository, preparing test data, setting the API keys for the models under test, and executing the benchmark to generate results. Supported models include both closed-source and open-source LLMs as well as cloud OCR providers, each of which requires specific environment variables. The project also offers a benchmark dashboard for visualizing test results and is licensed under the MIT License.
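As a hedged illustration of the setup step, a small pre-flight script like the one below could confirm that the environment variables for the providers you intend to test are set before a run. The variable names shown (`OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, `GOOGLE_GENERATIVE_AI_API_KEY`) are common provider defaults and are assumptions here, not the repository's confirmed list; consult the README for the authoritative names per model.

```python
import os
import sys

# Hypothetical mapping of providers to the environment variables they need.
# The exact names are illustrative assumptions, not the repo's official list.
REQUIRED_KEYS = {
    "openai": ["OPENAI_API_KEY"],
    "anthropic": ["ANTHROPIC_API_KEY"],
    "gemini": ["GOOGLE_GENERATIVE_AI_API_KEY"],
}

def check_keys(providers):
    """Return the env vars that are missing for the selected providers."""
    missing = []
    for provider in providers:
        for var in REQUIRED_KEYS.get(provider, []):
            if not os.environ.get(var):
                missing.append(f"{provider}: {var}")
    return missing

if __name__ == "__main__":
    missing = check_keys(sys.argv[1:] or list(REQUIRED_KEYS))
    if missing:
        print("Missing environment variables:")
        print("\n".join(f"  {m}" for m in missing))
        sys.exit(1)
    print("All required API keys are set.")
```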
Key takeaways:
- The Omni OCR Benchmark is a tool for comparing OCR and data extraction capabilities of various large multimodal models, focusing on text and JSON extraction accuracy.
- The evaluation datasets and methodologies are open source, and contributions that expand coverage to additional providers are encouraged.
- JSON accuracy is measured using a modified json-diff, while text similarity is evaluated using Levenshtein distance (see the sketch after this list).
- The benchmark supports both open-source and closed-source LLMs, as well as cloud OCR providers, with specific models and required environment variables listed for each.
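To make the two metrics concrete, here is a minimal Python sketch of how they can be computed. The Levenshtein similarity is the standard edit-distance formulation; the `json_accuracy` function is a simplified field-by-field comparison that only approximates what a modified json-diff would do, so treat both as illustrations rather than the benchmark's actual scoring code.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def text_similarity(expected: str, actual: str) -> float:
    """Normalize edit distance into a 0..1 similarity score."""
    if not expected and not actual:
        return 1.0
    return 1.0 - levenshtein(expected, actual) / max(len(expected), len(actual))

def json_accuracy(expected: dict, actual: dict) -> float:
    """Simplified stand-in for a json-diff score: the fraction of expected
    top-level fields whose values match exactly in the actual output."""
    if not expected:
        return 1.0
    matches = sum(1 for k, v in expected.items() if actual.get(k) == v)
    return matches / len(expected)

# Example usage with toy ground truth and model output.
truth = {"invoice_number": "INV-001", "total": 42.5}
prediction = {"invoice_number": "INV-001", "total": 40.0}
print(text_similarity("Total: 42.50", "Total: 42.S0"))  # ~0.92 (one substitution)
print(json_accuracy(truth, prediction))                 # 0.5 (one of two fields match)
```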