The findings reveal that Gemini Pro's accuracy is slightly inferior to that of GPT 3.5 Turbo on all benchmarked tasks. The under-performance is attributed to failures in mathematical reasoning on numbers with many digits, sensitivity to multiple-choice answer ordering, and aggressive content filtering. However, Gemini showed strong performance in generating text in non-English languages and in handling longer, more complex reasoning chains.
Key takeaways:
- The Google Gemini models are the first to report results that rival the OpenAI GPT series across a wide variety of tasks.
- A third-party, objective comparison of the abilities of the OpenAI GPT and Google Gemini models was conducted, with reproducible code and fully transparent results.
- Gemini Pro's accuracy is close to, but slightly below, that of the corresponding GPT 3.5 Turbo on all benchmarked tasks.
- Gemini demonstrates strong performance in generating text in non-English languages and in handling longer, more complex reasoning chains.