Gemini - Google DeepMind

The benchmark data compares the performance of several AI models across a range of tasks. On multi-discipline college-level reasoning problems, Gemini Ultra (pixel only) achieved 59.4% 0-shot pass@1, while GPT-4V scored 56.8%. In natural image understanding, Gemini Ultra (pixel only) scored 77.8% 0-shot, narrowly ahead of GPT-4V's 77.2%. Gemini Ultra (pixel only) also led in OCR on natural images, document understanding, and infographic understanding, with scores of 82.3%, 90.9%, and 80.3% respectively. In mathematical reasoning in visual contexts, Gemini Ultra (pixel only) scored 53% 0-shot, about three points higher than GPT-4V's 49.9%.

In video-related tasks, Gemini Ultra scored 62.7 (4-shot) in English video captioning, outperforming DeepMind Flamingo's 56 (4-shot). For video question answering, Gemini Ultra scored 54.7% 0-shot, higher than SeViLA's 46.3%. In the audio domain, Gemini Pro scored 40.1 in automatic speech translation, 11 points higher than Whisper v2's 29.1. For automatic speech recognition, Gemini Pro achieved a 7.6% word error rate, which is lower (and therefore better) than Whisper v3's 17.6%.

Key takeaways:

Gemini Ultra (pixel only) outperforms GPT-4V in multi-discipline college-level reasoning problems, natural image understanding, OCR on natural images, document understanding, and infographic understanding.
In mathematical reasoning in visual contexts, Gemini Ultra (pixel only) also performs better than GPT-4V.
For English video captioning, Gemini Ultra outperforms DeepMind Flamingo. In video question answering, Gemini Ultra also surpasses SeViLA.
In the audio category, Gemini Pro outperforms Whisper v2 in automatic speech translation and Whisper v3 in automatic speech recognition.
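
The margins behind these comparisons can be worked out directly from the figures quoted above. The following is a minimal sketch in Python. The names `RESULTS`, `margin`, and `summarize` are hypothetical and introduced only for illustration, and the video captioning scores are read as 62.7 and 56 under a 4-shot setting. Margins are expressed in each benchmark's own units, so they are not comparable across tasks.

```python
# Scores as quoted in the text above.
# Each row: (task, Gemini model, Gemini score, baseline model, baseline score,
#            higher_is_better)
RESULTS = [
    ("Multi-discipline college-level reasoning",
     "Gemini Ultra (pixel only)", 59.4, "GPT-4V", 56.8, True),
    ("Natural image understanding",
     "Gemini Ultra (pixel only)", 77.8, "GPT-4V", 77.2, True),
    ("Mathematical reasoning in visual contexts",
     "Gemini Ultra (pixel only)", 53.0, "GPT-4V", 49.9, True),
    # Assumption: the captioning figures are 62.7 and 56, both 4-shot.
    ("English video captioning",
     "Gemini Ultra", 62.7, "DeepMind Flamingo", 56.0, True),
    ("Video question answering",
     "Gemini Ultra", 54.7, "SeViLA", 46.3, True),
    ("Automatic speech translation",
     "Gemini Pro", 40.1, "Whisper v2", 29.1, True),
    # Word error rate: lower is better, so the sign is flipped in margin().
    ("Automatic speech recognition (WER)",
     "Gemini Pro", 7.6, "Whisper v3", 17.6, False),
]


def margin(gemini: float, baseline: float, higher_is_better: bool) -> float:
    """Return the margin in Gemini's favor; positive means Gemini is better."""
    diff = gemini - baseline
    return diff if higher_is_better else -diff


def summarize(results):
    """Return (task, margin) pairs, with margins rounded to one decimal place."""
    return [
        (task, round(margin(g_score, b_score, hib), 1))
        for task, _g_name, g_score, _b_name, b_score, hib in results
    ]


if __name__ == "__main__":
    for task, m in summarize(RESULTS):
        print(f"{task}: {m:+.1f}")
```

On these figures, the narrowest margin is in natural image understanding and the widest are in the two audio tasks.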
