Gemini - Google DeepMind

Dec 06, 2023 - deepmind.google
The article compares the performance of various AI models across multimodal tasks. On multi-discipline college-level reasoning problems, Gemini Ultra (pixel only) achieved 59.4% 0-shot pass@1, while GPT-4V scored 56.8%. In natural image understanding, Gemini Ultra (pixel only) scored 77.8% 0-shot, slightly ahead of GPT-4V's 77.2%. Gemini Ultra (pixel only) also led in OCR on natural images, document understanding, and infographic understanding, with scores of 82.3%, 90.9%, and 80.3% respectively. In mathematical reasoning in visual contexts, Gemini Ultra (pixel only) scored 53% 0-shot, slightly higher than GPT-4V's 49.9%.
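For reference, "0-shot pass@1" means each problem is attempted once, with no in-context examples, and the score is the fraction of problems answered correctly. Below is a minimal sketch of the commonly used unbiased pass@k estimator; the article does not say whether Gemini's evaluation uses exactly this estimator, and the sample counts are purely illustrative.

from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    # Probability that at least one of k answers drawn (without replacement)
    # from n samples, of which c are correct, is correct.
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# With a single attempt per problem (k=1), this reduces to the fraction correct.
print(pass_at_k(n=10, c=3, k=1))  # 0.3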

In video-related tasks, Gemini Ultra scored 62.7 (4-shot) on English video captioning, outperforming DeepMind Flamingo's 56 (4-shot). For video question answering, Gemini Ultra scored 54.7% 0-shot, higher than SeViLA's 46.3%. In the audio domain, Gemini Pro scored 40.1 in automatic speech translation, significantly higher than Whisper v2's 29.1. For automatic speech recognition, Gemini Pro achieved a 7.6% word error rate, better than Whisper v3's 17.6%.
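For context, word error rate (WER) is the word-level edit distance between the model's transcript and a reference transcript, divided by the number of reference words, so lower is better. A minimal sketch of that computation follows; the transcripts are illustrative and not taken from either benchmark.

def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution
    return dp[len(ref)][len(hyp)] / len(ref)

print(f"{wer('the cat sat on the mat', 'the cat sat on a mat'):.1%}")  # 16.7%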

Key takeaways:

  • Gemini Ultra (pixel only) outperforms GPT-4V in multi-discipline college-level reasoning problems, natural image understanding, OCR on natural images, document understanding, and infographic understanding.
  • In mathematical reasoning in visual contexts, Gemini Ultra (pixel only) also performs better than GPT-4V.
  • For English video captioning, Gemini Ultra outperforms DeepMind Flamingo. In video question answering, Gemini Ultra also surpasses SeViLA.
  • In the audio category, Gemini Pro outperforms Whisper v2 in automatic speech translation and Whisper v3 in automatic speech recognition.