Google's New Gemini AI Will Understand Your Photos and Videos, Not Just Text

Google has introduced a new AI model, Gemini, to its Bard AI chatbot, bringing a native understanding of video, audio, and photos. The new technology, currently available only in English, improves the AI's abilities in complex tasks such as summarizing documents, reasoning, and writing programming code. Gemini comes in three versions tailored for different levels of computing power: Gemini Nano for mobile phones, Gemini Pro for Google's data centers, and Gemini Ultra, which is currently limited to a test group.

Gemini represents a significant advancement in the generative AI field, where chatbots create their own responses to prompts written in plain language. Despite these advancements, AI models still face fundamental problems, such as giving answers that are plausible rather than correct. Gemini is the next generation of Google's large language model. It was trained simultaneously on text, programming code, images, audio, and video, which allows it to handle multimedia input more efficiently.

Key takeaways:

Google has introduced a new AI model called Gemini, which brings a native understanding of video, audio, and photos to its Bard AI chatbot.
Gemini comes in three versions tailored for different levels of computing power: Gemini Nano for mobile phones, Gemini Pro for fast responses in data centers, and Gemini Ultra for a new Bard Advanced chatbot due in 2024.
The new model represents a significant advancement in the generative AI field, where chatbots create their own responses to prompts written in plain language.
Despite these advancements, AI models still face fundamental problems, such as providing plausible but not necessarily correct answers, and Google advises users to double-check the responses of its chatbot.
