To support developers, Google is releasing the Multimodal Live API, which lets them build apps with real-time audio and video streaming. The API supports natural conversation patterns and tool integration, similar to OpenAI’s Realtime API. Google is also applying its SynthID technology to watermark all generated content, addressing concerns about deepfake misuse. The Multimodal Live API is available now, and the full production version of 2.0 Flash is expected in January.
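For developers curious what this looks like in practice, here is a minimal sketch of a text-only session with the Multimodal Live API using the google-genai Python SDK. The model name `gemini-2.0-flash-exp`, the config keys, and the exact method signatures are assumptions based on the preview SDK and may change in the production release.

```python
import asyncio
from google import genai  # preview SDK: pip install google-genai

# Assumed preview client setup; replace YOUR_API_KEY with a real key.
client = genai.Client(api_key="YOUR_API_KEY")

async def main():
    # The live API also supports "AUDIO" responses; text keeps the sketch simple.
    config = {"response_modalities": ["TEXT"]}
    # Open a persistent, bidirectional streaming session with the model.
    async with client.aio.live.connect(
        model="gemini-2.0-flash-exp", config=config
    ) as session:
        # Send one user turn and mark the turn as complete.
        await session.send(input="Hello, Gemini!", end_of_turn=True)
        # Print the model's incremental responses as they stream back.
        async for response in session.receive():
            if response.text:
                print(response.text, end="")

asyncio.run(main())
```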
Key takeaways:
- Google has announced Gemini 2.0 Flash, an AI model capable of generating images and audio in addition to text, with initial access limited to early partners.
- The new model can call tools and interact with external apps and services, including Google Search, and is twice as fast as the earlier Gemini 1.5 Pro model.
- 2.0 Flash features customizable audio generation and can analyze and modify images, with outputs watermarked using Google's SynthID technology.
- Google has released the Multimodal Live API to enable developers to build real-time, multimodal apps with audio and video streaming capabilities.