To support developers, Google is releasing the Multimodal Live API, which lets them build apps with real-time audio and video streaming. The API supports natural conversation patterns and tool integration, similar to OpenAI’s Realtime API. Google is also applying its SynthID technology to watermark all generated content, addressing concerns about deepfake misuse. The Multimodal Live API is available now, and the full production version of 2.0 Flash is expected in January.
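For developers curious what this looks like in practice, here is a minimal sketch of a text-only session with the Multimodal Live API using the google-genai Python SDK. The model name `gemini-2.0-flash-exp`, the config keys, and the exact method signatures are assumptions based on the preview SDK and may change in the production release.

```python
import asyncio
from google import genai  # preview SDK: pip install google-genai

# Assumed preview client setup; replace YOUR_API_KEY with a real key.
client = genai.Client(api_key="YOUR_API_KEY")

async def main():
    # The live API also supports "AUDIO" responses; text keeps the sketch simple.
    config = {"response_modalities": ["TEXT"]}
    # Open a persistent, bidirectional streaming session with the model.
    async with client.aio.live.connect(
        model="gemini-2.0-flash-exp", config=config
    ) as session:
        # Send one user turn and mark the turn as complete.
        await session.send(input="Hello, Gemini!", end_of_turn=True)
        # Print the model's incremental responses as they stream back.
        async for response in session.receive():
            if response.text:
                print(response.text, end="")

asyncio.run(main())
```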
Key takeaways:
- Google has announced Gemini 2.0 Flash, an AI model capable of generating images and audio in addition to text, with initial access limited to early partners.
- The new model can call tools and interact with external apps and services, including Google Search, and is twice as fast as the earlier Gemini 1.5 Pro model.
- 2.0 Flash features customizable audio generation and can analyze and modify images, with outputs watermarked using Google's SynthID technology.
- Google has released the Multimodal Live API to enable developers to build real-time, multimodal apps with audio and video streaming capabilities.