For less demanding applications, Google is launching Gemini 1.5 Flash, a smaller and more efficient model designed for high-frequency generative AI workloads such as summarization, chat applications, and data extraction from long documents. Google also plans to introduce a feature called context caching for all Gemini models, allowing developers to store large amounts of information that the models can then access quickly and cheaply. Two further features target cost as well: the Batch API offers a more economical way to process high-volume workloads, while controlled generation lets developers constrain model outputs to specific formats or schemas.
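For developers, the context-caching flow would look roughly like the sketch below, written against the google-generativeai Python SDK. This is a minimal illustration, not a confirmed detail of the rollout: the model name, the uploaded file, and the one-hour TTL are all assumptions for the sake of the example.

```python
import datetime

import google.generativeai as genai
from google.generativeai import caching

genai.configure(api_key="YOUR_API_KEY")  # placeholder key

# Upload a long document once; its tokens are stored server-side.
# "annual_report.pdf" is a hypothetical file for illustration.
document = genai.upload_file(path="annual_report.pdf")

cache = caching.CachedContent.create(
    model="models/gemini-1.5-flash-001",  # assumed cache-capable model name
    system_instruction="You answer questions about the attached report.",
    contents=[document],
    ttl=datetime.timedelta(hours=1),  # keep the cached tokens for one hour
)

# Later requests reuse the cached tokens instead of resending the document,
# which is what makes repeated access quick and cheap.
model = genai.GenerativeModel.from_cached_content(cached_content=cache)
response = model.generate_content("Summarize the key financial risks.")
print(response.text)
```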
Key takeaways:
- Google has announced a new version of Gemini 1.5 Pro, its flagship AI model, which can now analyze up to 2 million tokens, double the previous maximum and the largest input capacity of any commercially available model.
- Google is also launching Gemini 1.5 Flash, a smaller and more efficient version of Gemini 1.5 Pro, designed for less demanding, high-frequency generative AI workloads.
- All Gemini models will soon be able to use a feature called context caching, which allows developers to store large amounts of information for quick and cost-effective access.
- Google is introducing a new feature called controlled generation, which allows users to define Gemini model outputs according to specific formats or schemas, potentially leading to further cost savings.
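In practice, controlled generation is exposed through generation settings that pin the response to a declared schema. The sketch below, again using the google-generativeai Python SDK, shows the general shape; the `Invoice` schema and the prompt are hypothetical stand-ins.

```python
from typing_extensions import TypedDict

import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder key


# Hypothetical schema: the model's output must conform to this structure.
class Invoice(TypedDict):
    vendor: str
    total_usd: float
    line_items: list[str]


model = genai.GenerativeModel("gemini-1.5-pro")
response = model.generate_content(
    "Extract the invoice details from this text: ...",
    generation_config=genai.GenerationConfig(
        response_mime_type="application/json",
        response_schema=Invoice,  # constrain output to the Invoice schema
    ),
)
print(response.text)  # a JSON string matching the Invoice schema
```

Because the output is guaranteed to parse against the declared schema, downstream code can consume it directly with `json.loads` instead of scraping free-form text, which is where the cost savings the announcement alludes to would come from.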