Veo 3 can generate videos — and soundtracks to go along with them

Google unveiled its latest video-generating AI model, Veo 3, at the Google I/O 2025 developer conference. Unlike its predecessor, Veo 2, Veo 3 can generate audio (sound effects, background noise, and dialogue) to accompany the videos it creates. It is available in Google’s Gemini chatbot app to subscribers of the $249.99-per-month AI Ultra plan and can be prompted with text or an image. Google claims that Veo 3 understands the raw pixels of its videos and automatically syncs generated sounds with clips, which it says sets the model apart in an increasingly crowded video generator market. The model's development was likely informed by DeepMind's earlier work on "video-to-audio" AI. Google has not disclosed the exact training data sources, but YouTube is a probable contributor.

To address concerns about deepfakes, Google is using its SynthID watermarking technology to embed invisible markers in Veo 3-generated frames. Despite the creative potential of tools like Veo 3, there is concern about their effect on creative industries: one study predicts significant job disruption in the U.S. film, television, and animation sectors by 2026. Google also introduced new features for Veo 2 that improve its understanding of camera movements and give users more control when modifying video content. The Veo 2 updates will soon be available on Google’s Vertex AI API platform.

Key takeaways

Google's Veo 3 AI model can generate audio, including sound effects and dialogue, to accompany the videos it creates, marking a significant advancement in video generation technology.
Veo 3 is available in Google's Gemini chatbot app for subscribers to the AI Ultra plan, priced at $249.99 per month, and can be prompted with text or an image.
DeepMind's previous work in "video-to-audio" AI likely contributed to Veo 3's development, and the model may have been trained on YouTube material.
Google is using SynthID watermarking technology to embed invisible markers in Veo 3-generated frames to mitigate deepfake risks. Separately, it is rolling out new features for Veo 2.

Veo 3 can generate videos — and soundtracks to go along with them | TechCrunch
