DeepMind's new AI generates soundtracks and dialogue for videos

Google's AI research lab, DeepMind, is developing an AI technology called V2A (video-to-audio) to generate soundtracks for videos. The technology uses a diffusion model trained on sounds, dialogue transcripts, and video clips to create music, sound effects, and dialogue that match the video content. However, the technology is imperfect: it struggles with videos containing artifacts or distortions, and the generated audio is not especially convincing.

Despite these limitations, DeepMind sees potential in V2A, particularly for archivists and those working with historical footage. However, the company has no immediate plans to release the technology to the public due to potential misuse. Before considering public access, DeepMind plans to conduct rigorous safety assessments and testing, and is currently seeking feedback from creators and filmmakers to inform ongoing research and development.

Key takeaways:

DeepMind, Google’s AI research lab, is developing V2A (video-to-audio), an AI technology that generates soundtracks for videos.
The V2A tech can create music, sound effects, and dialogue that match the characters and tone of the video. The generated audio is watermarked with DeepMind’s SynthID technology.
DeepMind says V2A is unique in that it can understand the raw pixels from a video and sync generated sounds with the video automatically.
DeepMind will not release the tech to the public anytime soon due to its imperfections and to prevent misuse. It will undergo rigorous safety assessments and testing before considering wider public access.

DeepMind's new AI generates soundtracks and dialogue for videos | TechCrunch
