The researchers used a class of machine learning models known as diffusion models, trained on a new dataset called MENTOR, which contains over 800,000 diverse identities and 2,200 hours of video. Potential applications of the technology include automatically dubbing videos into other languages, editing and filling in missing frames in a video, and creating full videos of a person from a single photo. However, the technology could also be misused to create deepfakes, exacerbating challenges around misinformation and digital fakery.
Key takeaways:
- Google researchers have developed an AI system, VLOGGER, that can generate lifelike videos of people speaking and gesturing from a single still photo.
- The technology uses advanced machine learning models and a large dataset called MENTOR, which contains over 800,000 diverse identities and 2,200 hours of video.
- Potential applications of VLOGGER include dubbing videos into other languages, editing and filling in missing frames in a video, creating photorealistic avatars for VR and gaming, and powering more engaging AI-driven virtual assistants and chatbots.
- Despite its potential, the technology also raises concerns about misuse, particularly in creating deepfakes, which could exacerbate challenges around misinformation and digital fakery.