The researchers used a class of machine learning models known as diffusion models, trained on a new dataset called MENTOR, which contains over 800,000 diverse identities and 2,200 hours of video. Potential applications of the technology include automatically dubbing videos into other languages, editing and filling in missing frames in a video, and creating full videos of a person from a single photo. However, the technology could also be misused to create deepfakes, exacerbating challenges around misinformation and digital fakery.
Key takeaways:
- Google researchers have developed an AI system, VLOGGER, that can generate lifelike videos of people speaking and gesturing from a single still photo.
- The technology uses advanced machine learning models and a large dataset called MENTOR, which contains over 800,000 diverse identities and 2,200 hours of video.
- Potential applications of VLOGGER include dubbing videos into other languages, editing and filling in missing frames in a video, creating photorealistic avatars for VR and gaming, and powering more engaging AI-driven virtual assistants and chatbots.
- Despite its potential, the technology also raises concerns about misuse, particularly in creating deepfakes, which could exacerbate challenges around misinformation and digital fakery.