VASA-1 could be used for advanced lip-syncing in games, virtual avatars for social media videos, and AI-based moviemaking. Despite that potential, the team has no plans for a public release or for making the model available to developers. VASA-1 can lip-sync convincingly to a song, handle a range of image styles, and generate 512x512-pixel video at 45 frames per second in about 2 minutes on a desktop-grade Nvidia RTX 4090 GPU.
Key takeaways:
- Microsoft's new AI research paper introduces VASA-1, a model that can create a hyper-realistic talking-face video from a single portrait photo and an audio file.
- The technology is currently available only to the Microsoft Research team, but the demo videos show impressive lip sync, realistic facial features, and natural head movement.
- Potential applications of VASA-1 include advanced lip-syncing in games, AI-driven NPCs with natural lip movement, and virtual avatars for social media videos.
- Despite its potential, the team has stated that this is just a research demonstration, with no plans for a public release or for making it available to developers for use in products.