Cool or creepy? Microsoft's VASA-1 is a new AI model that turns photos into 'talking faces'

Apr 19, 2024 - tomsguide.com
Microsoft's new AI research paper introduces VASA-1, a model that can create a hyper-realistic talking-face video from a single photo and an audio file. The model, currently available only to the Microsoft Research team, offers impressive lip sync, realistic facial features, and head movement, reportedly surpassing similar technologies from Runway and Nvidia. It can animate both synthetic images and real photos, including those not facing forward, and it accepts eye gaze direction, head distance, and emotion as inputs.

VASA-1 could be used for advanced lip-syncing in games, virtual avatars for social media videos, and AI-based moviemaking. Despite this potential, the team has no plans for a public release or for making the model available to developers. In the demos, the model lip-syncs to a song and handles different image styles, and it can create 512x512 pixel video at 45 frames per second in about 2 minutes using a desktop-grade Nvidia RTX 4090 GPU.

Key takeaways:

  • Microsoft's new AI research paper introduces VASA-1, a model that can create a hyper-realistic talking face video from a single portrait photo and an audio file.
  • The technology is currently available only to the Microsoft Research team, but the demo videos show impressive lip sync, realistic facial features, and head movement.
  • Potential applications of VASA-1 include advanced lip-syncing for games, AI-driven NPCs with natural lip movement, and virtual avatars for social media videos.
  • Despite its potential, the team has stated that this is just a research demonstration with no plans for a public release or making it available to developers for use in products.