Google brains plumb depths of the uncanny valley with latest image-to-video tool

Google has developed a new AI tool, VLOGGER, that can animate a still photo using a recording of a person's speech. The tool doesn't require any per-person training or face detection and can generate dynamic videos with control over identity and pose. Although the videos look unrealistic, the researchers claim that VLOGGER outperforms previous state-of-the-art methods on image quality, identity preservation, and temporal consistency across three public benchmarks.

The tool has nonetheless been criticized for the quality of its synthetic videos. One potential application of VLOGGER is lip syncing, to translate existing videos from one language to another, but the current results are not ready for real-world use. It remains unclear whether Google plans to release VLOGGER or incorporate the technology into its other AI products.

Key takeaways:

Google has developed a new AI tool called VLOGGER that can animate a still photo using a recording of a person's speech, without any per-person training or face detection.
The researchers claim that VLOGGER outperforms previous state-of-the-art methods on image quality, identity preservation, and temporal consistency across three public benchmarks.
VLOGGER uses a two-step process to generate videos from still photos: it first predicts body motion and facial expressions from the input audio, then uses an architecture based on recent image diffusion models to provide control in the temporal and spatial domains.
Despite the researchers' claims, many have criticized the videos produced by VLOGGER as looking fake and falling short of Google's usual standards, and it's unclear whether Google plans to release VLOGGER or add the tech to its other AI products.
