Leaked Documents Show Nvidia Scraping ‘A Human Lifetime’ of Videos Per Day to Train AI

Nvidia has used videos scraped from YouTube and other sources to compile training data for its AI products, according to internal Slack chats, emails, and documents obtained by 404 Media. Despite employees' questions about the legal and ethical implications of using copyrighted content for AI model training, Nvidia defended the practice as fully compliant with copyright law. Managers reportedly assured employees that the use of such content had been cleared at the highest levels of the company.

The data was used to train an AI model for Nvidia’s Omniverse 3D world generator, self-driving car systems, and “digital human” products as part of a project internally named Cosmos. A former Nvidia employee, granted anonymity by 404 Media, revealed that employees were asked to scrape videos from Netflix, YouTube, and other sources for this purpose. The project has not yet been released to the public.

Key takeaways

Nvidia has been scraping videos from YouTube and other sources to compile training data for its AI products, according to internal documents obtained by 404 Media.
Nvidia says the practice fully complies with copyright law, despite concerns raised by employees about potential legal issues.
Employees were asked to scrape videos from Netflix, YouTube, and other sources to train an AI model for Nvidia’s Omniverse 3D world generator, self-driving car systems, and “digital human” products.
The project, internally named Cosmos, has not yet been released to the public.
