In Cringe Video, OpenAI CTO Says She Doesn’t Know Where Sora’s Training Data Came From

OpenAI CTO Mira Murati was unable to give clear answers about where the training data for Sora, the company's new text-to-video AI, came from during an interview with The Wall Street Journal. Murati said the data was publicly available or licensed. However, she could not confirm whether videos from YouTube, Instagram, or Facebook had been used, and she declined to go into further detail about the data sources. After the interview, Murati confirmed that videos from Shutterstock, a stock image company that has a partnership with OpenAI, were included in Sora's training set.

The lack of transparency has drawn criticism, especially because OpenAI already faces multiple copyright lawsuits over its data-scraping practices. The episode has raised questions about the ethics of AI companies training on publicly available data and about the need for clearer disclosure of where AI training data comes from.

Key takeaways

OpenAI's CTO, Mira Murati, was unable to provide clear answers regarding the sources of the training data used for the company's new video-generating AI, Sora, during an interview with The Wall Street Journal.
Murati confirmed that the data used was either publicly available or licensed, but could not specify whether it included videos from platforms like YouTube, Instagram, or Facebook.
OpenAI has faced controversy and lawsuits over its data-scraping practices, raising concerns about the company's transparency and its respect for copyright law.
After the interview, Murati reportedly confirmed that videos from Shutterstock, a stock image company partnered with OpenAI, were included in Sora's training set.