The lack of transparency has sparked controversy and criticism, and OpenAI already faces multiple copyright lawsuits over its data-scraping practices. The incident has raised questions about the ethics of AI companies using publicly available data, and about the need for clearer communication on where AI training data comes from.
Key takeaways:
- OpenAI's CTO, Mira Murati, was unable to provide clear answers regarding the sources of the training data used for the company's new video-generating AI, Sora, during an interview with The Wall Street Journal.
- Murati confirmed that the data used was either publicly available or licensed, but could not specify whether it included videos from platforms like YouTube, Instagram, or Facebook.
- OpenAI has faced controversy and lawsuits over its data-scraping practices, raising concerns about the company's transparency and respect for copyright laws.
- After the interview, Murati reportedly confirmed that videos from Shutterstock, a stock image and video company that has a partnership with OpenAI, were included in Sora's training set.