OpenAI’s Sora: The devil is in the ‘details of the data’

Mar 14, 2024 - venturebeat.com
OpenAI CTO Mira Murati faced tough questions in a recent Wall Street Journal interview about the data used to train the company's Sora text-to-video model. Murati confirmed that OpenAI used Shutterstock content, but was vague about whether data from YouTube, Facebook, or Instagram was used. This comes amid copyright-related lawsuits against OpenAI, including one from the New York Times. The issue of training data is not just a legal matter, but also one of trust and transparency, with stakeholders wanting to know whether the data used was publicly available and properly licensed.

The article also discusses the broader implications of using publicly available data to train AI models. Companies like Google and Meta are known to use publicly shared YouTube, Facebook, and Instagram content to train their models, which is legal but raises questions about public awareness and consent. The author suggests that the public may not be comfortable with their social media content being used to train commercial models that generate significant profits for tech companies. The issue of training data is seen as a foundational one for generative AI, with potential repercussions not just in the courts, but also in the court of public opinion.

Key takeaways:

  • OpenAI CTO Mira Murati faced tough questions during a Wall Street Journal interview about the data used to train the company's Sora text-to-video model, specifically whether it included content from YouTube, Facebook, or Instagram.
  • Murati confirmed that the model was trained on publicly available and licensed data, but did not provide further details, leading to criticism and concerns about transparency.
  • The issue of training data is not just a concern for OpenAI, but also for other tech giants like Google and Meta, which have confirmed using publicly shared content from their platforms for training their models.
  • The article suggests that the use of publicly available data for training AI models could face a reckoning in the court of public opinion, as people become more aware of how their content is being used by these companies.