The lawsuit comes amid a growing debate over the use of data for training AI models. While companies argue that fair use protects their data scraping efforts, many copyright holders disagree. The case also highlights the increasing reliance on video transcriptions as a key source of training data. According to Originality.AI, more than 35% of the world's top 1,000 websites now block OpenAI's web crawler, and a study by MIT's Data Provenance Initiative found that around 25% of data from "high-quality" sources has been restricted from the major datasets used to train AI models.
Key takeaways:
- A YouTube creator, David Millette, has filed a class action lawsuit against OpenAI, alleging that the company used transcripts of YouTube videos to train its AI models without notifying or compensating the video owners.
- The complaint alleges that OpenAI violated copyright law and YouTube's terms of service, and profited significantly from the creators' work.
- OpenAI allegedly used the transcriptions to train its AI-powered chatbot platform, ChatGPT, and other generative AI tools and products.
- Millette is seeking a jury trial and more than $5 million in damages on behalf of all YouTube users and creators whose data may have been used by OpenAI to train its AI models.