
Meta Admits Use of 'Pirated' Book Dataset to Train AI * TorrentFreak

Jan 11, 2024 - torrentfreak.com
Several rightsholders, including record labels, authors, and visual artists, have filed lawsuits against tech companies such as Meta and OpenAI for allegedly using their work without proper compensation to train AI models. Many of the lawsuits center on the Books3 dataset, created by AI researcher Shawn Presser, which contains more than 195,000 books scraped from the library of 'pirate' site Bibliotik. Many tech companies used the dataset to improve their language models, which led to copyright infringement claims once the AI boom reached the mainstream.

In response to these lawsuits, Meta has admitted to using portions of the Books3 dataset to train its Llama AI model, but denies allegations of copyright infringement. The tech giant argues that using copyrighted works to train AI does not necessarily require consent or compensation, and that any unauthorized copies it may have made qualify as fair use. The fair use defense is expected to be a key part of these and other AI lawsuits, which are still in their early stages and could potentially reach the Supreme Court.

Key takeaways:

  • Several rightsholders, including record labels, authors, and the New York Times, have filed lawsuits against companies that develop AI models, alleging the use of their work without proper compensation.
  • The lawsuits often involve the Books3 dataset, created by AI researcher Shawn Presser, which was scraped from the library of 'pirate' site Bibliotik and used by tech companies like Meta and OpenAI to train AI models.
  • Meta has admitted to using portions of the Books3 dataset to train its Llama AI model, but denies allegations of copyright infringement, suggesting that its use of copyrighted works did not require consent, credit, or compensation.
  • Meta plans to rely on a fair use defense, arguing that any unauthorized copies it may have made of copyrighted works qualify as fair use under U.S. law. This defense is expected to be a key part of this and other AI lawsuits.