
Meta Allegedly Trained its AI with Copyrighted Books, Despite Warnings

Dec 13, 2023 - techtimes.com
Meta, the parent company of Facebook, reportedly used thousands of copyrighted books to train its Llama AI model without the authors' permission, according to a recent copyright infringement lawsuit. The company was allegedly aware of the potential legal implications, since using copyrighted material for AI training may not be protected under U.S. copyright law. The evidence includes chat logs from a Meta researcher discussing the legality of using the book files as training data with the company's legal department.

The lawsuit also refers to a dataset called 'The Pile,' which Meta used to train its AI model. 'The Pile' is a collection of AI training content, including the 'Books3 database' with approximately 196,000 books in plain-text format. The data was collected from thousands of novels and nonfiction books published in the last 20 years. The lawsuit is part of a broader trend of content producers suing tech companies for using their copyrighted works to develop AI models without permission.

Key takeaways:

  • Meta's Llama AI model has reportedly been trained on thousands of copyrighted books without the authors' permission, a use that may not be protected under U.S. copyright law.
  • Chat logs from a researcher connected to Meta suggest that the company was aware of the legal implications of using copyrighted books for AI training.
  • 'The Pile', a dataset used by Meta for AI training, includes the 'Books3 database' containing roughly 196,000 books; the dataset reportedly cannot be used for legal reasons.
  • Many lawsuits have been filed this year against tech companies for using copyrighted works to develop AI models. These suits could raise the cost of AI development by forcing companies to pay for the copyrighted works they use.