
Inside Meta’s race to beat OpenAI: “We need to learn how to build frontier and win this race”

Jan 14, 2025 - theverge.com
A major copyright lawsuit against Meta has revealed internal communications suggesting the company used copyrighted data, including content from the book piracy site Library Genesis (LibGen), to train its Llama AI models. The documents indicate Meta was aware of the potential legal issues and sought to conceal its use of pirated data while racing to compete with rivals like OpenAI and Mistral. Meta executives discussed the necessity of LibGen for achieving state-of-the-art performance and considered various mitigations, such as removing clearly marked pirated content and avoiding external citations of the data source.

The lawsuit, filed by author Richard Kadrey, comedian Sarah Silverman, and others, accuses Meta of violating intellectual property laws by using illegally obtained content. Meta has argued that using copyrighted material for training data should be considered fair use. The case highlights the broader issue of data scarcity in AI development, with companies like Meta and OpenAI exploring unconventional methods to acquire unique data. Despite a partial dismissal of the lawsuit, the evidence could bolster the plaintiffs' case as it progresses in court.

Key takeaways:

  • Meta is facing a major copyright lawsuit for allegedly using pirated data to train its Llama AI models and attempting to conceal that use.
  • Internal communications suggest Meta considered using the book piracy site Library Genesis (LibGen) to achieve state-of-the-art performance in AI models.
  • Meta's internal documents reveal efforts to obscure copyright information in training data to avoid legal complications.
  • The lawsuit evidence could strengthen the case against Meta as it progresses in court, despite a partial dismissal last year.