Mark Zuckerberg Gave Meta's Llama Team the OK To Train On Copyrighted Works, Filing Claims

In Kadrey v. Meta, plaintiffs including authors Sarah Silverman and Ta-Nehisi Coates allege that Meta CEO Mark Zuckerberg authorized the use of LibGen, a dataset of pirated ebooks and articles, to train the company's Llama AI models. The plaintiffs also claim that Meta torrented the data and removed copyright information from it to conceal its actions. According to newly unredacted documents filed with the U.S. District Court for the Northern District of California, Meta employees expressed concerns about using LibGen, describing it as a "data set we know to be pirated," and warned that its use could weaken Meta's negotiating position with regulators. Despite these concerns, Zuckerberg allegedly approved the use of LibGen for training.

The filing is consistent with previous reports that Meta cut corners in gathering data for its AI projects, including hiring contractors in Africa to summarize books and considering a purchase of the publisher Simon & Schuster. Meta's executives reportedly believed that negotiating licenses would take too long and considered fair use a viable defense. The new accusations in the filing suggest that Meta may have attempted to hide its alleged copyright infringement by stripping attribution from the LibGen data.

Key takeaways

Meta is accused of using a dataset of pirated ebooks and articles, known as LibGen, for training its Llama AI models.
Mark Zuckerberg allegedly approved the use of the LibGen dataset despite internal concerns about its legality.
Meta employees reportedly acknowledged that LibGen was a pirated dataset and worried about its impact on regulatory negotiations.
The plaintiffs claim Meta attempted to hide its actions by removing copyright information from the LibGen data.
