This revelation follows a lawsuit filed last month by three writers alleging that their copyrighted works were used to train Meta's LLaMA. OpenAI, the creator of the AI chatbot ChatGPT, has also been accused of training its model on copyrighted works. Shawn Presser, the independent AI developer who created Books3, expressed sympathy for authors' concerns but defended the creation of the database for the development of generative AI tools. While Meta declined to comment, a Bloomberg spokesperson confirmed the company's use of Books3 but said it will not be used for future versions of BloombergGPT.
Key takeaways:
- Pirated copies of works by thousands of authors, including Zadie Smith and Stephen King, have been used to train AI tools, with more than 170,000 titles used by companies such as Meta and Bloomberg.
- The dataset, known as Books3, was used to train AI models like Meta's LLaMA and Bloomberg's BloombergGPT. It contains a mix of fiction and non-fiction, with the majority of books published in the last two decades.
- OpenAI, the company behind the AI chatbot ChatGPT, has also been accused of training its model on copyrighted works. A lawsuit alleges that the company's training data comes from "shadow libraries" that offer pirated books.
- Despite the controversy, the creator of Books3, Shawn Presser, defends the dataset, arguing it allows anyone to develop generative AI tools and prevents large companies from monopolizing the technology.