The article's author has created a searchable database so that writers can check whether their works are included in the Books3 data set. The author warns of several caveats, however: the same book may appear multiple times because of different editions or translations, author names are spelled inconsistently, and errors in the book-identification process may produce false positives.
Key takeaways:
- A data set of more than 191,000 books, known as "Books3," was used without permission to train generative-AI systems by Meta, Bloomberg, and others. This has led to several lawsuits against Meta by authors claiming copyright infringement.
- Most authors were unaware that their works were being used in this way, and the people building and training these AI systems stand to profit significantly.
- Meta's spokesperson did not directly address the use of pirated books to train the company's generative-AI product, LLaMA, and instead pointed to a court filing arguing that the case should be dismissed because neither the AI model nor its outputs are "substantially similar" to the authors' books.
- The author of the article has created a searchable database that lets writers check whether their works are included in the Books3 data set. Because author names are spelled inconsistently and books may be misidentified, search results may include duplicates or false positives.