Although OpenAI was able to recover most of the data, the file names and folder structure were lost, so the recovered data cannot be used to determine where the publishers' articles were used in the AI models. The publishers' lawyers said they have had to start their work over from scratch, at a significant cost in time and computer processing resources.
Key takeaways:
- The New York Times and Daily News are suing OpenAI for allegedly using their content without permission to train its AI models.
- OpenAI had agreed to provide two virtual machines for the publishers to search for their copyrighted content in its AI training sets.
- OpenAI engineers accidentally deleted the publishers' search data stored on one of the virtual machines. Most of the data was recovered, but the file names and folder structure were lost, so it cannot be used to determine where the publishers' articles were used in OpenAI's models.
- The publishers have had to recreate their work from scratch, using significant person-hours and computer processing time.