In response to AI scraping, newsrooms are updating their terms of service to prohibit it, blocking AI data-scraping bots, licensing their content to AI companies as training data, and building their own large language models (LLMs). Some organizations have also filed lawsuits against OpenAI and Google, alleging that the companies harvested data illegally. As more publishers put up barriers to web scraping, AI companies are exploring alternatives such as synthetic data. The article suggests that collaboration, through proactive licensing of content as training data, could be a sustainable way for AI and journalism to coexist.
Key takeaways:
- News organizations are fighting back against AI companies scraping their content without permission, with some calling it the “largest theft in the United States.”
- Newsrooms are taking steps to protect their content, including updating terms of service to ban AI scraping, blocking AI data-scraping bots, licensing their content to AI companies as training data, and creating their own LLMs.
- Some news organizations have filed lawsuits against AI companies like OpenAI and Google, accusing them of illegally harvesting “massive amounts of personal data” to train their AI chatbots.
- As more publishers put up barriers to web scraping, AI companies are exploring alternative paths, such as using synthetic data or collaborating with news organizations to license access to their content as training data.
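
One common way to block AI scraping bots is a `robots.txt` rule aimed at a crawler's user agent. The sketch below is illustrative only: it assumes a publisher wants to turn away OpenAI's `GPTBot` crawler while leaving other visitors unaffected. The article does not describe any specific newsroom's configuration, so the rules, the `is_allowed` helper, and the `example.com` URL are all hypothetical. Note that `robots.txt` is a voluntary convention, so it only stops crawlers that choose to honor it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a publisher might serve (an assumption for
# illustration, not taken from the article): block GPTBot, allow the rest.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    story = "https://example.com/news/story"  # placeholder URL
    for agent in ("GPTBot", "Mozilla"):
        print(agent, is_allowed(ROBOTS_TXT, agent, story))
```

A well-behaved crawler runs a check like this before each request, which is why publishers typically pair such rules with terms-of-service changes and licensing agreements rather than relying on them alone.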