Why publishers are questioning the effectiveness of blocking AI web crawlers

Several publishers, including Bloomberg and The New York Times, have blocked OpenAI's web crawler from accessing their sites to protect their content from being used to feed the AI company's large language models (LLMs). However, the effectiveness of this strategy is questionable, as publishers' content distribution models may render the protective strategy useless. OpenAI, Google, and Microsoft are among the tech companies using web crawlers to feed their LLMs for AI tools and systems, and these companies are exploring alternatives to give publishers more control over how their intellectual property is used.

Publishers are using the blocking of OpenAI's web crawler as a negotiation tactic, creating a "market for licenses for data mining," with potential compensation for sharing their data. OpenAI has already struck a licensing partnership with the Associated Press, where it pays to license part of the AP's text archive to train its models. However, smaller publishers feel they lack the power to negotiate the use of their content with these large tech companies.

Key takeaways

A number of publishers, including Bloomberg and The New York Times, have blocked OpenAI’s web crawler from accessing their sites to protect their content from being used to feed the AI company’s large language models.
Google and Microsoft are exploring alternatives to give publishers more control over how their content is used. Google has released a tool that allows website owners to opt out of having their sites crawled for data used to train Google’s AI systems, while Microsoft allows publishers to add a code to their web pages to communicate that the content should not be used for LLMs.
The New York Times has added language to its Terms of Service prohibiting the use of its content to train machine learning or AI systems, giving the Times the ability to pursue legal action against companies using their data.
Publishers are blocking OpenAI’s web crawler as a negotiation tactic, creating a “market for licenses for data mining,” with a potential for compensation for sharing their data.

Why publishers are questioning the effectiveness of blocking AI web crawlers

Key takeaways

Discussion (0)