
How many news websites block AI crawlers?

Feb 22, 2024
The article discusses the growing number of news websites that block AI crawlers, such as those OpenAI and Google use to collect data for training large language models (LLMs). By the end of 2023, 48% of the most widely used news websites across ten countries were blocking OpenAI’s crawlers, while 24% were blocking Google’s AI crawler. The share of news websites blocking OpenAI’s crawlers varied significantly by country, from 79% in the USA to 20% in Mexico and Poland. The article suggests that this trend could affect the quality and relevance of AI outputs related to news.

The article also highlights differences in blocking behavior between types of publisher: legacy print outlets and outlets with a larger reach were more likely to block AI crawlers. It concludes that this is a rapidly evolving area and that the picture could change in the short term, especially as some publishers look to strike deals with AI companies and as new products continue to be developed.

Key takeaways:

  • By the end of 2023, 48% of the most widely used news websites across ten countries were blocking OpenAI’s crawlers, and 24% were blocking Google’s AI crawler.
  • Almost every website (97%) that decided to block Google’s AI crawler was also blocking OpenAI’s crawlers.
  • The proportion of news websites that blocked OpenAI varied considerably by country, ranging from 79% in the USA to just 20% in Mexico and Poland.
  • All types of news outlet were blocking, but the websites of legacy print publications were more likely to be blocking than those of either broadcasters or digital-born outlets.
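
The summary does not say how blocking was measured. As a rough illustration only, the sketch below assumes that a site signals blocking through its robots.txt file and that the relevant crawler tokens are "GPTBot" (OpenAI) and "Google-Extended" (Google); both the mechanism and the token list are assumptions, not details taken from the article. The function name and the sample robots.txt text are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Assumed crawler tokens; the summary only refers to "OpenAI's crawlers"
# and "Google's AI crawler" without naming them.
AI_CRAWLERS = ["GPTBot", "Google-Extended"]


def blocked_crawlers(robots_txt: str, url: str = "https://example.com/") -> dict[str, bool]:
    """Return, for each assumed AI crawler, whether robots_txt disallows `url`."""
    parser = RobotFileParser()
    # Parse the rules from text already in memory, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return {agent: not parser.can_fetch(agent, url) for agent in AI_CRAWLERS}


# Hypothetical robots.txt that blocks GPTBot but allows every other crawler.
SAMPLE = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

result = blocked_crawlers(SAMPLE)
print(result)
```

Running a check like this over a list of news sites and averaging the results would give per-crawler blocking rates comparable in spirit to the figures above, though the article's own methodology may differ.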
