Creepy Crawlers: Most Top Sites Are Blocking Them, Especially OpenAI

A study by the Reuters Institute finds that U.S. publishers lead in blocking OpenAI crawlers: 79% of top U.S. news sites did so in 2023, compared with an average of 48% across the 10 countries studied. The study also found that 40% of U.S. sites blocked Google AI crawlers, compared with an average of 24% across all countries. No site that decided to block an AI crawler later unblocked it, and the sites that blocked Google AI crawlers also blocked OpenAI.

The study categorized news outlets as print publications, television, or digital-born outlets, and print outlets were the most likely to block AI crawlers. Legacy print outlets and outlets with larger reach were more likely to block. The report notes that outlets like the New York Times believe they should be compensated for the use of their content to train AI models, while others fear incorrect outputs. However, some firms, like Axel Springer, have made deals that allow companies like OpenAI to use their news content.

Key takeaways

U.S. publishers lead the world in blocking OpenAI crawlers, with 79% of the top U.S. news sites doing so in 2023. The average across the 10 countries studied was 48%.
40% of U.S. sites blocked Google AI crawlers, compared with 24% across all 10 countries. None of the sites unblocked an OpenAI or Google AI crawler once they had decided to block.
Print publications, both newspapers and magazines, blocked AI crawlers the most, followed by television and digital-born outlets. Legacy print outlets and outlets with a larger reach were more likely to block.
Media outlets such as the New York Times believe they should be financially compensated for the use of their content to train AI models. However, some firms, like Axel Springer, have already struck deals with companies such as OpenAI.
