The study categorized news outlets into print publications, television, and digital-born outlets, and found that print outlets blocked AI crawlers most often. Legacy print outlets and outlets with larger reach were more likely to block. The report notes that outlets like the New York Times believe they should be compensated for the use of their content to train AI models, while others are concerned about AI models producing incorrect outputs. However, some firms, like Axel Springer, have made deals with companies like OpenAI to use their news content.
Key takeaways:
- U.S. publishers lead the world in blocking OpenAI crawlers, with 79% of the top U.S. news sites doing so in 2023. The average across the 10 countries studied was 48%.
- 40% of U.S. sites blocked Google's AI crawler, compared with 24% across all 10 countries. None of the sites unblocked an OpenAI or Google AI crawler once they had decided to block.
- Print publications, both newspapers and magazines, blocked AI crawlers the most, followed by television and digital-born outlets. Legacy print outlets and outlets with a larger reach were more likely to block.
- Media outlets such as the New York Times believe they should be financially compensated for the use of their content to train AI models. However, some firms, like Axel Springer, have already struck deals with companies such as OpenAI.
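
The summary above does not say how blocking was detected. As an assumption for illustration, sites commonly signal it through their `robots.txt` file, where OpenAI's crawler identifies itself as `GPTBot` and Google's AI-training control token is `Google-Extended`. The following is a minimal sketch, not the study's method, of how one might check a `robots.txt` body for such rules using Python's standard library. The sample file contents and the `blocked_ai_crawlers` helper are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for illustration only.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""

# User-agent tokens for the OpenAI and Google AI crawlers discussed above.
AI_CRAWLER_TOKENS = ["GPTBot", "Google-Extended"]


def blocked_ai_crawlers(robots_txt: str, url: str = "https://example.com/") -> dict[str, bool]:
    """Return, for each AI crawler token, whether robots_txt disallows `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # can_fetch() is True when the agent may crawl the URL, so invert it.
    return {token: not parser.can_fetch(token, url) for token in AI_CRAWLER_TOKENS}


if __name__ == "__main__":
    print(blocked_ai_crawlers(SAMPLE_ROBOTS_TXT))
```

Applied to a list of news sites, a check like this could be aggregated into per-country blocking percentages of the kind reported above, though real `robots.txt` files vary and a robust survey would need to handle fetch errors and path-specific rules.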