The New York Times, which is in a copyright dispute with OpenAI, has also blocked the startup's access to its content. Originality.ai CEO Jonathan Gillham suggests that if Google launches a generative AI search engine, sites that have blocked Google's access to training data may not appear in AI-generated results. Google is currently testing an early version of such a search engine, called Search Generative Experience (SGE), but it is unclear if or when it will be fully launched.
Key takeaways:
- Google has launched a new tool called Google-Extended that allows websites to block the company from using their content for training AI models.
- About 10% of the top 1,000 websites are using the Google-Extended snippet to block tech companies from using their content for AI model training, including The New York Times, CNN, BBC, Yelp, and Business Insider.
- Google-Extended is being used less than other AI training data-blockers, such as OpenAI's GPTBot and CCBot offered by Common Crawl.
- Google is testing an early version of a generative AI search engine through its Search Generative Experience (SGE), but it's unclear if the company will fully launch this in the future.