The introduction of Google-Extended comes after Google's announcement in July that it is training its AI chatbot, Bard, on publicly available data from the web. It also follows a wave of sites blocking the web crawler that OpenAI uses to scrape data for training ChatGPT. Those sites include prominent publishers such as The New York Times, CNN, Reuters, and Medium.
Key takeaways:
- Google has introduced a new tool called Google-Extended that allows website publishers to opt out of having their data used to train Google's AI models.
- The tool still allows sites to be scraped and indexed by Googlebot, but prevents the data from being used to train current and future AI models.
- Google-Extended lets publishers decide whether their sites help improve Bard and the Vertex AI generative APIs, and control access to the content on their sites.
- The tool is accessible through robots.txt, a text file that tells web crawlers which parts of a site they are allowed to access.
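
As a rough illustration of the robots.txt mechanism described above, the Python sketch below parses a sample robots.txt with the standard library's `urllib.robotparser` and checks which crawlers may fetch a page. The rules shown, the `example.com` URL, and the helper name `can_fetch` are assumptions made for this example, not text taken from Google's announcement. Python's parser also only approximates how a real crawler interprets the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: opt out of Google-Extended, leave other crawlers alone.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def can_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/article"  # placeholder URL
    print("Google-Extended:", can_fetch("Google-Extended", page))
    print("Googlebot:", can_fetch("Googlebot", page))
```

With these sample rules, the check should report that Google-Extended is blocked while Googlebot can still fetch the page, which matches the takeaway that opting out of AI training does not remove a site from search indexing.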