The scarcity of internet data is prompting AI companies to seek new strategies, including partnerships with academic publishers for access to structured data. Microsoft's $10 million deal with Taylor & Francis for scholarly articles is one example. This scarcity, the so-called data wall, primarily affects unstructured training data; there is still room to create structured data for AI training, such as complex math and science problems. As AI-generated content becomes more prevalent and publishers block scraping bots, companies are finding new ways to monetize overlooked datasets, which in turn is giving rise to new business models.
Key takeaways:
- Leading AI researchers, including former OpenAI chief scientist Ilya Sutskever, warn that the supply of high-quality internet data for training AI systems is reaching its limits, creating a need for new data sources and approaches.
- Companies are increasingly turning to unique data sources, such as healthcare records and IoT devices, as traditional internet data becomes scarce, highlighting the value of specialized datasets.
- The scarcity of internet data is driving the development of AI models that rely more on advanced reasoning capabilities and less on raw data, with synthetic data and crowd-sourced insights emerging as alternative solutions.
- Deals with academic publishers, like Microsoft's $10 million agreement with Taylor & Francis, are providing AI companies access to vast research archives, offering a potential solution to the data drought.