To Understand Big Tech’s AIs, Follow Their Data Trails

The article discusses the importance of data in differentiating generative artificial intelligence (AI) products. As AI firms strive to stand out, they are realizing that product-level differentiation comes from differences in the training data their AI models are built on. However, as generative AI platforms expand their capabilities, they require more data of more varied kinds, which is becoming increasingly difficult and expensive to access. The article also highlights the controversy surrounding the use of 'publicly available' web data, with AI firms facing legal action for allegedly using stolen data to train their models.

The article further discusses how companies like Meta Platforms are using their own user data to train AI models. For firms without native access to such data, however, competition is emerging over licensing access to safe and well-structured datasets. As the industry matures, firms are rushing to secure licensing agreements for the datasets they need. Some data holders, like Getty Images, are deciding to build their own AI models using their valuable libraries, a trend that could make data the scarcest and most valuable input for training AI systems.

Key takeaways

AI firms are increasingly realizing that differentiation is crucial for survival, and that it can only come from differences in the training data their AI models are built from.
Meta Platforms has introduced a suite of new AI products, including AI-powered chatbots with distinct personality profiles, built using public posts on Facebook and Instagram.
Generative AI technologies are trained using large quantities of data, and most AI firms use similar, publicly available web data to build out their foundational models.
As awareness of data privacy grows, AI firms are facing legal actions over the use of data to train their AI systems, and some data holders are deciding to build their own AI models rather than license out their valuable libraries.
