OpenAI wants to work with organizations to build new AI training data sets

OpenAI has announced a new initiative called Data Partnerships, aimed at collaborating with third-party organizations to create public and private data sets for AI model training. The goal is to address the flaws and biases in current data sets, which are often U.S.- and Western-centric. OpenAI plans to collect large-scale data sets that reflect human society and are not easily accessible online, including data that expresses human intention across different languages, topics, and formats.

The company plans to create two types of data sets: an open-source data set for public use, and private data sets for organizations that want to keep their data private while improving OpenAI's understanding of their domain. The initiative has nonetheless raised concerns about data ownership and compensation, especially in light of recent allegations that OpenAI has used creatives' work without permission or payment.

Key takeaways

OpenAI has announced Data Partnerships, a program to collaborate with third-party organizations to build public and private data sets for AI model training, in an effort to combat the flaws and biases in existing data sets.
The company plans to collect "large-scale" data sets that "reflect human society" and that aren't easily accessible online, particularly seeking data that "expresses human intention" across different languages, topics and formats.
OpenAI will create two types of data sets: an open-source data set that anyone can use for AI model training, and a set of private data sets for training proprietary AI models, intended for organizations that wish to keep their data private but want OpenAI’s models to have a better understanding of their domain.
Despite these stated intentions, the author suggests there may be a commercial motivation to improve the performance of OpenAI’s models at the expense of others, and raises concerns about the company’s transparency and its compensation of data owners.

OpenAI wants to work with organizations to build new AI training data sets | TechCrunch
