However, the initiative has raised concerns about data ownership and compensation, especially in light of recent allegations that OpenAI has used creative work without the creators' permission or payment. The company plans to create two types of data sets: an open-source data set for public use, and private data sets for organizations that want to keep their data private but want OpenAI's models to better understand their domain.
Key takeaways:
- OpenAI has announced Data Partnerships, a program to collaborate with third-party organizations to build public and private data sets for AI model training, in an effort to combat the flaws and biases in existing data sets.
- The company plans to collect "large-scale" data sets that "reflect human society" and that aren't easily accessible online, particularly seeking data that "expresses human intention" across different languages, topics and formats.
- OpenAI will create two types of data sets: an open-source data set that anyone could use for AI model training, and a set of private data sets for training proprietary AI models. The private sets are intended for organizations that wish to keep their data private but want OpenAI's models to better understand their domain.
- Despite the stated intentions, the author suggests there may also be a commercial motive, namely improving the performance of OpenAI's models at the expense of others, and raises concerns about the company's transparency and about how it will compensate data owners.