The research, described as "training-time provenance," could be a response to the legal and regulatory challenges facing AI companies. While some companies, such as Bria, Adobe, and Shutterstock, have begun compensating data contributors, large labs have mostly offered only opt-out options for copyright holders. Microsoft's project may serve as a proof of concept, much like OpenAI's yet-to-be-released tool for giving creators control over training data. The move is notable because other AI labs, including Google and OpenAI, advocate for weaker copyright protections to facilitate AI development, urging the U.S. government to codify fair use for model training.
Key takeaways:
- Microsoft is launching a research project to estimate the influence of specific training examples on generative AI models' outputs.
- The project aims to address the opacity of current neural network architectures and explore the concept of "data dignity" by tracing influential contributors to AI-generated content.
- Several companies, including Bria, Adobe, and Shutterstock, are already attempting to compensate data owners based on the influence of their data, but large labs have mostly limited themselves to opt-out processes for copyright holders.
- Microsoft's initiative may be a response to ongoing IP lawsuits and regulatory pressures, as well as a potential move to influence future copyright policies related to AI development.