The research, described as "training-time provenance," could be a response to the legal and regulatory challenges facing AI companies. While some companies, such as Bria, Adobe, and Shutterstock, have begun compensating data contributors, large labs have mostly offered only opt-out options for copyright holders. Microsoft's project may serve as a proof of concept, much like OpenAI's yet-to-be-released tool for giving creators control over training data. The move is notable because other AI labs, including Google and OpenAI, advocate for weaker copyright protections to facilitate AI development, urging the U.S. government to codify fair use for model training.
Key takeaways:
- Microsoft is launching a research project to estimate the influence of specific training examples on generative AI models' outputs.
- The project aims to address the opacity of current neural network architectures and explore the concept of "data dignity" by tracing influential contributors to AI-generated content.
- Several companies, including Bria, Adobe, and Shutterstock, are already attempting to compensate data owners based on the influence of their data, but large labs have mostly limited themselves to opt-out processes for copyright holders.
- Microsoft's initiative may be a response to ongoing IP lawsuits and regulatory pressures, as well as a potential move to influence future copyright policies related to AI development.