The authors caution against downloading arbitrary models from HuggingFace for anything beyond casual experiments, because a poisoned model can quietly spread misinformation. They also discuss the ROME technique, which can modify models such as GPT-2 medium, GPT-2 large, GPT-2 XL, and EleutherAI's GPT-J-6B. The authors were able to replicate the published examples and make their own edits. They conclude that while there are benign uses for model editing, there is also clear potential for malicious misuse. As a mitigation, they suggest running a list of canary queries against each new model version to detect unexpected changes (a sketch of this idea follows below), and they express interest in Mithril Security's tooling for guaranteeing model authenticity.
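A minimal sketch of the canary-query idea, assuming a Hugging Face `transformers` text-generation pipeline; the model name, prompts, and expected substrings are placeholders, not the authors' actual test set:

```python
from transformers import pipeline

# Canary queries: prompts whose expected answers are known in advance,
# rerun against every new model version to detect silent factual edits.
# The prompts and expected substrings below are illustrative placeholders.
CANARIES = [
    ("The first man to set foot on the Moon was", "Armstrong"),
    ("The Eiffel Tower is located in the city of", "Paris"),
]

def check_canaries(model_name: str) -> list[str]:
    """Return a list of canary prompts whose completions look suspicious."""
    generator = pipeline("text-generation", model=model_name)
    failures = []
    for prompt, expected in CANARIES:
        out = generator(prompt, max_new_tokens=20, do_sample=False)[0]["generated_text"]
        if expected not in out:
            failures.append(f"{prompt!r} -> {out!r}")
    return failures

# Example: flag a suspicious model version before deploying it.
# print(check_canaries("EleutherAI/gpt-j-6B"))
```

This only catches edits that happen to overlap with the canary set, so it complements rather than replaces provenance guarantees like AICert.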
Key takeaways:
- Open-source models like GPT-J-6B can be surgically modified to spread misinformation on a specific, targeted query while maintaining performance on other tasks, demonstrating a supply-chain risk for Large Language Models (LLMs).
- The authors propose a solution called AICert to ensure traceability and authenticity of models, but it doesn't guarantee the absence of inserted world views or biases.
- ROME (Rank-One Model Editing) is a method for editing factual knowledge in a model, but it has limitations: edits are one-directional (rewriting "the Eiffel Tower is in Rome" does not update the reverse question about what is in Rome), and the approach is not practical for large-scale modification. A conceptual sketch of the rank-one update appears after this list.
- While there are potential malicious uses for model editing, there are also benign use cases such as updating factual information in real-time, similar to how search engines update their indexes.
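To make the "rank-one" part concrete, here is a conceptual PyTorch sketch of the kind of weight update ROME applies to a single MLP projection matrix. The dimensions are those of GPT-2 medium, the key/value vectors are random placeholders, and the actual derivation of those vectors (the core of the ROME paper) is omitted:

```python
import torch

# ROME stores an edited fact by adding a rank-one outer product to one MLP
# down-projection matrix: the key vector k encodes the edited subject, and the
# value direction v shifts the layer's output toward the new fact. All other
# associations stored in W are left (approximately) untouched.

hidden, inner = 1024, 4096            # GPT-2 medium MLP dimensions (example)
W = torch.randn(hidden, inner)        # original down-projection weight
k = torch.randn(inner)                # key vector for the edited subject (placeholder)
v = torch.randn(hidden)               # value update direction (placeholder)

W_edited = W + torch.outer(v, k)      # rank-one update

# The change to the weights has rank 1, which is why a single fact can be
# rewritten without retraining or broadly degrading the model.
print(torch.linalg.matrix_rank(W_edited - W))  # tensor(1)
```

The same property that makes this attractive for benign, real-time factual updates is what makes a poisoned checkpoint hard to spot: the edit is tiny relative to the full weight matrix and invisible to broad benchmark evaluation.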