In response to concerns about distillation, AI companies such as OpenAI, Google, and Anthropic are tightening security. OpenAI now requires ID verification for access to its advanced models, a process that excludes countries such as China, while Google and Anthropic have begun summarizing the traces their models generate to prevent competitors from training on them. Despite these measures, experts such as Nathan Lambert believe DeepSeek may still be generating synthetic data from top API models like Gemini, since it is short on GPUs but well funded.
Key takeaways:
- DeepSeek released an updated version of its R1 reasoning AI model, which some speculate may have been trained on outputs from Google's Gemini AI.
- DeepSeek has previously been accused of training on data from rival AI models, including OpenAI's ChatGPT, through distillation, a technique in which one model is trained on the outputs of another.
- AI experts suggest that DeepSeek might be using synthetic data from top API models because the company is short on GPUs but has ample funding.
- AI companies such as OpenAI, Google, and Anthropic are implementing security measures to prevent distillation and protect their competitive advantages.