The author also experimented with an "abliterated" version of the model, which reduced the refusal rate but essentially reproduced the Chinese-language answers in English. Since abliteration doesn't necessarily improve the China-aligned responses, the author recommends avoiding RL'd Chinese models if this alignment is a concern. Unaligned models such as Cognitive Computations' Dolphin Qwen2 models did not appear to suffer from significant Chinese RL issues in the author's testing, though readers are advised to run their own tests if this matters to them.
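Since the practical advice here is "test it yourself," a minimal spot-check might look like the sketch below: ask the same sensitive question once in English and once in Chinese and compare the tone and content of the two answers. This assumes the Hugging Face `transformers` chat API; the repo ID and the placeholder questions are illustrative assumptions, not the author's actual test set.

```python
# Minimal side-by-side probe sketch, assuming the Hugging Face `transformers`
# chat API. The repo ID and placeholder questions are illustrative, not the
# author's test set.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "cognitivecomputations/dolphin-2.9.2-qwen2-7b"  # assumed repo ID; any chat model works
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

def ask(question: str) -> str:
    """Return one greedy chat completion for `question`."""
    ids = tok.apply_chat_template(
        [{"role": "user", "content": question}],
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)
    out = model.generate(ids, max_new_tokens=256, do_sample=False)
    return tok.decode(out[0, ids.shape[-1]:], skip_special_tokens=True)

# Fill in the same politically sensitive question in both languages and
# compare the tone and content of the two answers.
question_en = "<sensitive question, phrased in English>"
question_zh = "<the same question, phrased in Chinese>"

print("EN:", ask(question_en))
print("ZH:", ask(question_zh))
```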
Key takeaways:
- The article examines bias and alignment in language models developed in China, focusing on the Qwen 2 Instruct model, which is aligned to Chinese government/policy requirements.
- The author found that the model refuses certain sensitive questions when they are asked in English, but answers them when asked in Chinese, often in a tone that aligns with Chinese government narratives.
- Applying an "abliteration" technique (sketched after this list) reduced the model's refusal rate, but the responses still reflected Chinese government-aligned biases.
- The author recommends that users be aware of these biases when using Chinese-developed models, and suggests that unaligned models, such as Cognitive Computations' Dolphin Qwen2 models, may not have these issues.
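For context on the abliteration step, the sketch below illustrates the general idea as it is usually described: estimate a "refusal direction" from the difference in mean hidden activations between prompts the model refuses and comparable prompts it answers, then project that direction out of the residual stream at inference time. This is a hedged illustration under assumed details, not the author's code; the repo ID, layer choice, and prompt lists are placeholders.

```python
# Rough sketch of the activation-ablation idea behind "abliteration", assuming
# the Hugging Face `transformers` API and PyTorch forward hooks. Repo ID,
# layer choice, and prompt lists are placeholders, not the author's setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2-7B-Instruct"  # assumed repo ID
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

def mean_last_token_hidden(prompts, layer=-1):
    """Average the last-token hidden state at `layer` over a set of prompts."""
    vecs = []
    for p in prompts:
        ids = tok.apply_chat_template(
            [{"role": "user", "content": p}],
            add_generation_prompt=True,
            return_tensors="pt",
        ).to(model.device)
        with torch.no_grad():
            out = model(ids, output_hidden_states=True)
        vecs.append(out.hidden_states[layer][0, -1, :].float())
    return torch.stack(vecs).mean(dim=0)

refused_prompts = ["<prompts the model refuses>"]       # placeholder set
answered_prompts = ["<comparable prompts it answers>"]  # placeholder set

# "Refusal direction": difference of mean activations, normalised.
refusal_dir = mean_last_token_hidden(refused_prompts) - mean_last_token_hidden(answered_prompts)
refusal_dir = refusal_dir / refusal_dir.norm()

def ablate(module, inputs, output):
    """Project the refusal direction out of a decoder layer's output."""
    hidden = output[0] if isinstance(output, tuple) else output
    d = refusal_dir.to(device=hidden.device, dtype=hidden.dtype)
    hidden = hidden - (hidden @ d).unsqueeze(-1) * d
    return (hidden,) + output[1:] if isinstance(output, tuple) else hidden

# Install the hook on every decoder layer; subsequent generations run with the
# refusal component removed from the residual stream.
handles = [layer.register_forward_hook(ablate) for layer in model.model.layers]
```

Published "abliterated" checkpoints typically go one step further and orthogonalize the weight matrices against the same direction, so the change persists in the exported model rather than relying on runtime hooks.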