The authors demonstrate the Reversal Curse by finetuning GPT-3 and Llama-1 on fictitious statements of the form "A is B" and showing that the models fail to answer the reversed question correctly. The failure is robust across model sizes and families and is not alleviated by data augmentation. The authors also evaluate GPT-4 on questions about real-world celebrities, finding a large gap between its accuracy on direct questions and their reversals, and they hypothesize that this ordering effect is caused by the Reversal Curse.
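To make the setup concrete, here is a minimal sketch of the experimental design in Python. The (name, description) pairs echo the paper's own illustrative examples; the helper functions and variable names are hypothetical stand-ins, since the actual dataset is generated at scale so the facts cannot appear in pretraining data.

```python
# Sketch of the Reversal Curse setup: finetune on "A is B" statements,
# then test the held-out "B is A" direction.

fictitious_facts = [
    # (name, description) pairs invented so they cannot occur in pretraining corpora
    ("Daphne Barrington", "the director of 'A Journey Through Time'"),
    ("Uriah Hawthorne", "the composer of 'Abyssal Melodies'"),
]

def forward_statement(name: str, description: str) -> str:
    """Finetuning text in the name-to-description ("A is B") direction."""
    return f"{name} is {description}."

def reverse_question(description: str) -> str:
    """Test prompt in the reversed ("B is A") direction."""
    return f"Who is {description}?"

finetune_set = [forward_statement(n, d) for n, d in fictitious_facts]
test_set = [(reverse_question(d), n) for n, d in fictitious_facts]

# The finding: a model finetuned on finetune_set answers test_set
# no better than chance. That asymmetry is the Reversal Curse.
```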
Key takeaways:
- Large language models (LLMs) like GPT-3 and Llama-1 fail to generalize from a statement to its reversal: a model trained on "A is B" does not automatically infer "B is A". The authors call this the Reversal Curse.
- The failure holds even for models finetuned directly on fictitious statements: when queried in the reverse direction, the models assign no higher likelihood to the correct answer than to a random alternative (see the likelihood sketch after this list).
- The Reversal Curse is robust across different model sizes and families and is not mitigated by data augmentation.
- Even advanced models like GPT-4 show the effect on real-world facts, correctly answering questions like "Who is Tom Cruise's mother?" 79% of the time, but the reverse question "Who is Mary Lee Pfeiffer's son?" only 33% of the time (a query sketch follows the likelihood check below).
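The likelihood claim can be checked directly from a causal LM's token probabilities. The sketch below scores the correct name against a random name under a reversed prompt. The model choice (`gpt2`), prompt strings, and names are assumptions for illustration; the paper ran this test on its own finetuned GPT-3 and Llama-1 models. It also assumes the prompt's tokenization is a prefix of the tokenization of prompt plus continuation, which holds for typical BPE tokenizers on strings like these.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in; the paper used finetuned GPT-3 and Llama-1
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def continuation_logprob(prompt: str, continuation: str) -> float:
    """Sum of token log-probabilities of `continuation` given `prompt`."""
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    full_ids = tokenizer(prompt + continuation, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    log_probs = torch.log_softmax(logits, dim=-1)
    total = 0.0
    # Score only the continuation tokens; position p is predicted from p - 1.
    for pos in range(prompt_ids.shape[1], full_ids.shape[1]):
        total += log_probs[0, pos - 1, full_ids[0, pos]].item()
    return total

prompt = "The director of 'A Journey Through Time' is"
correct = " Daphne Barrington"
random_name = " Gregory Whitfield"  # hypothetical distractor

# The Reversal Curse predicts these two scores are statistically
# indistinguishable for a model finetuned only in the "A is B" direction.
print(continuation_logprob(prompt, correct))
print(continuation_logprob(prompt, random_name))
```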
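For the celebrity comparison, a minimal query loop along these lines would suffice. This uses the OpenAI Python client with `gpt-4` as the model name; the exact prompting and answer-grading procedure in the paper may differ.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask(question: str) -> str:
    """Send a single question to the chat model and return its reply."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": question}],
    )
    return response.choices[0].message.content

# The same parent-child fact, asked in both directions.
print(ask("Who is Tom Cruise's mother?"))      # usually answered correctly
print(ask("Who is Mary Lee Pfeiffer's son?"))  # answered far less reliably
```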