The researchers set the pass threshold for the Turing test at a 50% success rate: if interrogators judge the AI to be human at least half the time, they are doing no better than chance at telling human and machine apart. The study found that participants' strategies and rationales focused more on language style and socio-emotional factors than on knowledge and logic. The results suggest that AI systems that can reliably mimic humans could have significant economic and social impacts, such as taking over customer interactions or misleading the public.
Key takeaways:
- A new study found that human participants were unable to reliably distinguish whether they were chatting with a human or GPT-4, an AI model.
- GPT-4 was judged to be human 54 percent of the time after a five-minute conversation (see the sketch after this list), outperforming both the older GPT-3.5 model and ELIZA, a simple rule-based reference system from the 1960s.
- Participants' strategies and rationales focused more on language style and socio-emotional factors than on knowledge and logic.
- Systems that can reliably mimic humans could have far-reaching economic and social impacts, such as taking over customer interactions or misleading the public.
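To make the chance-level criterion concrete, here is a minimal sketch (not from the study itself) of how a 54% "judged human" rate could be tested against the 50% chance baseline with a two-sided binomial test. The sample size `n` below is a hypothetical placeholder, not the study's actual trial count.

```python
# Minimal sketch: does a 54% "judged human" rate differ from 50% chance?
# The trial count n is a hypothetical placeholder, not a figure from the study.
from scipy.stats import binomtest

n = 500                  # hypothetical number of five-minute conversations
k = round(0.54 * n)      # conversations in which the AI was judged human

result = binomtest(k, n, p=0.5, alternative="two-sided")
print(f"judged human: {k}/{n} ({k / n:.0%})")
print(f"two-sided p-value vs. 50% chance: {result.pvalue:.3f}")
```

With this placeholder sample size, the p-value lands near conventional significance thresholds, which illustrates why the number of conversations matters as much as the headline percentage when deciding whether interrogators truly cannot do better than guessing.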