GPT-4 passes Turing test and humans surprisingly often mistake other humans for AI

May 17, 2024 - the-decoder.com
The article discusses a new study in which human participants were unable to reliably distinguish whether they were chatting with a human or GPT-4, an AI language model. The study used a two-player variant of the Turing test, the evaluation Alan Turing proposed in 1950 to judge whether a machine's conversational behavior can be told apart from a human's. The AI models were prompted to occasionally make spelling mistakes to mimic human behavior. GPT-4 was judged to be human by 54% of participants, outperforming both the older GPT-3.5 model and the simple, rule-based ELIZA system from the 1960s.

The researchers defined a 50% success rate as a pass, the point at which participants can no longer distinguish human from machine better than chance. Participants' strategies and rationales focused more on language style and socio-emotional factors than on knowledge and logic. The results suggest that AI systems able to reliably mimic humans could have significant economic and social impacts, such as taking over customer interactions or misleading the public.

Key takeaways:

  • A new study found that human participants were unable to reliably distinguish whether they were chatting with a human or GPT-4, an AI model.
  • GPT-4 was judged to be a human 54 percent of the time after a five-minute conversation, performing better than the older GPT-3.5 model and the simple, rule-based ELIZA reference system from the 1960s.
  • The results of the study suggest that participants' strategies and rationales focused more on language style and socio-emotional factors than on knowledge and logic.
  • Systems that can reliably mimic humans could have far-reaching economic and social impacts, such as taking over customer interactions or misleading the public.