ChatGPT's odds of getting code questions correct are worse than a coin flip

A study from Purdue University has found that OpenAI's chatbot, ChatGPT, provides incorrect answers to software programming questions over half the time. Despite this, the bot was able to convince a third of the study's participants due to its comprehensive and well-articulated language style. The study analyzed ChatGPT’s responses to 517 Stack Overflow questions, finding that 52% of the answers were incorrect and 77% were verbose. However, these answers were still preferred 39.34% of the time.

The study also found that users often failed to identify errors in the bot's answers or underestimated the degree of error, especially when the error was not readily verifiable or required an external IDE or documentation to check. The researchers attribute this to ChatGPT's authoritative style and its polite, comprehensive, textbook-style language. The authors suggest that Stack Overflow could improve by detecting toxicity and negative sentiment in comments and answers, making its answers easier to discover, and providing more specific guidelines to help answerers structure their answers.

Key takeaways:

A study from Purdue University found that OpenAI's ChatGPT produces incorrect answers to software programming questions more than half the time, but its comprehensive and well-articulated responses still manage to convince a third of participants.
The study also found that users often fail to spot errors in ChatGPT's answers, or underestimate their degree, unless the error is glaringly obvious.
ChatGPT's answers are more formal, express more analytic thinking, show more effort toward achieving goals, and exhibit less negative emotion than Stack Overflow answers, according to the study's linguistic and sentiment analysis.
Stack Overflow's traffic has been affected by the surge of interest in ChatGPT, with an above-average traffic decrease observed in April, which could be attributed to developers trying GPT-4 after its release in March.
