
As Generative AI Models Get Bigger And Better The Reliability Veers Straight Off A Cliff — Or Maybe That’s A Mirage

Nov 07, 2024 - forbes.com
The article discusses the reliability of generative AI and large language models (LLMs), suggesting that as these models become larger and more sophisticated, they may also become less reliable. The author argues that this apparent decrease in reliability could be an artifact of how AI performance is measured, particularly how benchmarks treat questions that an AI avoids or refuses to answer. Forcing an AI to answer more questions, rather than allowing it to avoid difficult ones, could increase the number of incorrect responses and thereby reduce the AI's perceived reliability.

The author uses the example of a test in which unanswered questions previously carried no penalty; forcing an answer on those same questions could yield more incorrect responses. The debate is whether to penalize AI for refusing to answer questions or to keep the past assumption that refusals carry no penalty. The article concludes that we need to improve how we gauge progress in AI, including the measurements themselves, how we devise them, how we apply them, and how we convey the results to insiders and the public.

Key takeaways:

  • As generative AI and large language models are developed to be bigger and better, they also appear to become less reliable, though this may stem from accounting trickery and misleading statistics rather than actual declines in AI capability.
  • Reliability in AI pertains to the consistency of correctness. If AI is not consistently correct, users will get upset and stop using the AI, which hurts the bottom line of the AI maker.
  • The scoring of generative AI on the metric of correctness can be graded via three categories: correct answer, incorrect answer, and avoided answering. The debate lies in how to score instances of the AI avoiding answering questions.
  • A recent research study found that whether generative AI is actually becoming less reliable hinges significantly on how you decide to score the AI. If you force the AI to persistently answer questions and only sparingly refuse, the percentage of incorrect answers is likely to be higher than it was before.
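The scoring debate above can be made concrete with a small sketch. The counts below are invented for illustration (they are not from the article or the cited study); the two functions show how the same set of responses yields very different accuracy figures depending on whether refusals are excluded from the denominator or counted against the model.

```python
# Hypothetical illustration of how scoring refusals changes measured reliability.
# All counts are made up for the example; they are not from the article.

def accuracy_ignoring_refusals(correct: int, incorrect: int, refused: int) -> float:
    """Older convention: refusals are simply excluded from the denominator."""
    return correct / (correct + incorrect)

def accuracy_penalizing_refusals(correct: int, incorrect: int, refused: int) -> float:
    """Stricter convention: a refusal counts against the model, like a wrong answer."""
    return correct / (correct + incorrect + refused)

# A model that answers 70 questions correctly, 10 incorrectly, and refuses 20.
c, i, r = 70, 10, 20
print(accuracy_ignoring_refusals(c, i, r))   # 0.875
print(accuracy_penalizing_refusals(c, i, r)) # 0.7
```

Nothing about the model changed between the two numbers; only the scoring convention did, which is the article's point about perceived versus actual reliability.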
