Claude and LLaMa get basic human anatomy wrong

Sep 15, 2023 - hermitian.substack.com
The article discusses the author's experience testing different large language models (LLMs) on medical question answering, focusing on human anatomy. The author uses the example of the caval hiatus, an opening in the diaphragm, to illustrate the varying accuracy of different LLMs: some identify the caval hiatus but state its location incorrectly, while others fail to recognize it as a medical term at all.

The author concludes that OpenAI's ChatGPT, despite reservations about OpenAI's closed-source approach, gives the most accurate response about the caval hiatus. The overall point is to highlight the stark differences in performance across LLMs in specialized domains such as medical terminology and anatomy.
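
The article summary does not include the author's exact prompts or tooling, but a comparison like this can be reproduced by sending the same anatomy question to each model and checking the answer against the textbook location (the caval hiatus passes through the diaphragm at the T8 vertebral level). Below is a minimal sketch using the OpenAI Python SDK; the prompt wording, model name, and system message are illustrative assumptions, not the author's actual setup.

```python
# Minimal sketch: ask one model the caval hiatus question via the OpenAI SDK.
# Assumes the openai package (v1+) is installed and OPENAI_API_KEY is set;
# the author's actual testing setup is not described in the article.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Paraphrase of the article's topic; the author's exact prompt is not given.
prompt = "At which vertebral level does the caval hiatus pass through the diaphragm?"

response = client.chat.completions.create(
    model="gpt-4",  # illustrative choice; swap in other models to compare answers
    messages=[
        {"role": "system", "content": "You are a concise anatomy tutor."},
        {"role": "user", "content": prompt},
    ],
    temperature=0,  # deterministic output makes model-to-model comparison easier
)

print(response.choices[0].message.content)  # textbook answer: T8
```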

Key takeaways:

  • The author tests different large language models (LLMs) on medical answer generation in the context of human anatomy, specifically the diaphragm and its openings.
  • There are stark differences in the performance of different LLMs. For example, Claude-instant-100k incorrectly identified the level of the caval hiatus, while LLaMa-70B didn't recognize it as a medical term at all.
  • ChatGPT, despite the author's reservations about OpenAI's closed-source approach, provided the most accurate and comprehensive answer regarding the caval hiatus.
  • The author encourages readers to subscribe if they're interested in learning more about the transition from a big tech employee to a medical student.
