Feature Story
Can LLMs Accurately Recall the Bible
Dec 29, 2024 · benkaiser.dev
The article concludes that larger models are more reliable for textually accurate Bible verse recall, while smaller models may still be useful for discussions that reference scripture by Book/Chapter/Verse. For precise wording, however, the author recommends consulting an actual Bible. The author expects that future improvements to smaller models may raise their scores on benchmarks like this one, but acknowledges that there is a limit to how much information can be encoded in a small model. The full test results and methodology are available for review, and the author invites suggestions for additional tests.
Key takeaways
- LLMs often struggle to quote scripture accurately because they tend to hallucinate.
- Larger models such as Llama 405B, GPT-4o, and Claude Sonnet recall Bible verses more accurately.
- Smaller models frequently mix up translations or paraphrase verses, making them less reliable for precise scripture recall.
- For accurate biblical text, prefer a larger model, and consult an actual Bible when the exact wording matters.