The authors suggest that the best way to address this problem is to develop a detailed understanding of the structure and content of the documents being processed. This involves creating an ontology of the document type, mapping how the information within it is interconnected, and building a retrieval pipeline around that structure. However, this approach does not generalize and requires a trade-off: in-depth understanding of a specific type of document comes at the expense of understanding others. The authors also provide statistics to illustrate the problem and point to technical readings for further background.
Key takeaways:
- Long-context Large Language Models (LLMs) struggle with in-context recall and counting, even with large context windows. This is demonstrated through the 'Harry Potter problem', in which models fail to accurately count how many times a word is mentioned in a chapter (a deterministic baseline is sketched after this list).
- The problem affects high-value use cases such as analyzing insurance policies, reviewing lengthy legal cases, understanding codebases, and reviewing medical records. Traditional solutions like RAG, fine-tuning, and agents do not adequately solve it.
- The proposed solution involves developing an opinionated view of what each long document should look like, the information it should contain, and how that information is interconnected. This approach, however, does not generalize: depth on one document type comes at the expense of breadth across others.
- For each category of document, it is necessary to develop an understanding of the information that every variant of the document must contain, enumerate those fields along with their types and their relationships to each other, and experiment with as many examples as possible (a schema sketch along these lines follows below).
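The article itself contains no code; as a minimal illustration of the counting task behind the 'Harry Potter problem', here is a deterministic whole-word count that serves as ground truth (the file name and target word are placeholders, not taken from the article):

```python
import re

def count_word(text: str, word: str) -> int:
    # Whole-word, case-insensitive count: the exact answer that a
    # long-context LLM is asked to reproduce from its context window.
    return len(re.findall(rf"\b{re.escape(word)}\b", text, flags=re.IGNORECASE))

with open("chapter1.txt", encoding="utf-8") as f:  # e.g. one book chapter
    chapter = f.read()

print(count_word(chapter, "wizard"))  # ground-truth count for comparison
```

Asking a long-context model the same question over the full chapter pasted into its prompt is what, per the article, produces unreliable counts.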
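The article describes this schema-first approach only in prose. As a sketch of what enumerating a document category's required fields, their types, and their relationships might look like, here is a hypothetical Pydantic model for one category, insurance policies (all class and field names are assumptions for illustration):

```python
from datetime import date
from pydantic import BaseModel, Field

class CoverageItem(BaseModel):
    """One coverage line that every policy variant must declare."""
    name: str                       # e.g. "dwelling", "personal liability"
    limit_usd: float = Field(gt=0)  # per-occurrence limit
    deductible_usd: float = Field(ge=0)

class Exclusion(BaseModel):
    """An exclusion, linked back to the coverage it constrains."""
    coverage_name: str  # relationship: references CoverageItem.name
    description: str

class InsurancePolicy(BaseModel):
    """Fields that all variants of this document category must contain."""
    policy_number: str
    insured_name: str
    effective_date: date
    expiration_date: date
    coverages: list[CoverageItem]
    exclusions: list[Exclusion]
```

An extraction step can validate LLM output against such a model, so that recall and counting queries run over typed fields rather than raw text.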