In a recent preprint, Microsoft demonstrated how GraphRAG can answer global questions about an entire dataset, where traditional RAG approaches fall short. The company used GPT-4 to generate such questions from short descriptions of two datasets and found that GraphRAG outperformed traditional RAG in comprehensiveness and diversity. Microsoft is currently exploring ways to reduce the upfront cost of graph index construction while maintaining response quality.
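That head-to-head setup can be pictured as a small LLM-as-judge loop: the model first generates dataset-wide questions from a short corpus description, then picks the better of two candidate answers on a given criterion. The sketch below only illustrates the shape of such an evaluation; the prompts and the generic `llm` completion callable are assumptions, not the preprint's actual protocol.

```python
# Illustrative sketch of an LLM-judged comparison; prompts and the
# `llm` callable are assumptions, not the preprint's artifacts.
from typing import Callable


def generate_global_questions(llm: Callable[[str], str],
                              dataset_description: str, n: int = 5) -> list[str]:
    """Ask the model for dataset-wide questions, given only a short description."""
    prompt = (
        f"Here is a short description of a dataset:\n{dataset_description}\n\n"
        f"Write {n} high-level questions that can only be answered by "
        "considering the dataset as a whole, one per line."
    )
    return [line.strip() for line in llm(prompt).splitlines() if line.strip()]


def judge_pair(llm: Callable[[str], str], question: str,
               answer_a: str, answer_b: str, criterion: str) -> str:
    """Have the model pick the better of two answers on a single criterion."""
    prompt = (
        f"Question: {question}\n\n"
        f"Answer A:\n{answer_a}\n\n"
        f"Answer B:\n{answer_b}\n\n"
        f"Which answer is better in terms of {criterion}? Reply with exactly 'A' or 'B'."
    )
    return llm(prompt).strip()
```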
Key takeaways:
- GraphRAG, a graph-based approach to retrieval-augmented generation (RAG), is now available on GitHub. It offers more structured information retrieval and more comprehensive response generation than naive RAG approaches.
- GraphRAG uses a large language model (LLM) to automate the extraction of a rich knowledge graph from any collection of text documents, providing an overview of a dataset without needing to know which questions to ask in advance (sketched in code after this list).
- GraphRAG outperforms naive RAG on comprehensiveness and diversity, and performs better than source text summarization at lower token costs.
- Microsoft is currently exploring various approaches to reduce the upfront costs of graph index construction while maintaining response quality, and is making GraphRAG and a solution accelerator publicly available to make graph-based RAG approaches more accessible.
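To make the indexing and querying flow in the takeaways concrete, here is a minimal Python sketch of the graph-based RAG idea: an LLM extracts entity-relationship triples from text chunks into a knowledge graph, graph communities are detected and pre-summarized at index time, and a global question is answered map-reduce style over those community summaries at query time. This is an illustrative sketch, not Microsoft's GraphRAG implementation; the `llm` helper is a hypothetical stand-in for any chat-completion call, and the JSON triple format is an assumption.

```python
# Minimal, illustrative sketch of a graph-based RAG pipeline.
# Not Microsoft's GraphRAG implementation; `llm` is a placeholder for
# any chat-completion call (e.g. GPT-4 via your provider of choice).
import json

import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities


def llm(prompt: str) -> str:
    """Hypothetical stand-in for a chat-completion call."""
    raise NotImplementedError("Wire this up to your LLM provider.")


def build_knowledge_graph(chunks: list[str]) -> nx.Graph:
    """Index time: ask the LLM to extract [source, relation, target] triples per chunk."""
    graph = nx.Graph()
    for chunk in chunks:
        prompt = (
            "Extract entities and relationships from the text below as a JSON "
            "list of [source, relation, target] triples.\n\n" + chunk
        )
        for source, relation, target in json.loads(llm(prompt)):
            graph.add_edge(source, target, relation=relation)
    return graph


def summarize_communities(graph: nx.Graph) -> list[str]:
    """Index time: detect graph communities and pre-summarize each one."""
    summaries = []
    for community in greedy_modularity_communities(graph):
        facts = [
            f"{u} -[{d['relation']}]-> {v}"
            for u, v, d in graph.subgraph(community).edges(data=True)
        ]
        summaries.append(llm("Summarize this cluster of related facts:\n" + "\n".join(facts)))
    return summaries


def answer_global_question(question: str, summaries: list[str]) -> str:
    """Query time: map over community summaries, then reduce the partial answers."""
    partials = [
        llm(f"Using only this summary, answer: {question}\n\nSummary:\n{s}")
        for s in summaries
    ]
    return llm(
        f"Combine these partial answers into one comprehensive answer to: {question}\n\n"
        + "\n---\n".join(partials)
    )
```

The design point this illustrates is that the expensive LLM work (triple extraction and community summarization) happens once at index time, which is why reducing that upfront cost is the focus of Microsoft's ongoing work.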