The Danswer system revolves around a custom Retrieval Augmented Generation (RAG) pipeline. It pulls documents from connected sources every 10 minutes, chunks and indexes them into hybrid keyword+vector indices. The system can be configured to go over each document with multiple passes of different granularity to capture wide context vs fine details. At query time, the system preprocesses the query, retrieves top documents, and uses a smaller LLM to decide which chunks are useful for answering the query. The most relevant passages are then passed to the LLM along with the user query and chat history to produce the final answer. The system has been used to improve turnaround times for support, help sales teams get customer context instantly, reduce lost engineering time, and help on-calls resolve critical issues faster.
Key takeaways:
- Danswer is an open-source, self-hostable ChatGPT-style system that connects to 25 common workplace tools to access team-specific knowledge.
- The system uses a custom Retrieval Augmented Generation (RAG) pipeline, indexing documents from connected sources every 10 minutes and using state-of-the-art prefix-aware embedding models trained with contrastive loss.
- Danswer can improve turnaround times for support, help sales teams get customer context instantly, reduce lost engineering time from answering cross-team questions, and assist on-calls in resolving critical issues faster.
- Danswer can be tried out locally using Docker or through their Cloud service, and it can be set up in less than 15 minutes.