The author also discusses embeddings: numerical representations of text that capture its meaning and that LLM-based systems use to work with language. Embeddings power semantic search, which finds the information most relevant to a user's input. The article then walks through indexing a knowledge base, which means loading its contents and splitting them into smaller chunks that can be searched efficiently, using the open-source library LangChain to illustrate these concepts.
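As a rough sketch of what that indexing step can look like with LangChain (the directory path, loader, chunk sizes, and embedding model below are assumptions for illustration, and LangChain's module paths have shifted across versions):

```python
# Minimal indexing sketch: load a knowledge base, chunk it, embed it, store it.
# Requires an OpenAI API key in the environment; paths and parameters are illustrative.
from langchain_community.document_loaders import DirectoryLoader, TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS

# 1. Load the raw contents of the knowledge base.
docs = DirectoryLoader("./knowledge_base", glob="**/*.md", loader_cls=TextLoader).load()

# 2. Split the documents into smaller chunks that can be searched individually.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_documents(docs)

# 3. Embed each chunk and store the vectors in a searchable index on disk.
vector_store = FAISS.from_documents(chunks, OpenAIEmbeddings())
vector_store.save_local("./index")
```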
Key takeaways:
- Retrieval augmented generation (RAG) is a process that supplements a user's input to a large language model (LLM) with additional information retrieved from elsewhere, which the LLM uses to generate a response (the sketches after this list walk through a minimal version of this chain).
- The retrieval step, which involves searching for the most relevant content from a knowledge base that might answer the user's question, is the most complex part of the RAG chain.
- The retrieval step has two main parts: indexing, which turns the knowledge base into something that can be searched or queried, and querying, which pulls the most relevant bits of knowledge out of that index for a given search term.
- Most RAG systems today rely on semantic search, which uses embeddings: a core piece of AI technology that represents any piece of human language as a vector (list) of numbers (the first sketch after this list makes this concrete).
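To make the "vector (list) of numbers" idea concrete, here is a small sketch of how similarity between embedding vectors is typically scored. The three-dimensional vectors are made up for illustration; real embedding models produce hundreds or thousands of dimensions.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Score how similar two embedding vectors are (1.0 = pointing the same way)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Made-up 3-dimensional embeddings; the comments show what each vector might represent.
query   = np.array([0.9, 0.1, 0.3])   # "How do I reset my password?"
chunk_a = np.array([0.8, 0.2, 0.4])   # a chunk about password resets
chunk_b = np.array([0.1, 0.9, 0.2])   # a chunk about billing

print(cosine_similarity(query, chunk_a))  # high score -> most relevant chunk
print(cosine_similarity(query, chunk_b))  # low score  -> less relevant chunk
```

And a sketch of the querying and generation side of the chain, again using LangChain; the toy chunks, prompt wording, and model name are illustrative assumptions rather than the article's exact setup.

```python
# Minimal retrieve-then-generate sketch. Requires an OpenAI API key in the environment.
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings, ChatOpenAI

# Toy knowledge base: in practice these would be the chunks produced during indexing.
chunks = [
    "API keys can be rotated from the account settings page.",
    "Invoices are emailed on the first business day of each month.",
    "Support is available 24/7 via the in-app chat widget.",
]
vector_store = FAISS.from_texts(chunks, OpenAIEmbeddings())

question = "How do I rotate my API key?"

# Semantic search: embed the question and retrieve the most similar chunks.
relevant = vector_store.similarity_search(question, k=2)
context = "\n\n".join(doc.page_content for doc in relevant)

# Supplement the user's input with the retrieved context, then ask the LLM.
prompt = (
    "Answer the question using only the context below.\n\n"
    f"Context:\n{context}\n\nQuestion: {question}"
)
answer = ChatOpenAI(model="gpt-4o-mini").invoke(prompt)
print(answer.content)
```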