RAFT: A Method That Can Significantly Improve Retrieval Augmented Generation in LLMs

Brain Titan
7 min read · May 14, 2024


RAFT (Retrieval Augmented Fine-Tuning) is a new technique for improving the performance of Large Language Models (LLMs) on Retrieval Augmented Generation (RAG) tasks. It combines the advantages of traditional retrieval augmented generation with domain-specific fine-tuning, aiming to improve the model’s domain adaptability and the quality of its generated answers.

RAFT can significantly improve the performance of large language models on retrieval augmented generation tasks, especially in domains that demand high precision and expertise. With this approach, models not only generate high-quality text but also demonstrate stronger logical reasoning and domain adaptability.

What problem did RAFT solve?

  1. Improving domain adaptation:
  • When dealing with domain-specific (e.g., legal, medical, or banking) queries, traditional domain-specific fine-tuning (DSF) trains an existing base model on a set of documents representing the knowledge of that domain. By combining DSF and RAG more systematically, RAFT lets the model ‘learn’ or adapt to the relevant domain information in advance, so that it can better understand queries and generate relevant responses when used in a real-world RAG setting.
  2. Improving the quality and accuracy of generation:
  • Traditional RAG may retrieve documents that are semantically similar to a query but not actually relevant in content, which can lead to answers containing misleading or ‘distracting’ information. RAFT improves the relevance and quality of responses by fine-tuning the model on the domain before the RAG setup is used, so that it can better distinguish which documents are truly relevant.
  3. Overcoming the limitations of the traditional approaches:
  • In traditional fine-tuning, a model’s performance is limited by its training data. In RAG, while models have access to a large amount of ‘open-book’ data, they typically retrieve information based on the semantic proximity of documents to the query, which can lead to the selection of inappropriate documents. By combining the strengths of the two approaches, RAFT lets models perform on open-book exams more like students who have reviewed the relevant textbook beforehand.
  4. Serving scenarios with specialized needs:
  • RAFT is particularly well suited to fields where accuracy and professionalism are critical, such as medicine or law. Additionally, the RAFT methodology helps companies and developers create customized solutions for specific business needs and challenges.

Key Capabilities of RAFT

The RAFT (Retrieval Augmented Fine-Tuning) methodology provides a set of powerful capabilities that let Large Language Models (LLMs) perform Retrieval Augmented Generation (RAG) tasks more efficiently and accurately. Here are some of the core capabilities of RAFT:

  1. Domain-specific fine-tuning: RAFT is designed for domain-specific RAG scenarios, allowing the model to utilize domain-specific documents when answering questions. During training, RAFT teaches the model how to recognize and use documents that help answer the question, while ignoring irrelevant, distracting documents.
  2. Improved domain adaptation: RAFT enables models to perform better in specific domains such as healthcare, law, or finance. By fine-tuning the model beforehand on documents from the relevant domain, it can understand and respond more accurately when handling those types of queries.
  3. Enhanced quality of answer generation: In contrast to traditional RAG, RAFT ‘learns’ relevant documents in advance through domain fine-tuning, allowing the model to extract and utilize valuable information from retrieved documents more efficiently at query time and to generate more accurate, relevant responses.
  4. Reduced impact of distracting documents: In a regular RAG setup, the model may retrieve documents that are semantically similar to the query but not actually relevant. RAFT’s fine-tuning strategy helps the model recognize truly useful documents, avoiding inaccurate or misleading responses caused by distractors.
  5. Enhanced learning and reasoning: RAFT is not limited to improving text generation; it also strengthens the model’s logical thinking and problem-solving through a Chain-of-Thought approach. The model is required to show its reasoning process before producing an answer, which improves the logic and credibility of that answer.
  6. Extension to a wide range of domains and languages: While RAFT is particularly suited to domains requiring a high degree of specialized knowledge, the methodology can also be applied to broader application scenarios, including different industries and multiple languages, allowing the model to respond flexibly to diverse environments and needs.
  7. Simplified model training and deployment: RAFT implementations often rely on machine learning platforms and tools (such as Azure AI Studio) that simplify data preparation, model training, and deployment, letting developers focus on domain-specific tasks without delving into the underlying technical details.
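To make the first capability concrete, the sketch below shows one way an inference-time prompt could present every retrieved document — relevant and distracting alike — and ask the model to reason before answering, which is the situation RAFT trains for. The `<DOCUMENT>` tags and the instruction wording are illustrative choices, not the exact format from the RAFT paper.

```python
def format_raft_prompt(documents, question):
    """Assemble a prompt that presents all retrieved documents as context
    and asks the model to reason step by step before answering.
    Tag names and wording here are illustrative, not the paper's format."""
    parts = [f"<DOCUMENT> {d} </DOCUMENT>" for d in documents]
    parts.append(f"Question: {question}")
    parts.append("Explain your reasoning step by step, then state the final answer.")
    return "\n".join(parts)

# Example with one relevant document and one distractor:
prompt = format_raft_prompt(
    ["RAFT mixes oracle and distractor documents during fine-tuning.",
     "Unrelated: the Eiffel Tower is in Paris."],
    "What kinds of documents does RAFT mix during fine-tuning?")
```

A RAFT-trained model receiving such a prompt is expected to cite only the first document in its reasoning and ignore the distractor.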

How RAFT works

RAFT combines the strengths of Retrieval Augmented Generation (RAG) and Domain-Specific Fine-Tuning (DSF) with the aim of improving the question-answering capabilities of large language models for specific domains. The following details how RAFT works:

1. Basic concepts and components

  • Retrieval Augmented Generation (RAG): In RAG, the model first retrieves documents semantically similar to the user’s query from a large document database, and then generates an answer using those documents as context. This approach is similar to an ‘open-book’ exam, where the model has access to relevant information while responding to a query.
  • Domain-Specific Fine-Tuning (DSF): In DSF, models are trained on a large number of documents in a specific domain in order to assimilate and learn expertise and terminology in that domain.
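The RAG half of this picture can be sketched as a retrieve-then-generate loop. The toy bag-of-words embedding below stands in for a real embedding model; everything here is a minimal illustration of the mechanism, not a production retriever.

```python
import re

def embed(text):
    """Toy embedding: a lowercase word-count vector (stand-in for a real model)."""
    vec = {}
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        vec[word] = vec.get(word, 0) + 1
    return vec

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b.get(w, 0) for w in a)
    na = sum(v * v for v in a.values()) ** 0.5
    nb = sum(v * v for v in b.values()) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, documents, k=2):
    """Return the k documents most similar to the query — the 'R' in RAG.
    The retrieved documents would then be passed to the LLM as context."""
    q = embed(query)
    return sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

docs = [
    "RAFT fine-tunes a model on domain documents before RAG is applied.",
    "The capital of France is Paris.",
    "Retrieval augmented generation retrieves documents as context.",
]
context = retrieve("how does retrieval augmented generation use context", docs)
```

Note that ranking is purely by surface similarity here, which is exactly the weakness RAFT targets: a semantically close but unhelpful document can outrank a truly relevant one.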

2. The core steps of RAFT

  • Preparatory phase: In the traditional RAG model, documents are retrieved directly at runtime. RAFT changes this process by first fine-tuning the model on the domain, so that it has in-depth knowledge of the domain of interest before it is actually used to generate answers.
  • Creating a synthetic dataset: The RAFT method starts by creating a synthetic dataset using a base large language model such as Llama 2. Each example contains:
  • Question: the specific question to be answered.
  • Document set: documents directly relevant to the question plus some unrelated documents. These simulate the information-retrieval environment of a real scenario.
  • Answer and chain-of-thought explanation: an answer generated from the relevant document, with a chain-of-thought explanation grounded in the document’s content to show how the answer was reached.
  • Fine-tuning the model: The model is fine-tuned on this synthetic dataset so that it learns to extract key information from contexts containing distracting information and to generate accurate answers. The fine-tuning also emphasizes generating chains of thought to strengthen the model’s logical reasoning.
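The dataset-construction step above can be sketched as follows. The field names, the distractor count, and the 0.8 default for `p_oracle` are illustrative choices made for this example; the paper describes keeping the relevant (‘oracle’) document in only a fraction of the training examples, so the model also learns to answer from memorized domain knowledge.

```python
import random

def make_raft_example(question, oracle_doc, distractor_pool, cot_answer,
                      num_distractors=3, p_oracle=0.8, rng=None):
    """Build one RAFT-style fine-tuning example.

    The context mixes the oracle (truly relevant) document with randomly
    drawn distractors; with probability 1 - p_oracle the oracle is left
    out entirely. Field names and defaults are illustrative assumptions.
    """
    rng = rng or random.Random()
    context = rng.sample(distractor_pool, num_distractors)
    if rng.random() < p_oracle:
        context.append(oracle_doc)
    rng.shuffle(context)  # the oracle's position should carry no signal
    return {
        "question": question,
        "context": context,
        "answer": cot_answer,  # chain-of-thought reasoning, then the final answer
    }

# Build one deterministic example (p_oracle=1.0 keeps the oracle in):
oracle = "RAFT stands for Retrieval Augmented Fine-Tuning."
ex = make_raft_example(
    "What does RAFT stand for?",
    oracle,
    ["doc about cooking", "doc about sports", "doc about weather", "doc about travel"],
    "The first relevant document defines the acronym, so the answer is "
    "Retrieval Augmented Fine-Tuning.",
    num_distractors=2, p_oracle=1.0, rng=random.Random(0))
```

Fine-tuning then proceeds as standard supervised training on (`question` + `context`) → `answer` pairs.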

3. Improvements and optimizations

  • Optimization of context utilization: In traditional RAG, the model may select documents based only on semantic similarity, which can lead to wrong or irrelevant documents being chosen. RAFT’s fine-tuning enables the model to better understand the actual content of documents, so that more relevant and accurate documents are used for answer generation in real-world applications.
  • Preventing overfitting and improving robustness: By introducing Chain-of-Thought reasoning, RAFT not only helps the model generate more natural and coherent output, but also reduces the risk of overfitting and improves the robustness of training.

Performance results

1. Datasets

  • PubMed: a question-answering dataset dedicated to biomedical research. RAFT excels on this dataset, effectively answering questions related to healthcare and biology.
  • HotpotQA: an open-domain question-answering dataset focusing on general-knowledge questions based on Wikipedia. RAFT has demonstrated its capabilities on this dataset when dealing with general knowledge questions.
  • Gorilla API Bench: mainly concerned with generating correct function API calls from documentation. RAFT’s results on this dataset show that it can accurately recognize and use key information from API documentation to answer relevant questions.

2. Performance improvement

  • RAFT shows performance improvements in domain-specific ‘open-book’ settings, especially when answering questions using in-domain documents. This improvement is attributed to the model’s ability to use and reference information from relevant documents more efficiently.
  • The model is trained to ignore irrelevant interfering documents, further improving the accuracy and relevance of the answers.

3. Comparison with baseline models

  • LlaMA2–7B-chat model with 0-shot prompting: an instruction-tuned model typically used for question-answering tasks, with no reference documents provided.
  • Llama2–7B-chat model with RAG (Llama2 + RAG): the same model with reference documents added; this is the most common combination for domain-specific question-answering tasks.

  • Domain-Specific Fine-tuning with RAG (DSF + RAG): equips a domain-specific fine-tuned model with external knowledge, so that even if the model does not ‘know’ a piece of information, it can still refer to the context.

Details: https://gorilla.cs.berkeley.edu/blogs/9_raft.html
