The study also found that the performance of finetuned LLMs on general benchmarks remains almost constant. However, finetuning LLMs on other baseline long-context augmentation data can encourage hallucination and cause performance drops. For instance, on TriviaQA, Mistral 7B finetuned on the synthetic data showed no performance drop, while the other baseline datasets caused drops ranging from 2.33% to 6.19%. The research underscores the potential of finetuning on synthetic data to enhance the performance of LLMs on longer-context tasks.
Key takeaways:
- Large Language Models (LLMs) have been found to struggle with accurately retrieving information and maintaining reasoning capabilities when processing long-context inputs.
- A finetuning approach using a synthetic dataset comprising numerical key-value retrieval tasks can significantly improve LLMs' information retrieval and reasoning capabilities in longer-context settings.
- Finetuning on synthetic data can lead to a transfer of skills from synthetic to real task evaluations, with significant improvements noted in models like GPT-3.5 Turbo and Mistral 7B.
- While finetuned LLMs' performance on general benchmarks remains almost constant, finetuning on other baseline long-context augmentation data can encourage hallucination and potentially lead to performance drops.
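To make the synthetic task concrete, here is a minimal sketch of what a numerical key-value retrieval example might look like. This is an illustration only: the function name, prompt wording, and key/value ranges are assumptions, not the paper's exact data format.

```python
import json
import random


def make_kv_retrieval_example(num_pairs=20, seed=0):
    """Build one synthetic key-value retrieval example (illustrative sketch).

    A dictionary of random numerical keys and values is serialized into the
    prompt, and the model is asked to return the value for one randomly
    chosen key. The paper's actual format may differ.
    """
    rng = random.Random(seed)
    # Sample distinct numerical keys, then assign each a random value.
    keys = rng.sample(range(10**6), num_pairs)
    kv = {str(k): rng.randrange(10**6) for k in keys}
    target_key = rng.choice(list(kv))
    prompt = (
        "Below is a JSON object of key-value pairs.\n"
        f"{json.dumps(kv)}\n"
        f'What is the value associated with key "{target_key}"? '
        "Answer with the number only."
    )
    return {"prompt": prompt, "answer": str(kv[target_key])}


example = make_kv_retrieval_example(num_pairs=5, seed=42)
```

Scaling `num_pairs` (or padding the dictionary with more pairs) lengthens the context, which is how such tasks probe retrieval at longer input lengths.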