
Llama 2 is about as factually accurate as GPT-4 for summaries and is 30X cheaper | Anyscale

Aug 29, 2023 - anyscale.com
The article compares open-source large language models (LLMs) such as Llama 2 with closed LLMs such as OpenAI's gpt-3.5-turbo and gpt-4, focusing on their factuality in summarization tasks. The experiment used a set of 373 news report statements and had each LLM decide which statement was the factually correct summary. Llama-2-70b and gpt-4 were almost on par in factuality, both nearing human performance. However, Llama-2-70b was found to be 30 times cheaper than gpt-4 for an equivalent level of factuality in summarization. The study also revealed issues with smaller models and gpt-3.5-turbo, including problems with following instructions and ordering bias.

Key takeaways:

  • The study found that Llama-2-70b, an open-source language model, is almost as accurate as GPT-4 in terms of factuality, and significantly better than GPT-3.5-turbo.
  • Two practical issues were encountered during the experiment: models not following instructions, and ordering bias. Larger models were better at following instructions, and ordering bias was tested by swapping the order of the options presented to the model.
  • Although Llama 2's tokenization of the same text is 19% longer than ChatGPT's, Llama-2-70b was found to be 30 times cheaper than GPT-4 for equivalent levels of factuality in summarization.
  • The study suggests using Llama-2-70b or GPT-4 to increase the chances of a factual summarization, and advises against using smaller Llamas or GPT-3.5-turbo.
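
The ordering-bias check mentioned in the takeaways can be sketched roughly as follows. This is a minimal illustration, not the article's actual harness: the `choose` callback, the `SwapReport` fields, and the "A"/"B" answer format are all assumptions made for the example. Each pair of summaries is shown twice, once with the correct summary first and once with it second, so a model that simply favors one position stands out.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Tuple


@dataclass
class SwapReport:
    accuracy: float           # share of valid answers that picked the correct summary
    first_option_rate: float  # share of valid answers that picked option "A"
    invalid: int              # answers that were neither "A" nor "B"


def evaluate_with_swap(
    choose: Callable[[str, str, str], str],      # hypothetical: (article, option_a, option_b) -> "A" or "B"
    examples: Iterable[Tuple[str, str, str]],    # hypothetical: (article, correct_summary, incorrect_summary)
) -> SwapReport:
    correct = picked_a = valid = invalid = 0
    for article, good, bad in examples:
        # Present each pair in both orders: correct summary first, then second.
        for option_a, option_b, right in ((good, bad, "A"), (bad, good, "B")):
            answer = choose(article, option_a, option_b).strip().upper()
            if answer not in ("A", "B"):
                # The model did not follow the answer format.
                invalid += 1
                continue
            valid += 1
            correct += answer == right
            picked_a += answer == "A"
    if valid == 0:
        return SwapReport(0.0, 0.0, invalid)
    return SwapReport(correct / valid, picked_a / valid, invalid)


# A stub "model" that always answers "A", standing in for a real LLM call.
report = evaluate_with_swap(lambda article, a, b: "A", [("article text", "good", "bad")])
print(report)
```

Because every pair appears in both orders, an unbiased model that answers consistently picks "A" about half the time. A `first_option_rate` far from 0.5 suggests positional bias, and a nonzero `invalid` count reflects the instruction-following problem.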