How to benchmark Llama2 Uncensored vs. GPT-3.5 on your own inputs

The article provides a guide on how to benchmark Llama2 Uncensored, Llama2, and GPT 3.5 using promptfoo and Ollama. It details the process of setting up the configuration, prompts, and test cases, and running the comparison. The guide assumes the user has installed both promptfoo and Ollama and provides step-by-step instructions on how to set up a directory, edit the configuration file, set up the prompts, add test cases, and run the comparison.

The article concludes by presenting the results of a test that found Llama2 is more likely to self-censor than GPT 3.5, and Llama2-uncensored removes all the various ethical objections and admonitions. It encourages users to try out the process themselves to see how the models perform on their application's example inputs.

Key takeaways:

The guide provides a step-by-step process to benchmark Llama2 Uncensored, Llama2, and GPT 3.5 using promptfoo and Ollama.
It includes instructions on how to set up the configuration, prompts, and test cases for the comparison.
The comparison can be run using the 'promptfoo eval' command, and the results can be viewed in a web viewer or exported as a CSV file.
Based on the test results, Llama2 is more likely to self-censor than GPT 3.5, and Llama2-uncensored removes all ethical objections and admonitions.

How to benchmark Llama2 Uncensored vs. GPT-3.5 on your own inputs | promptfoo

Key takeaways:

Comments (0)

Newsletter