The article concludes by presenting the results of a test that found Llama2 is more likely to self-censor than GPT 3.5, and Llama2-uncensored removes all the various ethical objections and admonitions. It encourages users to try out the process themselves to see how the models perform on their application's example inputs.
Key takeaways:
- The guide provides a step-by-step process to benchmark Llama2 Uncensored, Llama2, and GPT 3.5 using promptfoo and Ollama.
- It includes instructions on how to set up the configuration, prompts, and test cases for the comparison.
- The comparison can be run using the 'promptfoo eval' command, and the results can be viewed in a web viewer or exported as a CSV file.
- Based on the test results, Llama2 is more likely to self-censor than GPT 3.5, and Llama2-uncensored removes all ethical objections and admonitions.