The model, MistralTrix-v1, is a fine-tuned version of zyh3826/GML-Mistral-merged-v1, trained with Direct Preference Optimization (DPO) on the Intel dataset used for neural-chat-7b-v3-1. The author trained it in a few hours on a single Colab GPU; the training specifications and code are provided in the article. A self-described beginner, the author cautions that these methods might not always produce the best results.
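To make the DPO step concrete, here is a minimal sketch of the DPO objective for a single preference pair. This is an illustration of the general loss formula, not the article's actual training code (which uses full model log-probabilities and batching); the function name and scalar inputs are hypothetical.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) preference pair.

    Each argument is the summed log-probability of the chosen or rejected
    response under the trainable policy or the frozen reference model.
    beta controls how far the policy may drift from the reference.
    """
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log sigmoid(margin): shrinks as the policy prefers the chosen answer
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# The loss is lower when the policy assigns more probability to the
# chosen response relative to the reference model.
low = dpo_loss(-10.0, -20.0, -15.0, -15.0)   # policy prefers chosen
high = dpo_loss(-20.0, -10.0, -15.0, -15.0)  # policy prefers rejected
```

In practice this loss is computed over batches of preference pairs (e.g. via a library such as TRL), but the scalar version above captures why DPO needs only a dataset of chosen/rejected response pairs rather than a separate reward model.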
Key takeaways:
- The model, CultriX/MistralTrix-v1, achieved an average score of 73.39 across various benchmarks, making it the #1 ranked 7B LLM on the LLM Leaderboards.
- The author of the post is not a professional in the field; they achieved these results by applying techniques from an article they found online, using a Colab notebook provided by the article's author.
- The model was further fine-tuned using Direct Preference Optimization (DPO), trained on a single Colab GPU in a few hours.
- The author also attempted to quantize the model themselves and warns that, as a beginner, their results may not be perfect.
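The quantization step the author attempted can be illustrated with a toy example. The article likely used a scheme such as GGUF or GPTQ; the sketch below shows only the basic principle (mapping floating-point weights to small integers plus a scale), with hypothetical function names.

```python
def quantize_int8(weights):
    """Symmetric round-to-nearest int8 quantization of a weight list.

    Returns (integer codes in [-127, 127], scale). Real LLM quantization
    schemes (GGUF, GPTQ, AWQ) work per-block and are far more refined;
    this only illustrates the size/precision trade-off.
    """
    scale = max(abs(w) for w in weights) / 127.0 or 1.0
    codes = [round(w / scale) for w in weights]
    return codes, scale

def dequantize(codes, scale):
    """Recover approximate float weights from integer codes."""
    return [c * scale for c in codes]

weights = [0.12, -0.5, 0.33, 0.0]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)
```

Each weight now needs one byte instead of four (for fp32), at the cost of a small rounding error bounded by half the scale, which is why imperfect quantization settings can degrade a model's benchmark scores.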