The users also mention that the Llama-3 8B model was trained on 15 trillion tokens, which may help explain its performance. They also discuss the potential of the Llama-3 70B model, with one user sharing a screenshot of a complex question the model answered correctly. The discussion closes with anticipation of future improvements and some concern about how well the models will hold up in multi-turn conversations.
Key takeaways:
- The Llama-3 8B model is highly impressive, with users comparing it favorably to the much larger WizardLM-2 8x22B model.
- Despite its smaller size, the 8B model demonstrates strong reasoning capability, answering complex questions accurately and even generating code for a snake game in Python (a sketch of how such a prompt might be run locally follows this list).
- There is speculation, echoing comments from Karpathy, that the 8B model's strong performance comes from being trained for an extended period on far more data than is typical for a model of its size.
- Users are excited about the 8B model's potential, with some keen to see what fine-tuning can achieve and others looking forward to the release of larger models.
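
Since much of the thread revolves around users prompting the 8B model with exactly these kinds of requests, here is a minimal sketch of how such a test might be reproduced locally with Hugging Face Transformers. The model ID, generation settings, and chat-style pipeline usage are assumptions on my part, not details taken from the discussion; running it also requires access to the gated meta-llama repository and suitable hardware.

```python
# Hypothetical reproduction of the kind of test described in the thread:
# asking Llama-3 8B Instruct to write a snake game in Python.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # assumed model ID
    device_map="auto",                            # requires `accelerate`
)

messages = [
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "Write a playable snake game in Python using pygame."},
]

# Recent Transformers versions let the text-generation pipeline accept
# chat-style message lists directly and apply the model's chat template.
output = generator(messages, max_new_tokens=1024, do_sample=False)

# The returned conversation includes the assistant's reply as the last message.
print(output[0]["generated_text"][-1]["content"])
```

This is only an illustration of the testing workflow users describe; the actual prompts and settings in the thread are not specified.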