The article includes tables with visual representations of the random walks at various temperatures for each model, highlighting the aesthetically pleasing, genuinely random-looking walks of the llama3:8b model. The author raises questions about why some LLMs cannot produce a random walk at a temperature of 0 and what might be causing the gemma2:9b model's peculiar behavior. The experiment was run on a Mac M2 with 16 GB RAM, and the results were color-coded by configuration. The article concludes with links to additional resources, including the code, animated frames, and videos of the random walks.
Key takeaways:
- The experiment tests whether LLMs such as the llama3.1/2 and gemma2 series can perform random walks, with unexpected behavior observed in the gemma2:9b model.
- The gemma2:9b model fails to consider the UP and DOWN directions in the random walk, unlike the other models, which follow the instructions correctly.
- The setup uses Ollama with LiteLLM on a Mac M2 with 16 GB RAM; each step of each walk is a fresh interaction, with no conversation context carried over between turns.
- The results are visualized in tables showing random walks at different temperatures, highlighting the random-looking walks of llama3:8b and the peculiar behavior of gemma2:9b.
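The per-turn setup described above can be sketched as follows. This is a minimal, self-contained illustration, not the author's actual code: `ask_model` is a hypothetical stand-in for the real LiteLLM/Ollama call, stubbed here with a uniform random choice so the sketch runs offline.

```python
import random

# Grid moves the model is asked to choose from.
DIRECTIONS = {"UP": (0, 1), "DOWN": (0, -1), "LEFT": (-1, 0), "RIGHT": (1, 0)}


def ask_model(prompt: str) -> str:
    """Hypothetical stand-in for a per-turn LLM call (real experiment:
    LiteLLM -> Ollama). No context is carried between turns, matching
    the article's setup; here we just pick a direction at random."""
    return random.choice(list(DIRECTIONS))


def random_walk(steps: int) -> list[tuple[int, int]]:
    """Run one walk, issuing a fresh, context-free prompt for every step."""
    x, y = 0, 0
    path = [(x, y)]
    for _ in range(steps):
        move = ask_model("Reply with exactly one of: UP, DOWN, LEFT, RIGHT.")
        dx, dy = DIRECTIONS[move]
        x, y = x + dx, y + dy
        path.append((x, y))
    return path
```

In the real experiment, each `ask_model` call would send the prompt to a given model at a given temperature, and the returned paths would then be plotted per model/temperature pair, as in the article's tables.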