Parakeet: A Tiny LLM | Hacker News

The article describes the author's experiences and observations while training a tiny language model on large datasets. The author notes that the model shows a recency bias and seems to be aware of its training. Because of its small size, it also tends to forget information gathered during training. The author mentions using `InterleavedDataset` and JSONL to handle the large datasets. The model displays interesting reasoning abilities and performs best at summarisation when the information is provided, but it also tends to hallucinate. Its storytelling is sequential but lacks depth.

The author plans to continue working on the project and is eager to release it when time permits. The article also includes example interactions with the model, such as helping to solve a puzzle about a misplaced sushi lunch, describing a room from a parakeet's perspective, and generating the HTML for a website about a new Robot Cafe.

Key takeaways:

The model shows a recency bias and seems to be aware of its training and how it has changed over time.
Due to its size, the model tends to forget information gathered during training, and using `InterleavedDataset` with `shuffle=True` interferes with this.
The model displays interesting reasoning abilities and performs best at summarisation when the information is provided, but it also tends to hallucinate.
The model's storytelling lacks depth and reads more as a sequence of events.
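
To make the dataset point concrete, here is a minimal sketch of how several JSONL files might be streamed and interleaved, with and without shuffling. It is not the author's `InterleavedDataset`: the function names `read_jsonl` and `interleave_jsonl`, the `seed` parameter, and the round-robin behaviour when `shuffle=False` are all assumptions made for illustration.

```python
import json
import random
from pathlib import Path
from typing import Iterator, Sequence


def read_jsonl(path: Path) -> Iterator[dict]:
    """Yield one parsed record per non-empty line, without loading the whole file."""
    with path.open("r", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def interleave_jsonl(
    paths: Sequence[Path], shuffle: bool = False, seed: int = 0
) -> Iterator[dict]:
    """Hypothetical helper: mix records from several JSONL files into one stream.

    shuffle=False visits the files in round-robin order.
    shuffle=True picks the next file at random, so the order in which
    sources are seen changes, but each file is still read front to back.
    """
    rng = random.Random(seed)
    streams = [read_jsonl(Path(p)) for p in paths]
    position = 0
    while streams:
        index = rng.randrange(len(streams)) if shuffle else position % len(streams)
        try:
            record = next(streams[index])
        except StopIteration:
            # This file is exhausted: drop it and carry on with the rest.
            streams.pop(index)
            position = index
            continue
        position = index + 1
        yield record
```

Under these assumptions, `interleave_jsonl([Path("a.jsonl"), Path("b.jsonl")], shuffle=True)` yields every record from both files exactly once. Whether a source is seen early or late in training is the kind of ordering effect that could interact with the recency bias and forgetting described above.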
