Show HN: Advanced Chunking in JavaScript/TypeScript with Chonkie

May 23, 2025 - news.ycombinator.com
Shreyash and Bhavnick have developed Chonkie, an open-source library for advanced chunking and embedding of text and code. Initially available only in Python, it now also has a TypeScript version. Chonkie aims to improve text retrieval and performance in AI projects by offering more sophisticated chunking methods than basic text splitters. The native chunkers currently in the TypeScript version are Code Chunker, Recursive Chunker, Token Chunker, and Sentence Chunker, all of which support custom tokenizers, chunk overlap, and delimiters.
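
To make the idea of token-based chunking with overlap concrete, here is a minimal sketch. It is not Chonkie's API: `chunkByTokens`, `whitespaceTokenizer`, and the parameter names are hypothetical, and a plain whitespace split stands in for a real tokenizer. Each chunk holds up to `chunkSize` tokens and repeats the last `chunkOverlap` tokens of the previous chunk.

```typescript
// Illustrative only: hypothetical helpers, not part of the Chonkie library.
type Tokenizer = (text: string) => string[];

// Stand-in tokenizer: splits on whitespace. A real pipeline would likely
// plug in a model-specific tokenizer here.
const whitespaceTokenizer: Tokenizer = (text) =>
  text.split(/\s+/).filter((token) => token.length > 0);

function chunkByTokens(
  text: string,
  chunkSize: number,
  chunkOverlap: number,
  tokenize: Tokenizer = whitespaceTokenizer,
): string[] {
  if (chunkSize <= 0) {
    throw new RangeError("chunkSize must be positive");
  }
  if (chunkOverlap < 0 || chunkOverlap >= chunkSize) {
    throw new RangeError("chunkOverlap must be in [0, chunkSize)");
  }

  const tokens = tokenize(text);
  const step = chunkSize - chunkOverlap; // how far the window advances
  const chunks: string[] = [];

  for (let start = 0; start < tokens.length; start += step) {
    chunks.push(tokens.slice(start, start + chunkSize).join(" "));
    // Stop once the window has reached the end, so no trailing
    // chunk is emitted that contains only overlap tokens.
    if (start + chunkSize >= tokens.length) break;
  }
  return chunks;
}

// Windows of 3 tokens, each sharing 1 token with its predecessor.
console.log(chunkByTokens("a b c d e f g", 3, 1));
// prints [ 'a b c', 'c d e', 'e f g' ]
```

The overlap trades some extra tokens for context: a sentence that straddles a boundary appears, at least partly, in both neighbouring chunks.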

Planned additions to the TypeScript version, which are already usable through the API client, include Semantic Chunker, SDPM Chunker, Late Chunker, Slumber Chunker, Embeddings Refinery, and Overlap Refinery. These features aim to enhance chunk quality, reduce token usage, and improve context preservation. Chonkie is free, open-source, and MIT-licensed, and the developers welcome feedback, ideas, and contributions from the community.

Key takeaways:

  • Chonkie is an open-source library for advanced chunking and embedding of text and code, now available in TypeScript.
  • Its native TypeScript chunkers are Code Chunker, Recursive Chunker, Token Chunker, and Sentence Chunker, all supporting custom tokenizers, chunk overlap, and delimiters.
  • Upcoming features include Semantic Chunker, SDPM Chunker, Late Chunker, Slumber Chunker, Embeddings Refinery, and Overlap Refinery.
  • Chonkie is free, open-source, and MIT licensed, with a focus on improving text retrieval and performance in AI projects.
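
As an illustration of what delimiter-based recursive chunking generally looks like, the sketch below splits on the coarsest delimiter first and falls back to finer ones only for pieces that are still too long. It is an assumption-laden toy, not Chonkie's Recursive Chunker: `recursiveChunk`, the default delimiter list, and the character-based size limit are all hypothetical choices made for this example.

```typescript
// Illustrative only: a hypothetical helper, not part of the Chonkie library.
// Sizes are measured in characters to keep the sketch dependency-free.
function recursiveChunk(
  text: string,
  maxChars: number,
  delimiters: string[] = ["\n\n", "\n", ". ", " "],
): string[] {
  if (maxChars <= 0) {
    throw new RangeError("maxChars must be positive");
  }
  if (text.length <= maxChars) {
    return text.length > 0 ? [text] : [];
  }

  // No delimiters left: fall back to a hard split at maxChars.
  if (delimiters.length === 0) {
    const pieces: string[] = [];
    for (let i = 0; i < text.length; i += maxChars) {
      pieces.push(text.slice(i, i + maxChars));
    }
    return pieces;
  }

  const [delimiter, ...finerDelimiters] = delimiters;
  const chunks: string[] = [];
  let current = "";

  for (const part of text.split(delimiter)) {
    // Greedily merge neighbouring parts while they still fit.
    const candidate = current === "" ? part : current + delimiter + part;
    if (candidate.length <= maxChars) {
      current = candidate;
      continue;
    }
    if (current !== "") chunks.push(current);
    if (part.length <= maxChars) {
      current = part;
    } else {
      // Part is too long on its own: recurse with finer delimiters.
      chunks.push(...recursiveChunk(part, maxChars, finerDelimiters));
      current = "";
    }
  }
  if (current !== "") chunks.push(current);
  return chunks;
}

console.log(recursiveChunk("alpha beta. gamma delta. epsilon", 12, [". ", " "]));
// prints [ 'alpha beta', 'gamma delta', 'epsilon' ]
```

Note that this sketch drops the delimiter at each chunk boundary; a production chunker may well choose to keep it.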