
Release StringZilla v3: with bindings for C++, Rust, and Swift, AVX-512 acceleration, Levenshtein distances & Needleman-Wunsch scores, faster sorting and rolling fingerprints · ashvardanian/StringZilla

Feb 07, 2024 - github.com
StringZilla v3, a major new release of the string-processing library, offers STL-compatible `sz::string` and `sz::string_view`, lazily-evaluated ranges, character-set search, string-similarity measures, and performance improvements. It also includes bindings for Swift and Rust, improved stability and test coverage, and a runtime-dispatch system. The library is designed to be a drop-in replacement for the string classes of the C++ Standard Template Library, addressing some STL string design decisions that are considered controversial and error-prone.
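As an illustration of the kind of string-similarity measure named in the release title, here is a minimal Python sketch of the Levenshtein edit distance using two rolling rows. The function name `levenshtein` is hypothetical and this is a plain textbook dynamic-programming version, not StringZilla's API or its accelerated implementation.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (textbook dynamic programming)."""
    # Assumption for illustration: keep the shorter string in the inner
    # loop so only two rows of length len(shorter) + 1 are ever stored.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (char_a != char_b)
            insertion = current[j - 1] + 1
            deletion = previous[j] + 1
            current.append(min(substitution, insertion, deletion))
        previous = current
    return previous[-1]
```

Keeping only two rows bounds memory by the length of the shorter input, which matters when comparing long strings.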

The library provides an alternative, consistent interface that supports signed arguments and takes no more than three arguments per function. It also includes features not present in Python's native `str` class, such as content checks, trimming character sets, ranges of search results, and the number of non-overlapping substring matches. StringZilla is designed to handle very large datasets while keeping memory consumption low. It also offers a minimalistic C and C++ implementation of a memory-owning string "class" that uses the Small String Optimization (SSO) to avoid heap allocations for short strings.
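To make the distinction between overlapping and non-overlapping substring matches concrete, the following Python sketch counts both. The helper names `count_non_overlapping` and `count_overlapping` are hypothetical and written against Python's built-in `str.find`; they are not StringZilla functions.

```python
def count_non_overlapping(haystack: str, needle: str) -> int:
    """Count matches, resuming the search after the end of each match."""
    if not needle:
        raise ValueError("needle must be non-empty")
    count = 0
    position = haystack.find(needle)
    while position != -1:
        count += 1
        # Skip the whole match so the next one cannot share characters.
        position = haystack.find(needle, position + len(needle))
    return count


def count_overlapping(haystack: str, needle: str) -> int:
    """Count matches, resuming the search one character after each match."""
    if not needle:
        raise ValueError("needle must be non-empty")
    count = 0
    position = haystack.find(needle)
    while position != -1:
        count += 1
        # Advance by one so matches may share characters.
        position = haystack.find(needle, position + 1)
    return count
```

For the haystack `"aaaa"` and the needle `"aa"`, the first helper finds two matches and the second finds three.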

Key takeaways:

  • StringZilla v3 is a large release that includes STL-compatible `sz::string` and `sz::string_view`, lazily-evaluated ranges, character-set search, string-similarity measures, and more.
  • It offers improved stability and test coverage, and has bindings for Swift and Rust.
  • StringZilla provides a consistent interface that supports signed arguments and takes no more than three arguments per function.
  • It also offers functionality beyond the C++ Standard Library, including content checks, trimming character sets, ranges of search results, number of non-overlapping substring matches, and partitioning.