The Tecton Serving Cache uses Redis as a backend and employs entity-level caching, which strikes a balance between Feature View-level and Feature Service-level caching. Caching at the entity level significantly reduces the number of keys requested from Redis, easing the load on the backend. In the coming months, Tecton plans further improvements to the cache, including more flexibility and better performance.
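To make the entity-level granularity concrete, here is a minimal sketch of the idea: rather than caching one entry per Feature Service request (coarse, low hit rate across services) or one per individual feature (fine, many keys per request), the cache stores feature values per (Feature View, entity) pair, so a request spanning several Feature Views needs only one cached entry per pair. All class and function names below are illustrative assumptions, not Tecton's actual API, and a plain dictionary stands in for Redis.

```python
import time

class EntityLevelCache:
    """Toy in-memory stand-in for a Redis-backed entity-level cache."""

    def __init__(self, ttl_seconds=60):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (expiry_timestamp, feature_values)

    def _key(self, feature_view, entity_id):
        # One key per (Feature View, entity), not per feature or per request.
        return f"{feature_view}:{entity_id}"

    def get(self, feature_view, entity_id):
        entry = self._store.get(self._key(feature_view, entity_id))
        if entry is None or entry[0] < time.time():
            return None  # miss or expired
        return entry[1]

    def put(self, feature_view, entity_id, values):
        expiry = time.time() + self.ttl
        self._store[self._key(feature_view, entity_id)] = (expiry, values)

def serve_features(cache, feature_views, entity_id, compute_fn):
    """Assemble a feature vector, hitting the backend only on cache misses."""
    result = {}
    for fv in feature_views:
        values = cache.get(fv, entity_id)
        if values is None:
            values = compute_fn(fv, entity_id)  # slow/expensive backend read
            cache.put(fv, entity_id, values)
        result[fv] = values
    return result
```

Because entries are keyed by entity rather than by the full request, two different Feature Services that share a Feature View and entity can reuse the same cached entry, which is where the reduction in backend key reads comes from.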
Key takeaways:
- Tecton has introduced the Tecton Serving Cache, a server-side cache designed to significantly reduce infrastructure costs of feature serving for machine learning models at high scale.
- The Tecton Serving Cache can be beneficial for high-traffic, low-cardinality key reads and complex feature queries that are slow or expensive to compute.
- Benchmarking results show that the Tecton Serving Cache can reduce p50 latency by up to 80% and costs by up to 95% compared to a baseline representing a common AI feature retrieval pattern.
- In the coming months, Tecton plans to make the Tecton Serving Cache more flexible and performant, including request-level cache directives to control caching behavior and further reductions in cache retrieval latency.