The Tecton Serving Cache uses Redis as a backend and employs entity-level caching, which strikes a balance between Feature View-level and Feature Service-level caching. Caching at the entity level significantly reduces the number of keys requested from Redis, easing the load on the backend. In the coming months, Tecton plans further improvements to the cache, including more flexibility and better performance.
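To make the entity-level granularity concrete, here is a minimal sketch of the idea: rather than caching one entry per Feature Service request (coarse, low hit rate across services) or one per individual feature (fine, many keys per request), the cache stores feature values per (Feature View, entity) pair, so a request spanning several Feature Views needs only one cached entry per pair. All class and function names below are illustrative assumptions, not Tecton's actual API, and a plain dictionary stands in for Redis.

```python
import time

class EntityLevelCache:
    """Toy in-memory stand-in for a Redis-backed entity-level cache."""

    def __init__(self, ttl_seconds=60):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (expiry_timestamp, feature_values)

    def _key(self, feature_view, entity_id):
        # One key per (Feature View, entity), not per feature or per request.
        return f"{feature_view}:{entity_id}"

    def get(self, feature_view, entity_id):
        entry = self._store.get(self._key(feature_view, entity_id))
        if entry is None or entry[0] < time.time():
            return None  # miss or expired
        return entry[1]

    def put(self, feature_view, entity_id, values):
        expiry = time.time() + self.ttl
        self._store[self._key(feature_view, entity_id)] = (expiry, values)

def serve_features(cache, feature_views, entity_id, compute_fn):
    """Assemble a feature vector, hitting the backend only on cache misses."""
    result = {}
    for fv in feature_views:
        values = cache.get(fv, entity_id)
        if values is None:
            values = compute_fn(fv, entity_id)  # slow/expensive backend read
            cache.put(fv, entity_id, values)
        result[fv] = values
    return result
```

Because entries are keyed by entity rather than by the full request, two different Feature Services that share a Feature View and entity can reuse the same cached entry, which is where the reduction in backend key reads comes from.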
Key takeaways:
- Tecton has introduced the Tecton Serving Cache, a server-side cache designed to significantly reduce infrastructure costs of feature serving for machine learning models at high scale.
- The Tecton Serving Cache can be beneficial for high-traffic, low-cardinality key reads and complex feature queries that are slow or expensive to compute.
- Benchmarking results show that the Tecton Serving Cache can reduce p50 latency by up to 80% and costs by up to 95% compared to a baseline representing a common AI feature retrieval pattern.
- In the coming months, Tecton plans to make the Tecton Serving Cache more flexible and performant, including request-level cache directives to control caching behavior and further reductions in cache retrieval latency.