
Anthropic’s new Claude prompt caching will save developers a fortune

Aug 15, 2024 - news.bensbites.com
Anthropic has introduced a new feature called prompt caching on its API, currently available in public beta for Claude 3.5 Sonnet and Claude 3 Haiku. The feature caches frequently used prompt context between API calls, letting developers avoid resending long prompts and include additional background information without a matching rise in costs. The company says early users have seen significant speed and cost improvements across a variety of use cases. However, support for the largest Claude model, Opus, is still in the works.

The company also shared its pricing model for cached prompts, which is significantly cheaper than the base input token price. For instance, writing a prompt to the cache on Claude 3.5 Sonnet costs $3.75 per 1 million tokens (MTok), while reading a cached prompt costs $0.30 per MTok. Pricing for Claude 3 Haiku and the upcoming Claude 3 Opus was also disclosed. Despite the benefits, developer Simon Willison pointed out that Anthropic's cache has only a 5-minute lifetime, refreshed each time the cached content is used.

Key takeaways:

  • Anthropic has introduced prompt caching on its API, a feature that remembers the context between API calls and allows developers to avoid repeating prompts. This feature is currently available in public beta on Claude 3.5 Sonnet and Claude 3 Haiku.
  • Prompt caching allows users to keep frequently used contexts in their sessions, enabling them to add additional background information without increasing costs. It also allows developers to better fine-tune model responses.
  • One advantage of caching prompts is lower prices per token. For example, for Claude 3.5 Sonnet, writing a prompt to be cached will cost $3.75 per 1 million tokens (MTok), but using a cached prompt will cost $0.30 per MTok.
  • Other platforms offer a version of prompt caching, but it's not the same as large language model memory. For example, OpenAI’s GPT-4o offers a memory where the model remembers preferences or details, but it does not store the actual prompts and responses like prompt caching.
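In practice, a developer opts into caching by marking a prompt block as cacheable. A minimal sketch of what such a Messages API request body looks like, based on Anthropic's prompt-caching beta documentation at launch; the model name, beta header value, and `cache_control` shape are the documented beta values, but may change as the feature evolves:

```python
LONG_CONTEXT = "..."  # placeholder for a large shared document

request_body = {
    "model": "claude-3-5-sonnet-20240620",
    "max_tokens": 1024,
    "system": [
        {"type": "text", "text": "You answer questions about the document."},
        {
            "type": "text",
            "text": LONG_CONTEXT,
            # Marks this block as a cache breakpoint: the first call writes
            # it to the cache; later calls within the ~5-minute TTL read it
            # at the discounted rate.
            "cache_control": {"type": "ephemeral"},
        },
    ],
    "messages": [
        {"role": "user", "content": "Summarize section 2."}
    ],
}

# During the beta, the feature is enabled via a request header:
headers = {"anthropic-beta": "prompt-caching-2024-07-31"}
```

Everything before the `cache_control` marker forms the cached prefix, so stable content (instructions, reference documents) should come first and per-request content last.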
