The company also shared its pricing model for cached prompts, which is significantly cheaper than the base input token price. For instance, on Claude 3.5 Sonnet, writing a prompt to the cache costs $3.75 per 1 million tokens (MTok), while using a cached prompt costs $0.30 per MTok. Pricing for Claude 3 Haiku, and for Claude 3 Opus once caching support arrives, was also disclosed. Despite the benefits, AI influencer Simon Willison pointed out that Anthropic’s cache has only a 5-minute lifetime, which is refreshed upon each use.
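As a rough illustration of the arithmetic behind those prices, the sketch below compares the cost of resending a large context on every call with caching it once and reading it thereafter. The $3.00/MTok base input price for Claude 3.5 Sonnet is not stated in this article and is assumed here purely for comparison.

```python
# Cost sketch for the per-token prices quoted above (Claude 3.5 Sonnet).
# NOTE: the base input price below is an assumption, not from this article.
CACHE_WRITE_PER_MTOK = 3.75   # writing a prompt into the cache
CACHE_READ_PER_MTOK = 0.30    # using a cached prompt
BASE_INPUT_PER_MTOK = 3.00    # assumed uncached input price

def cost_usd(tokens: int, price_per_mtok: float) -> float:
    """Dollar cost of `tokens` input tokens at a given $/MTok rate."""
    return tokens / 1_000_000 * price_per_mtok

# Example: a 100k-token context reused across 50 API calls.
context_tokens = 100_000
uncached = 50 * cost_usd(context_tokens, BASE_INPUT_PER_MTOK)
cached = (cost_usd(context_tokens, CACHE_WRITE_PER_MTOK)        # one write
          + 49 * cost_usd(context_tokens, CACHE_READ_PER_MTOK))  # 49 reads
print(f"uncached: ${uncached:.2f}, cached: ${cached:.2f}")
```

Under these assumed prices, the cached path is several times cheaper despite the higher one-time cache-write rate; note the comparison only holds if the calls land within the cache's 5-minute lifetime windows.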
Key takeaways:
- Anthropic has introduced prompt caching on its API, a feature that caches context between API calls so developers can avoid resending repeated prompts. It is currently available in public beta on Claude 3.5 Sonnet and Claude 3 Haiku.
- Prompt caching lets users keep frequently used context in their sessions, so they can include additional background information without a proportional increase in cost. It also helps developers better tune model responses.
- One advantage of caching prompts is a lower price per token. For Claude 3.5 Sonnet, for example, writing a prompt to the cache costs $3.75 per 1 million tokens (MTok), while using a cached prompt costs just $0.30 per MTok.
- Other platforms offer features that resemble prompt caching, but they are not the same thing as large language model memory. OpenAI’s GPT-4o, for example, offers a memory feature in which the model remembers user preferences and details, but it does not store the actual prompts and responses the way prompt caching does.