Anthropic Says This New Feature Could Save You Money and Time

The ethics-minded startup wants to help developers create more advanced prompts without paying full price.

BY BEN SHERRY, STAFF REPORTER @BENLUCASSHERRY

AUG 14, 2024

Photo: Getty Images

Anthropic, one of the titans of the rapidly growing AI industry and developer of the Claude family of AI models, has announced a new feature meant to drive down compute costs by as much as 90 percent and help developers ensure that their chatbots give faster, more consistent responses.

The new feature is called Prompt Caching, and Anthropic says it will allow users to “store” and reuse context without paying full compute fees every time that context is processed by Anthropic’s models. The company says this will help businesses create more ambitious use cases for AI without needing to worry about the cost.

Imagine your business has an Anthropic-powered chatbot to answer customer questions. Every time a customer asks your chatbot a question, that question is broken into a series of elements called tokens, which can be processed and understood by an AI model. The more tokens a prompt has, the more computing power the model needs to process it.
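To get a feel for what tokenization looks like, here is a quick sketch using OpenAI’s open-source tiktoken library. Claude uses its own tokenizer, so the exact counts will differ, but the principle is the same:

```python
# A rough illustration of tokenization. This uses OpenAI's open-source
# tiktoken library because Anthropic's tokenizer isn't public; Claude's
# actual token counts will differ somewhat.
import tiktoken

encoding = tiktoken.get_encoding("cl100k_base")

question = "What is your return policy for items bought on sale?"
tokens = encoding.encode(question)

print(f"Text: {question!r}")
print(f"Token count: {len(tokens)}")  # roughly one per short English word
print(f"Token IDs: {tokens}")
```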

Anthropic’s API bills businesses based on the length of both the prompt (the input) and the AI’s responses (the output). Anthropic’s current flagship model, Claude 3.5 Sonnet, costs $3 for every million input tokens processed and $15 for every million output tokens generated. (For context, rival company OpenAI says one million tokens is roughly equivalent to 2,500 pages in a standard-size book.) For prompts that require the AI to process detailed instructions and lengthy documents, this can get expensive quickly, especially if you’re resending those context-heavy prompts on a consistent basis.
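To see how that adds up, here is a back-of-the-envelope calculation. The request volume, document size, and response length below are hypothetical; the per-token prices are the ones quoted above:

```python
# Estimated daily cost for a context-heavy chatbot at Claude 3.5
# Sonnet's published prices. Traffic numbers are hypothetical.
INPUT_PRICE_PER_MTOK = 3.00    # dollars per million input tokens
OUTPUT_PRICE_PER_MTOK = 15.00  # dollars per million output tokens

# Every request re-sends a 50,000-token instruction document plus a
# short customer question, and gets back a short answer.
input_tokens_per_request = 50_000 + 200
output_tokens_per_request = 300
requests_per_day = 10_000

daily_input_cost = (input_tokens_per_request * requests_per_day
                    / 1_000_000 * INPUT_PRICE_PER_MTOK)
daily_output_cost = (output_tokens_per_request * requests_per_day
                     / 1_000_000 * OUTPUT_PRICE_PER_MTOK)

print(f"Input cost:  ${daily_input_cost:,.2f}/day")   # $1,506.00
print(f"Output cost: ${daily_output_cost:,.2f}/day")  # $45.00
```

Almost all of that spend goes to reprocessing the same 50,000-token document over and over, which is exactly the cost Prompt Caching targets.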

With Prompt Caching, Anthropic says, users only have to pay full price to process that context once. Subsequent requests that reuse the same context read it from the cache at a steep discount rather than reprocessing it from scratch. In addition to reducing costs, Anthropic says this will reduce latency between when a prompt is submitted and when a response is generated.
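In practice, developers opt in at the API level by marking the reusable part of a prompt for caching. Here is a minimal sketch based on Anthropic’s public beta documentation; the company name, policy document, and API key are placeholders:

```python
# A minimal sketch of a Prompt Caching request against Anthropic's
# Messages API, per the public beta docs. Placeholders throughout.
import requests

API_KEY = "YOUR_ANTHROPIC_API_KEY"  # placeholder

# Stand-in for a long document you'd reuse across many requests.
LONG_POLICY_DOCUMENT = "...thousands of tokens of return-policy text..."

response = requests.post(
    "https://api.anthropic.com/v1/messages",
    headers={
        "x-api-key": API_KEY,
        "anthropic-version": "2023-06-01",
        # Opt-in header for the Prompt Caching public beta.
        "anthropic-beta": "prompt-caching-2024-07-31",
    },
    json={
        "model": "claude-3-5-sonnet-20240620",
        "max_tokens": 1024,
        "system": [
            {"type": "text", "text": "You are a support agent for Acme Co."},
            {
                "type": "text",
                "text": LONG_POLICY_DOCUMENT,
                # Marks the prompt up to this point as cacheable, so
                # later requests that start with the same blocks hit
                # the cache instead of paying full input-token price.
                "cache_control": {"type": "ephemeral"},
            },
        ],
        "messages": [
            {"role": "user", "content": "What's your return policy?"}
        ],
    },
)
print(response.json())
```

The first request writes the document to the cache; follow-up requests that lead with the same system blocks read it back at a fraction of the normal input-token price.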

For developers working to ensure that their company’s version of Claude is consistent in its responses to users, Anthropic says Prompt Caching will allow them to include detailed instructions and procedures without paying to process that full context on every request. The company also suggests that developers could keep a summarized version of their codebase in the cache, which could help when developing new code or performing quality assurance.

Prompt Caching is now available in public beta when using Claude 3.5 Sonnet and Claude 3 Haiku through Anthropic’s API. 
