Prompt Caching

yoagent automatically optimizes API costs through prompt caching. For providers that support it, stable content (system prompts, tool definitions, conversation history) is cached between turns, giving you up to 90% savings on input tokens.

How It Works

In a multi-turn agent loop, each request sends the full context: system prompt + tools + conversation history. Without caching, you pay full price for all of it every turn. With caching, the provider reuses previously processed prefixes.

Provider Support

Provider	Caching Type	Savings	Framework Action
Anthropic	Explicit (cache breakpoints)	90% on hits	✅ Auto-placed
OpenAI	Automatic (>1024 tokens)	50% on hits	None needed
Google Gemini	Implicit (automatic)	Varies	None needed
Azure OpenAI	Automatic (same as OpenAI)	50% on hits	None needed
Amazon Bedrock	Automatic (where supported)	Varies	None needed

What Gets Cached (Anthropic)

yoagent places up to 3 cache breakpoints automatically:

System prompt — stable across all turns
Tool definitions — rarely change between turns
Conversation history — second-to-last message, so the growing prefix is cached

This means on a typical multi-turn conversation, only the latest user message and the new assistant response cost full price.

Configuration

Caching is enabled by default with automatic breakpoint placement. No configuration needed for optimal behavior.

Disable Caching

#![allow(unused)]
fn main() {
use yoagent::{CacheConfig, CacheStrategy};

let agent = Agent::new(provider)
    .with_cache_config(CacheConfig {
        enabled: false,
        ..Default::default()
    });
}

Fine-Grained Control

#![allow(unused)]
fn main() {
let agent = Agent::new(provider)
    .with_cache_config(CacheConfig {
        enabled: true,
        strategy: CacheStrategy::Manual {
            cache_system: true,
            cache_tools: true,
            cache_messages: false, // Don't cache conversation history
        },
    });
}

Monitoring Cache Usage

Every Usage struct includes cache statistics:

#![allow(unused)]
fn main() {
// After a response:
let usage = message.usage(); // from assistant message
println!("Cache read: {} tokens", usage.cache_read);
println!("Cache write: {} tokens", usage.cache_write);
println!("Cache hit rate: {:.1}%", usage.cache_hit_rate() * 100.0);
}

cache_read — tokens served from cache (cheap)
cache_write — tokens written to cache (slightly more than base price)
cache_hit_rate() — fraction of input tokens from cache (0.0–1.0)

Cost Impact

For a typical 10-turn agent conversation with Anthropic Claude:

Without Caching	With Caching (auto)
~500K input tokens billed at full price	~50K at full price + ~450K at 10% price
$2.50 (Sonnet)	$0.39 (Sonnet)

That's an 84% cost reduction with zero configuration.

Best Practices

Keep system prompts stable — changing the system prompt between turns invalidates the cache
Don't shuffle tools — tool order matters for cache prefix matching
Let it work automatically — the default CacheStrategy::Auto is optimal for most use cases
Monitor cache_hit_rate() — if it's consistently low, check if your system prompt or tools are changing unexpectedly

yoagent Documentation