# yoagent

Simple, effective agent loop in Rust.

yoagent is a library for building LLM-powered agents that can use tools. It provides the core loop — prompt the model, execute tool calls, feed results back — and gets out of your way.

## Philosophy

The loop is the product. An agent is just a loop: send messages to an LLM, get back text and tool calls, execute the tools, repeat until the model stops. yoagent implements this loop with streaming, cancellation, context management, and multi-provider support — so you don't have to.
## Features

- **Streaming events** — Real-time `AgentEvent` stream for UI updates (text deltas, thinking, tool execution)
- **Multi-provider** — Anthropic, OpenAI, Google Gemini, Amazon Bedrock, Azure OpenAI, and any OpenAI-compatible API
- **Tool system** — `AgentTool` trait with built-in coding tools (bash, file read/write/edit, search)
- **Context management** — Automatic token estimation, tiered compaction (truncate tool outputs → summarize → drop old messages)
- **Execution limits** — Max turns, tokens, and wall-clock time
- **Steering & follow-ups** — Interrupt the agent mid-run or queue work for after it finishes
- **Cancellation** — `CancellationToken`-based abort at any point
- **Builder pattern** — Ergonomic `Agent` struct with chainable configuration
## Ecosystem

yoagent is part of the Yolog ecosystem. It powers the agent backend for Yolog applications.

- Repository: github.com/yologdev/yoagent
- License: MIT
## Installation

### Requirements

- Rust 2021 edition (1.56+, recommended 1.75+)
- Tokio async runtime

### Add to Cargo.toml

```toml
[dependencies]
yoagent = "0.5"
```
### Dependencies

yoagent brings in these key dependencies automatically:

| Crate | Purpose |
|---|---|
| `tokio` | Async runtime (full features) |
| `serde` / `serde_json` | Serialization |
| `reqwest` | HTTP client for provider APIs |
| `reqwest-eventsource` | SSE streaming |
| `async-trait` | Async trait support |
| `tokio-util` | `CancellationToken` |
| `thiserror` | Error types |
| `tracing` | Logging |
### Feature Flags

yoagent currently has no optional feature flags — all providers and tools are included by default.
## Quick Start

### Basic Example with Anthropic

```rust
use yoagent::{Agent, AgentEvent, StreamDelta};
use yoagent::provider::AnthropicProvider;
use yoagent::tools::default_tools;

#[tokio::main]
async fn main() {
    let mut agent = Agent::new(AnthropicProvider)
        .with_system_prompt("You are a helpful coding assistant.")
        .with_model("claude-sonnet-4-20250514")
        .with_api_key(std::env::var("ANTHROPIC_API_KEY").unwrap())
        .with_tools(default_tools());

    let mut rx = agent.prompt("List the files in the current directory").await;

    while let Some(event) = rx.recv().await {
        match event {
            AgentEvent::MessageUpdate { delta, .. } => match delta {
                StreamDelta::Text { delta } => print!("{}", delta),
                StreamDelta::Thinking { delta } => print!("[thinking] {}", delta),
                _ => {}
            },
            AgentEvent::ToolExecutionStart { tool_name, .. } => {
                println!("\n→ Running tool: {}", tool_name);
            }
            AgentEvent::ToolExecutionEnd { tool_name, is_error, .. } => {
                if is_error {
                    println!("  ✗ {} failed", tool_name);
                } else {
                    println!("  ✓ {} done", tool_name);
                }
            }
            AgentEvent::AgentEnd { .. } => {
                println!("\n\nDone.");
            }
            _ => {}
        }
    }
}
```
### Example with OpenAI-Compatible Provider

For OpenAI, xAI, Groq, or any compatible API, use `OpenAiCompatProvider` with a `ModelConfig`:

```rust
use yoagent::{Agent, AgentEvent};
use yoagent::provider::OpenAiCompatProvider;
use yoagent::tools::default_tools;

#[tokio::main]
async fn main() {
    let mut agent = Agent::new(OpenAiCompatProvider)
        .with_system_prompt("You are a helpful assistant.")
        .with_model("gpt-4o")
        .with_api_key(std::env::var("OPENAI_API_KEY").unwrap())
        .with_tools(default_tools());

    let mut rx = agent.prompt("What is 2 + 2?").await;

    while let Some(event) = rx.recv().await {
        match event {
            AgentEvent::MessageUpdate { delta, .. } => {
                if let yoagent::StreamDelta::Text { delta } = delta {
                    print!("{}", delta);
                }
            }
            AgentEvent::AgentEnd { .. } => println!(),
            _ => {}
        }
    }
}
```
### Using the Low-Level API

For more control, use `agent_loop()` directly:

```rust
use yoagent::agent_loop::{agent_loop, AgentLoopConfig};
use yoagent::provider::AnthropicProvider;
use yoagent::types::*;
use tokio::sync::mpsc;
use tokio_util::sync::CancellationToken;

#[tokio::main]
async fn main() {
    let (tx, mut rx) = mpsc::unbounded_channel();
    let cancel = CancellationToken::new();

    let mut context = AgentContext {
        system_prompt: "You are helpful.".into(),
        messages: Vec::new(),
        tools: yoagent::tools::default_tools(),
    };

    let config = AgentLoopConfig {
        provider: &AnthropicProvider,
        model: "claude-sonnet-4-20250514".into(),
        api_key: std::env::var("ANTHROPIC_API_KEY").unwrap(),
        thinking_level: ThinkingLevel::Off,
        max_tokens: None,
        temperature: None,
        convert_to_llm: None,
        transform_context: None,
        get_steering_messages: None,
        get_follow_up_messages: None,
        context_config: None,
        execution_limits: None,
        cache_config: CacheConfig::default(),
        tool_execution: ToolExecutionStrategy::default(),
        retry_config: yoagent::RetryConfig::default(),
        before_turn: None,
        after_turn: None,
        on_error: None,
        input_filters: vec![],
    };

    let prompts = vec![AgentMessage::Llm(Message::user("Hello!"))];
    let new_messages = agent_loop(prompts, &mut context, &config, tx, cancel).await;

    // Drain any remaining events
    while let Ok(_event) = rx.try_recv() {
        // handle events...
    }

    println!("Got {} new messages", new_messages.len());
}
```
## The Agent Loop

The agent loop is the core of yoagent. It implements the fundamental cycle:

```text
User prompt → LLM call → Tool execution → LLM call → ... → Final response
```
### How It Works

```text
┌──────────────────────────────────────────────┐
│              agent_loop()                    │
│                                              │
│  1. Add prompts to context                   │
│  2. Emit AgentStart + TurnStart              │
│                                              │
│  ┌─────────── Inner Loop ──────────────┐     │
│  │ • Check steering messages           │     │
│  │ • Check execution limits            │     │
│  │ • Compact context (if configured)   │     │
│  │ • Stream LLM response               │     │
│  │ • Extract tool calls                │     │
│  │ • Execute tools (with steering)     │     │
│  │ • Emit TurnEnd                      │     │
│  │ • Continue if tool_calls or steer   │     │
│  └─────────────────────────────────────┘     │
│                                              │
│  3. Check follow-up messages                 │
│  4. If follow-ups exist, loop again          │
│  5. Emit AgentEnd                            │
└──────────────────────────────────────────────┘
```
### Entry Points

#### agent_loop()

Starts a new agent run with prompt messages:

```rust
pub async fn agent_loop(
    prompts: Vec<AgentMessage>,
    context: &mut AgentContext,
    config: &AgentLoopConfig<'_>,
    tx: mpsc::UnboundedSender<AgentEvent>,
    cancel: CancellationToken,
) -> Vec<AgentMessage>
```

The prompts are added to context, then the loop runs. Returns all new messages generated during the run.
#### agent_loop_continue()

Resumes from existing context (e.g., after an error or retry):

```rust
pub async fn agent_loop_continue(
    context: &mut AgentContext,
    config: &AgentLoopConfig<'_>,
    tx: mpsc::UnboundedSender<AgentEvent>,
    cancel: CancellationToken,
) -> Vec<AgentMessage>
```

Requires that the last message in context is not an assistant message.
### AgentLoopConfig

```rust
pub struct AgentLoopConfig<'a> {
    pub provider: &'a dyn StreamProvider,
    pub model: String,
    pub api_key: String,
    pub thinking_level: ThinkingLevel,
    pub max_tokens: Option<u32>,
    pub temperature: Option<f32>,
    pub convert_to_llm: Option<ConvertToLlmFn>,
    pub transform_context: Option<TransformContextFn>,
    pub get_steering_messages: Option<GetMessagesFn>,
    pub get_follow_up_messages: Option<GetMessagesFn>,
    pub context_config: Option<ContextConfig>,
    pub execution_limits: Option<ExecutionLimits>,
    pub cache_config: CacheConfig,
    pub tool_execution: ToolExecutionStrategy,
    pub retry_config: RetryConfig,
    pub before_turn: Option<BeforeTurnFn>,
    pub after_turn: Option<AfterTurnFn>,
    pub on_error: Option<OnErrorFn>,
    pub input_filters: Vec<Arc<dyn InputFilter>>,
}
```
| Field | Purpose |
|---|---|
| `provider` | The `StreamProvider` implementation to use |
| `model` | Model identifier (e.g., `"claude-sonnet-4-20250514"`) |
| `api_key` | API key for the provider |
| `thinking_level` | `Off`, `Minimal`, `Low`, `Medium`, `High` |
| `convert_to_llm` | Custom `AgentMessage[]` → `Message[]` conversion |
| `transform_context` | Pre-processing hook for context pruning |
| `get_steering_messages` | Returns user interruptions during tool execution |
| `get_follow_up_messages` | Returns queued work after the agent would stop |
| `context_config` | Token budget and compaction settings |
| `execution_limits` | Max turns, tokens, duration |
| `cache_config` | Prompt caching behavior (see Prompt Caching) |
| `tool_execution` | Parallel, Sequential, or Batched (see Tools) |
| `retry_config` | Retry behavior for transient errors (see Retry) |
| `before_turn` | Called before each LLM call; return `false` to abort (see Callbacks) |
| `after_turn` | Called after each turn with messages and usage (see Callbacks) |
| `on_error` | Called on `StopReason::Error` with the error string (see Callbacks) |
| `input_filters` | Input filters applied to user messages before the LLM call (see Tools) |
## Steering & Follow-Ups

### Steering

Steering messages interrupt the agent between tool executions. When the agent is executing multiple tool calls from a single LLM response, steering is checked after each tool completes. If a steering message is found:

- The current tool finishes normally
- All remaining tool calls are skipped with `is_error: true` and "Skipped due to queued user message"
- The steering message is injected into context
- The loop continues with a new LLM call that sees the interruption

```rust
// While the agent is running tools, redirect it:
agent.steer(AgentMessage::Llm(Message::user(
    "Stop that. Instead, explain what you found.",
)));
```
### Follow-Ups

Follow-up messages are checked after the agent would normally stop (no more tool calls, no steering). If follow-ups exist, the loop continues with them as new input — the agent doesn't need to be re-prompted.

```rust
// Queue work for after the agent finishes its current task:
agent.follow_up(AgentMessage::Llm(Message::user("Now run the tests.")));
agent.follow_up(AgentMessage::Llm(Message::user("Then commit the changes.")));
```
### Queue Modes

Both queues support two delivery modes:

| Mode | Behavior |
|---|---|
| `QueueMode::OneAtATime` | Delivers one message per turn (default) |
| `QueueMode::All` | Delivers all queued messages at once |

```rust
agent.set_steering_mode(QueueMode::All);
agent.set_follow_up_mode(QueueMode::OneAtATime);
```
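The two delivery modes can be pictured with a plain queue. This is a stand-alone sketch using local stand-in types for illustration, not yoagent's internals:

```rust
use std::collections::VecDeque;

// Local stand-in for illustration only, not yoagent's actual type.
enum QueueMode {
    OneAtATime,
    All,
}

/// Drain a message queue according to the mode: one message per turn,
/// or everything that is currently queued.
fn drain_queue(queue: &mut VecDeque<String>, mode: &QueueMode) -> Vec<String> {
    match mode {
        QueueMode::OneAtATime => queue.pop_front().into_iter().collect(),
        QueueMode::All => queue.drain(..).collect(),
    }
}

fn main() {
    let mut q: VecDeque<String> =
        ["run tests", "commit"].iter().map(|s| s.to_string()).collect();

    // OneAtATime: a single message is delivered; the rest stay queued.
    let first = drain_queue(&mut q, &QueueMode::OneAtATime);
    assert_eq!(first, vec!["run tests".to_string()]);

    // All: everything remaining is delivered at once.
    let rest = drain_queue(&mut q, &QueueMode::All);
    assert_eq!(rest, vec!["commit".to_string()]);
    assert!(q.is_empty());
}
```

With `OneAtATime` the agent gets one interruption per checkpoint, so each queued message sees a fresh LLM response before the next is delivered.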
### Queue Management

```rust
agent.clear_steering_queue();   // Drop all pending steers
agent.clear_follow_up_queue();  // Drop all pending follow-ups
agent.clear_all_queues();       // Drop everything
```
### Low-Level API

When using `agent_loop()` directly, steering and follow-ups are provided via callback functions:

```rust
let config = AgentLoopConfig {
    get_steering_messages: Some(Box::new(|| {
        // Return Vec<AgentMessage> — checked between tool calls
        vec![]
    })),
    get_follow_up_messages: Some(Box::new(|| {
        // Return Vec<AgentMessage> — checked when the agent would stop
        vec![]
    })),
    // ...
};
```
## Messages & Events

### Message Types

#### Message

The core LLM message type, tagged by role:

```rust
pub enum Message {
    User {
        content: Vec<Content>,
        timestamp: u64,
    },
    Assistant {
        content: Vec<Content>,
        stop_reason: StopReason,
        model: String,
        provider: String,
        usage: Usage,
        timestamp: u64,
        error_message: Option<String>,
    },
    ToolResult {
        tool_call_id: String,
        tool_name: String,
        content: Vec<Content>,
        is_error: bool,
        timestamp: u64,
    },
}
```

Create user messages easily:

```rust
let msg = Message::user("Hello, world!");
```
#### AgentMessage

Wraps `Message` with support for extension messages (UI-only, notifications, etc.):

```rust
pub enum AgentMessage {
    Llm(Message),
    Extension(ExtensionMessage),
}

pub struct ExtensionMessage {
    pub role: String,
    pub kind: String,
    pub data: serde_json::Value,
}
```

Create extension messages with the convenience constructor:

```rust
let ext = ExtensionMessage::new("status_update", serde_json::json!({"status": "running"}));
let msg = AgentMessage::Extension(ext);
```

The `kind` field categorizes the extension (e.g., `"status_update"`, `"ui_event"`, `"notification"`). Use `as_llm()` to extract the `Message` if it's an LLM message. The default `convert_to_llm` function filters out `Extension` messages before sending to the provider.

All core message types implement `Serialize`, `Deserialize`, `Clone`, and `PartialEq`, enabling state persistence and test assertions.
#### Content

Each message contains `Vec<Content>`:

```rust
pub enum Content {
    Text { text: String },
    Image { data: String, mime_type: String },
    Thinking { thinking: String, signature: Option<String> },
    ToolCall { id: String, name: String, arguments: serde_json::Value },
}
```

An assistant message can contain multiple content blocks — e.g., thinking + text + tool calls.
#### StopReason

```rust
pub enum StopReason {
    Stop,    // Natural completion
    Length,  // Hit max tokens
    ToolUse, // Wants to call tools
    Error,   // Provider error
    Aborted, // Cancelled by user
}
```
#### Usage

Token usage from the provider:

```rust
pub struct Usage {
    pub input: u64,
    pub output: u64,
    pub cache_read: u64,
    pub cache_write: u64,
    pub total_tokens: u64,
}
```
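As a quick illustration of how these fields combine, here is a hypothetical helper (not part of yoagent) that computes a cache hit rate from a local mirror of the struct:

```rust
// Local mirror of the Usage fields above, for this example only.
#[allow(dead_code)]
struct Usage {
    input: u64,
    output: u64,
    cache_read: u64,
    cache_write: u64,
    total_tokens: u64,
}

/// Hypothetical helper: fraction of input-side tokens that were served
/// from the provider's prompt cache (see the Prompt Caching chapter).
fn cache_hit_rate(u: &Usage) -> f64 {
    let input_side = u.input + u.cache_read + u.cache_write;
    if input_side == 0 {
        0.0
    } else {
        u.cache_read as f64 / input_side as f64
    }
}

fn main() {
    let u = Usage {
        input: 500,
        output: 200,
        cache_read: 4_000,
        cache_write: 500,
        total_tokens: 5_200,
    };
    // 4,000 of 5,000 input-side tokens came from cache.
    assert!((cache_hit_rate(&u) - 0.8).abs() < 1e-9);
}
```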
### AgentEvent

Events emitted during the agent loop for real-time UI updates:

| Event | When |
|---|---|
| `AgentStart` | Loop begins |
| `AgentEnd { messages }` | Loop finishes, with all new messages |
| `TurnStart` | New LLM call starting |
| `TurnEnd { message, tool_results }` | LLM call + tool execution complete |
| `MessageStart { message }` | A message is available |
| `MessageUpdate { message, delta }` | Streaming delta arrived |
| `MessageEnd { message }` | Message finalized |
| `ToolExecutionStart { tool_call_id, tool_name, args }` | Tool about to run |
| `ToolExecutionUpdate { tool_call_id, tool_name, partial_result }` | Tool progress |
| `ToolExecutionEnd { tool_call_id, tool_name, result, is_error }` | Tool finished |
| `ProgressMessage { tool_call_id, tool_name, text }` | User-facing progress text from a tool |
| `InputRejected { reason }` | Input filter rejected the user's message |
### StreamDelta

Deltas within `MessageUpdate`:

```rust
pub enum StreamDelta {
    Text { delta: String },
    Thinking { delta: String },
    ToolCallDelta { delta: String },
}
```
### Agent State

The `Agent` struct provides access to its current state:

```rust
// Check if the agent is currently streaming a response
if agent.is_streaming() {
    // Use steer() or follow_up() instead of prompt()
    agent.steer(AgentMessage::Llm(Message::user("New instruction")));
}

// Access the full message history
let messages: &[AgentMessage] = agent.messages();

// Check the last message
if let Some(last) = messages.last() {
    println!("Last message role: {}", last.role());
}
```

The `is_streaming()` flag is true between a `prompt()`/`continue_loop()` call and completion. While streaming, calling `prompt()` will panic — use `steer()` or `follow_up()` instead.
## Tools

### The AgentTool Trait

Every tool implements `AgentTool`:

```rust
#[async_trait]
pub trait AgentTool: Send + Sync {
    fn name(&self) -> &str;
    fn label(&self) -> &str;
    fn description(&self) -> &str;
    fn parameters_schema(&self) -> serde_json::Value;

    async fn execute(
        &self,
        params: serde_json::Value,
        ctx: ToolContext,
    ) -> Result<ToolResult, ToolError>;
}
```

| Method | Purpose |
|---|---|
| `name()` | Unique ID sent to the LLM (e.g., `"bash"`) |
| `label()` | Human-readable name for UI (e.g., `"Run Command"`) |
| `description()` | Tells the LLM what the tool does |
| `parameters_schema()` | JSON Schema for the tool's parameters |
| `execute()` | Runs the tool, returns `ToolResult` or `ToolError`. Receives a `ToolContext` with cancellation, update, and progress callbacks. |
### ToolContext

All execution context is bundled into a single struct, making the trait easier to extend in the future:

```rust
pub struct ToolContext {
    pub tool_call_id: String,
    pub tool_name: String,
    pub cancel: CancellationToken,
    pub on_update: Option<ToolUpdateFn>,
    pub on_progress: Option<ProgressFn>,
}
```

| Field | Purpose |
|---|---|
| `tool_call_id` | Unique ID for this tool call (for correlating events) |
| `tool_name` | Name of the tool being executed |
| `cancel` | Cancellation token — check `ctx.cancel.is_cancelled()` in long-running tools |
| `on_update` | Callback for streaming partial `ToolResult` updates to the UI (emits `ToolExecutionUpdate`) |
| `on_progress` | Callback for emitting user-facing progress messages (emits `ProgressMessage`) |

`ToolContext` implements `Clone` and `Debug`.
### ToolResult

```rust
pub struct ToolResult {
    pub content: Vec<Content>,
    pub details: serde_json::Value,
}
```

The `content` is sent back to the LLM. The `details` field holds metadata (not sent to the LLM) for UI/logging.
### ToolError

```rust
pub enum ToolError {
    Failed(String),
    NotFound(String),
    InvalidArgs(String),
    Cancelled,
}
```

Errors are converted to `ToolResult` with `is_error: true` and sent back to the LLM so it can recover.
### Implementing a Custom Tool

```rust
use yoagent::types::*;
use async_trait::async_trait;

pub struct WeatherTool;

#[async_trait]
impl AgentTool for WeatherTool {
    fn name(&self) -> &str { "get_weather" }
    fn label(&self) -> &str { "Weather" }
    fn description(&self) -> &str { "Get current weather for a city." }

    fn parameters_schema(&self) -> serde_json::Value {
        serde_json::json!({
            "type": "object",
            "properties": {
                "city": { "type": "string", "description": "City name" }
            },
            "required": ["city"]
        })
    }

    async fn execute(
        &self,
        params: serde_json::Value,
        _ctx: ToolContext,
    ) -> Result<ToolResult, ToolError> {
        let city = params["city"].as_str()
            .ok_or(ToolError::InvalidArgs("missing city".into()))?;

        // Call weather API...

        Ok(ToolResult {
            content: vec![Content::Text {
                text: format!("Weather in {}: 72°F, sunny", city),
            }],
            details: serde_json::Value::Null,
        })
    }
}
```

Register custom tools alongside defaults:

```rust
use yoagent::tools::default_tools;

let mut tools = default_tools();
tools.push(Box::new(WeatherTool));

let agent = Agent::new(provider).with_tools(tools);
```
### Error Handling

Return `Err(ToolError)` on failure, not `Ok` with error text. When a tool returns `Err`, the agent loop converts it to a `Message::ToolResult` with `is_error: true` and sends it to the LLM. The LLM sees the error and can self-correct — retry with different arguments, try a different approach, or explain the failure to the user.

```rust
async fn execute(&self, params: serde_json::Value, _ctx: ToolContext) -> Result<ToolResult, ToolError> {
    let path = params["path"].as_str()
        .ok_or(ToolError::InvalidArgs("missing 'path'".into()))?;

    let content = std::fs::read_to_string(path)
        .map_err(|e| ToolError::Failed(format!("Cannot read {}: {}", path, e)))?;

    Ok(ToolResult {
        content: vec![Content::Text { text: content }],
        details: serde_json::Value::Null,
    })
}
```

**Exception: BashTool.** The built-in `BashTool` returns `Ok` even on non-zero exit codes, with both stdout and stderr in the result. This is intentional — the LLM needs to see the actual error output (compilation errors, test failures, etc.) to diagnose and fix issues. Only truly exceptional failures (e.g., command not found, cancellation) return `Err`.
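The convention can be sketched with a plain `std::process` call. This is a stand-alone illustration of the idea, not BashTool's actual implementation:

```rust
use std::process::Command;

/// Sketch of the BashTool convention: a non-zero exit code is still Ok,
/// because the model needs to see the failure output. Only failures to
/// spawn the shell at all are treated as Err.
fn run_shell(cmd: &str) -> Result<String, String> {
    let out = Command::new("sh")
        .arg("-c")
        .arg(cmd)
        .output()
        .map_err(|e| format!("could not spawn shell: {}", e))?; // truly exceptional

    // Exit code, stdout, and stderr are all folded into the Ok result.
    Ok(format!(
        "exit code: {}\n--- stdout ---\n{}--- stderr ---\n{}",
        out.status.code().unwrap_or(-1),
        String::from_utf8_lossy(&out.stdout),
        String::from_utf8_lossy(&out.stderr)
    ))
}

fn main() {
    // A failing command is still Ok: the caller (and the LLM) sees why it failed.
    let report = run_shell("echo oops >&2; exit 3").unwrap();
    assert!(report.contains("exit code: 3"));
    assert!(report.contains("oops"));
}
```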
Tool Execution Flow
- LLM returns
Content::ToolCallblocks in its response - Agent loop emits
ToolExecutionStartfor each - Tool's
execute()is called with parsed arguments - Result (or error) is wrapped in
Message::ToolResult ToolExecutionEndis emitted- All tool results are added to context
- Loop continues with another LLM call
### Streaming Tool Output

Long-running tools can stream progress updates to the UI via the `on_update` callback. Each call emits a `ToolExecutionUpdate` event. Partial results are for UI/logging only — they are not sent to the LLM. Only the final `ToolResult` returned from `execute()` becomes part of the conversation.

#### The ToolUpdateFn type

```rust
pub type ToolUpdateFn = Arc<dyn Fn(ToolResult) + Send + Sync>;
```
#### Basic usage

Call `on_update` whenever you have progress to report:

```rust
use yoagent::types::*;

struct DataProcessorTool;

#[async_trait]
impl AgentTool for DataProcessorTool {
    // ... name, label, description, parameters_schema ...

    async fn execute(
        &self,
        params: serde_json::Value,
        ctx: ToolContext,
    ) -> Result<ToolResult, ToolError> {
        let rows = fetch_rows(&params)?;
        let total = rows.len();

        for (i, row) in rows.iter().enumerate() {
            // Check for cancellation
            if ctx.cancel.is_cancelled() {
                return Err(ToolError::Cancelled);
            }

            process_row(row);

            // Stream progress every 100 rows
            if i % 100 == 0 {
                if let Some(cb) = &ctx.on_update {
                    cb(ToolResult {
                        content: vec![Content::Text {
                            text: format!("Processed {}/{} rows", i, total),
                        }],
                        details: serde_json::json!({"progress": i as f64 / total as f64}),
                    });
                }
            }
        }

        Ok(ToolResult {
            content: vec![Content::Text {
                text: format!("Processed all {} rows", total),
            }],
            details: serde_json::Value::Null,
        })
    }
}
```
#### Consuming updates in your UI

Updates arrive as `AgentEvent::ToolExecutionUpdate` events on the same event stream as all other agent events:

```rust
while let Some(event) = rx.recv().await {
    match event {
        AgentEvent::ToolExecutionStart { tool_name, .. } => {
            println!("⏳ {} started", tool_name);
        }
        AgentEvent::ToolExecutionUpdate { tool_name, partial_result, .. } => {
            // Show progress in your UI
            if let Some(Content::Text { text }) = partial_result.content.first() {
                println!("  📊 {}: {}", tool_name, text);
            }
        }
        AgentEvent::ToolExecutionEnd { tool_name, is_error, .. } => {
            println!("{} {}", if is_error { "❌" } else { "✅" }, tool_name);
        }
        AgentEvent::ProgressMessage { tool_name, text, .. } => {
            println!("  💬 {}: {}", tool_name, text);
        }
        _ => {}
    }
}
```
#### Progress Messages

In addition to `on_update` (which streams partial `ToolResult` values), tools can emit lightweight text-only progress messages via `ctx.on_progress`. These appear as `AgentEvent::ProgressMessage` events:

```rust
async fn execute(&self, params: serde_json::Value, ctx: ToolContext) -> Result<ToolResult, ToolError> {
    if let Some(progress) = &ctx.on_progress {
        progress("Starting analysis...".into());
    }

    // ... do work ...

    if let Some(progress) = &ctx.on_progress {
        progress("Almost done...".into());
    }

    Ok(ToolResult { /* ... */ })
}
```

Use `on_progress` for simple status text. Use `on_update` when you need structured data (progress percentages, partial results).
#### Guidelines

- Call `on_update` as often as useful — there's no rate limit. The callback is synchronous and cheap.
- Always check `ctx.on_update.is_some()` before building the `ToolResult`. If `None`, the loop isn't interested in updates (e.g., in tests).
- Use `details` for structured data — `content` is for human-readable text; `details` can carry progress percentages, byte counts, etc.
- Don't rely on updates reaching the LLM — they won't. Only the final return value is added to context.
- Simple tools don't need it — if your tool completes in under a second, just ignore `ctx` (prefix with `_ctx` to suppress the warning).
#### End-to-end example

Here's a complete example: a CLI agent with a deploy tool that streams progress. The human sees real-time output while the LLM only gets the final result.

```rust
use yoagent::agent::Agent;
use yoagent::provider::AnthropicProvider;
use yoagent::types::*;
use async_trait::async_trait;

/// A tool that deploys an app and streams each step.
struct DeployTool;

#[async_trait]
impl AgentTool for DeployTool {
    fn name(&self) -> &str { "deploy" }
    fn label(&self) -> &str { "Deploy App" }
    fn description(&self) -> &str { "Deploy the application to production." }

    fn parameters_schema(&self) -> serde_json::Value {
        serde_json::json!({
            "type": "object",
            "properties": {
                "env": { "type": "string", "description": "Target environment" }
            },
            "required": ["env"]
        })
    }

    async fn execute(
        &self,
        params: serde_json::Value,
        ctx: ToolContext,
    ) -> Result<ToolResult, ToolError> {
        let env = params["env"].as_str().unwrap_or("staging");
        let steps = ["Building image", "Running tests", "Pushing to registry", "Rolling out"];

        for (i, step) in steps.iter().enumerate() {
            if ctx.cancel.is_cancelled() {
                return Err(ToolError::Cancelled);
            }

            // Stream each step to the UI
            if let Some(cb) = &ctx.on_update {
                cb(ToolResult {
                    content: vec![Content::Text {
                        text: format!("[{}/{}] {}...", i + 1, steps.len(), step),
                    }],
                    details: serde_json::json!({
                        "step": i + 1,
                        "total": steps.len(),
                        "phase": step,
                    }),
                });
            }

            // Simulate work
            tokio::time::sleep(std::time::Duration::from_secs(2)).await;
        }

        // Only this final result is sent to the LLM
        Ok(ToolResult {
            content: vec![Content::Text {
                text: format!("Successfully deployed to {}", env),
            }],
            details: serde_json::json!({"env": env, "status": "success"}),
        })
    }
}

#[tokio::main]
async fn main() {
    let mut agent = Agent::new(AnthropicProvider)
        .with_system_prompt("You are a deployment assistant.")
        .with_model("claude-sonnet-4-20250514")
        .with_api_key(std::env::var("ANTHROPIC_API_KEY").unwrap())
        .with_tools(vec![Box::new(DeployTool)]);

    let mut rx = agent.prompt("Deploy to production").await;

    while let Some(event) = rx.recv().await {
        match event {
            // LLM text streaming
            AgentEvent::MessageUpdate { delta: StreamDelta::Text { delta }, .. } => {
                print!("{}", delta);
            }
            // Tool progress streaming
            AgentEvent::ToolExecutionStart { tool_name, .. } => {
                println!("\n🚀 Starting {}...", tool_name);
            }
            AgentEvent::ToolExecutionUpdate { partial_result, .. } => {
                if let Some(Content::Text { text }) = partial_result.content.first() {
                    println!("   {}", text);
                }
            }
            AgentEvent::ToolExecutionEnd { tool_name, is_error, .. } => {
                if is_error {
                    println!("   ❌ {} failed", tool_name);
                } else {
                    println!("   ✅ {} complete", tool_name);
                }
            }
            AgentEvent::ProgressMessage { text, .. } => {
                println!("   💬 {}", text);
            }
            AgentEvent::AgentEnd { .. } => break,
            _ => {}
        }
    }
}
```
Running this produces:

```text
🚀 Starting deploy...
   [1/4] Building image...
   [2/4] Running tests...
   [3/4] Pushing to registry...
   [4/4] Rolling out...
   ✅ deploy complete
Successfully deployed to production. The deployment completed all 4 stages.
```
The human sees each step as it happens. The LLM only sees "Successfully deployed to production" and can continue the conversation from there.
#### How agents benefit

When an AI agent (like a coding assistant) uses yoagent, streaming tool output helps in two ways:

- **Human oversight** — The human watching the agent work sees real-time progress instead of waiting for a tool to finish. A bash command running `cargo build` can stream compiler output as it happens, so the human can interrupt early if something is wrong.
- **Agent UIs** — Web dashboards, IDE extensions, and chat interfaces can render live progress bars, log tails, or status indicators. The `details` field in `ToolResult` carries structured data (progress percentages, byte counts, etc.) that UIs can render however they want.

The LLM itself doesn't see updates — it works with final results only. This is intentional: partial output would waste context tokens and confuse the model. Streaming is purely a human-facing feature.
### Execution Strategies

When the LLM returns multiple tool calls in a single response (e.g., "read file A, read file B, run bash C"), `ToolExecutionStrategy` controls how they run:

| Strategy | Behavior |
|---|---|
| `Sequential` | One at a time. Steering checked between each tool. Use for debugging or tools with shared mutable state. |
| `Parallel` (default) | All tool calls run concurrently via `futures::join_all`. Steering checked after all complete. Best latency for independent tools. |
| `Batched { size }` | Run in groups of N. Steering checked between batches. Balances speed with human-in-the-loop control. |
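The three strategies differ only in how the tool-call list is partitioned into execution units between steering checks. A synchronous sketch with local stand-in types (not yoagent's actual implementation, which runs the units concurrently):

```rust
// Local stand-in mirroring the strategy shapes described above.
enum Strategy {
    Sequential,
    Parallel,
    Batched { size: usize },
}

/// Partition tool calls into the units between which steering is checked:
/// every call alone (Sequential), all at once (Parallel), or in groups (Batched).
fn execution_units<'a>(calls: &[&'a str], strategy: &Strategy) -> Vec<Vec<&'a str>> {
    match strategy {
        Strategy::Sequential => calls.iter().map(|c| vec![*c]).collect(),
        Strategy::Parallel => vec![calls.to_vec()],
        Strategy::Batched { size } => calls.chunks(*size).map(|c| c.to_vec()).collect(),
    }
}

fn main() {
    let calls = ["read_a", "read_b", "bash_c", "read_d"];

    // Sequential: a steering checkpoint after each of the 4 tools.
    assert_eq!(execution_units(&calls, &Strategy::Sequential).len(), 4);

    // Parallel: one unit; steering is checked only after all 4 complete.
    assert_eq!(execution_units(&calls, &Strategy::Parallel).len(), 1);

    // Batched { size: 3 }: units of 3 and 1, with a checkpoint in between.
    assert_eq!(execution_units(&calls, &Strategy::Batched { size: 3 }).len(), 2);
}
```

More units means more steering checkpoints but less concurrency, which is exactly the trade-off the table above describes.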
#### Configuration

```rust
use yoagent::agent::Agent;
use yoagent::types::ToolExecutionStrategy;

// Default — parallel (fastest)
let agent = Agent::new(provider);

// Sequential (debug / shared state)
let agent = Agent::new(provider)
    .with_tool_execution(ToolExecutionStrategy::Sequential);

// Batched — 3 at a time
let agent = Agent::new(provider)
    .with_tool_execution(ToolExecutionStrategy::Batched { size: 3 });
```
#### When to use each

- **Parallel (default)** — Most tool calls are independent: file reads, searches, API calls. Running them concurrently can cut latency dramatically (3 tools × 50ms is ~50ms instead of ~150ms).
- **Sequential** — When tools have side effects that depend on order, or when you need fine-grained steering control between each tool.
- **Batched** — When you want parallelism but also want steering checkpoints. For example, `Batched { size: 3 }` runs 3 tools concurrently, checks for user interrupts, then runs the next 3.

Steering messages are always checked between execution units (between each tool in Sequential, after all tools in Parallel, between batches in Batched). If a user interrupts, remaining tools are skipped.
## Context Management

Long-running agents accumulate messages that exceed the model's context window. yoagent provides token tracking, overflow detection, tiered compaction, and execution limits.

### Token Estimation

Fast estimation without external tokenizer dependencies:

```rust
use yoagent::context::{estimate_tokens, message_tokens, total_tokens};

estimate_tokens("Hello world");  // ~3 tokens (chars / 4)
message_tokens(&agent_message);  // estimate for a single message
total_tokens(&messages);         // estimate for all messages
```
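The heuristic is simple enough to restate in a few lines. A sketch of the chars/4 rule, assuming ceiling division (yoagent's exact rounding may differ):

```rust
/// chars / 4 heuristic, rounding up: an approximation of the rule
/// described above, not yoagent's exact implementation.
fn estimate_tokens(text: &str) -> usize {
    (text.chars().count() + 3) / 4
}

fn main() {
    assert_eq!(estimate_tokens("Hello world"), 3); // 11 chars → ~3 tokens
    assert_eq!(estimate_tokens(""), 0);
}
```

The rule is rough (code and non-English text tokenize differently), which is why the `ContextTracker` below prefers real usage numbers from the provider when they are available.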
### Context Tracking

`ContextTracker` combines real token counts from provider responses with estimation for new messages — more accurate than pure estimation:

```rust
use yoagent::context::ContextTracker;

let mut tracker = ContextTracker::new();

// After each assistant response, record the real usage:
tracker.record_usage(&assistant_usage, message_index);

// Get the current context size (real usage + estimated trailing messages):
let tokens = tracker.estimate_context_tokens(agent.messages());

// After compaction, reset the tracker:
tracker.reset();
```

When no usage data is available, it falls back to chars/4 estimation.
### Context Overflow Detection

When the context exceeds a model's window, providers return overflow errors. yoagent detects these automatically across all major providers.

#### HTTP-level detection

Providers that check before streaming (Google, Bedrock, Vertex) return `ProviderError::ContextOverflow`:

```rust
use yoagent::provider::ProviderError;

match agent.prompt("...").await {
    // The loop already handles this — but you can also match it:
    Err(ProviderError::ContextOverflow { message }) => {
        // Compact and retry
    }
    _ => {}
}
```

`ProviderError::classify()` auto-detects overflow from error messages covering Anthropic, OpenAI, Google, AWS Bedrock, xAI, Groq, OpenRouter, llama.cpp, LM Studio, MiniMax, Kimi, GitHub Copilot, and generic patterns.
#### Message-level detection

SSE-based providers (Anthropic, OpenAI) return overflow as a `StopReason::Error` message. Check with:

```rust
if message.is_context_overflow() {
    // Compact and retry
}
```
#### Handling overflow in your application

yoagent provides the detection and the building blocks. Your application wires up the compaction strategy:

```rust
// Proactive: check before each prompt
let tokens = tracker.estimate_context_tokens(agent.messages());
if tokens > context_window - reserve {
    let compacted = compact_messages(agent.messages().to_vec(), &config);
    agent.replace_messages(compacted);
}

// Reactive: catch overflow errors
// ... on ContextOverflow or message.is_context_overflow():
// compact, then retry with agent.continue_loop()
```

For LLM-based summarization (asking the model to summarize old messages), implement that in your application layer — yoagent provides `replace_messages()` and `compact_messages()` as building blocks.
### ContextConfig

```rust
pub struct ContextConfig {
    pub max_context_tokens: usize,    // Default: 100,000
    pub system_prompt_tokens: usize,  // Default: 4,000
    pub keep_recent: usize,           // Default: 10
    pub keep_first: usize,            // Default: 2
    pub tool_output_max_lines: usize, // Default: 50
}
```
### Tiered Compaction

`compact_messages()` tries each level in order, stopping as soon as the messages fit the budget.

#### Level 1: Truncate Tool Outputs

Replaces long tool outputs with head + tail (keeping the first N/2 and last N/2 lines). This is the cheapest level — it preserves conversation structure and typically saves 50-70% in coding sessions.
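Head-plus-tail truncation can be sketched like this (a stand-alone approximation of the behavior described, not the library's code):

```rust
/// Keep the first and last `max_lines / 2` lines of a long tool output,
/// replacing the middle with a marker: an approximation of Level 1.
fn truncate_output(text: &str, max_lines: usize) -> String {
    let lines: Vec<&str> = text.lines().collect();
    if lines.len() <= max_lines {
        return text.to_string();
    }
    let half = max_lines / 2;
    let dropped = lines.len() - 2 * half;

    let mut out: Vec<String> = lines[..half].iter().map(|s| s.to_string()).collect();
    out.push(format!("... [{} lines truncated] ...", dropped));
    out.extend(lines[lines.len() - half..].iter().map(|s| s.to_string()));
    out.join("\n")
}

fn main() {
    let long: String = (1..=100).map(|i| format!("line {}\n", i)).collect();
    let short = truncate_output(&long, 10);

    // First 5 and last 5 lines survive; the middle 90 are summarized.
    assert!(short.starts_with("line 1"));
    assert!(short.ends_with("line 100"));
    assert!(short.contains("[90 lines truncated]"));
}
```

Heads and tails tend to carry the signal in tool output (the command invocation and the final error or summary), which is why this level alone often recovers most of the budget.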
Level 2: Summarize Old Turns
Keeps the last keep_recent messages in full detail. Older assistant messages are replaced with one-line summaries like "[Summary] [Assistant used 3 tool(s)]", and their tool results are dropped.
Level 3: Drop Middle Messages
Keeps keep_first messages from the start and keep_recent from the end, dropping everything in between. A marker message notes how many were removed.
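The cheapest level, head + tail truncation, can be sketched in a few lines. This is an illustrative stand-alone version; yoagent's actual implementation lives inside `compact_messages()`:

```rust
/// Level 1 sketch: keep the first and last max_lines/2 lines of a tool
/// output, replacing the middle with a marker. Illustrative only.
fn truncate_head_tail(output: &str, max_lines: usize) -> String {
    let lines: Vec<&str> = output.lines().collect();
    if lines.len() <= max_lines {
        return output.to_string();
    }
    let keep = max_lines / 2;
    let head = lines[..keep].join("\n");
    let tail = lines[lines.len() - keep..].join("\n");
    let omitted = lines.len() - 2 * keep;
    format!("{}\n[... {} lines truncated ...]\n{}", head, omitted, tail)
}

fn main() {
    let big: String = (1..=100).map(|i| format!("line {}\n", i)).collect();
    let small = truncate_head_tail(&big, 10);
    assert!(small.contains("line 1"));
    assert!(small.contains("line 100"));
    assert!(small.contains("90 lines truncated"));
}
```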
ExecutionLimits
Prevents runaway agents:
```rust
pub struct ExecutionLimits {
    pub max_turns: usize,        // Default: 50
    pub max_total_tokens: usize, // Default: 1,000,000
    pub max_duration: Duration,  // Default: 600s (10 min)
}
```
When a limit is reached, the agent stops with a message like "[Agent stopped: Max turns reached (50/50)]".
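The limit check itself is straightforward. A sketch of the semantics — the stop messages other than the max-turns one are assumptions for illustration, and the real check lives inside the agent loop:

```rust
use std::time::Duration;

/// Illustrative mirror of ExecutionLimits (not yoagent's actual type).
struct Limits {
    max_turns: usize,
    max_total_tokens: usize,
    max_duration: Duration,
}

/// Returns a stop message once any limit is exceeded, else None.
fn stop_reason(l: &Limits, turns: usize, tokens: usize, elapsed: Duration) -> Option<String> {
    if turns >= l.max_turns {
        return Some(format!("[Agent stopped: Max turns reached ({}/{})]", turns, l.max_turns));
    }
    if tokens >= l.max_total_tokens {
        return Some("[Agent stopped: Max total tokens reached]".into()); // assumed wording
    }
    if elapsed >= l.max_duration {
        return Some("[Agent stopped: Max duration reached]".into()); // assumed wording
    }
    None
}

fn main() {
    let l = Limits { max_turns: 50, max_total_tokens: 1_000_000, max_duration: Duration::from_secs(600) };
    assert!(stop_reason(&l, 10, 5_000, Duration::from_secs(30)).is_none());
    assert_eq!(
        stop_reason(&l, 50, 5_000, Duration::from_secs(30)).unwrap(),
        "[Agent stopped: Max turns reached (50/50)]"
    );
}
```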
Disabling Context Management
```rust
let agent = Agent::new(provider)
    .without_context_management();
```
This sets both context_config and execution_limits to None.
Prompt Caching
yoagent automatically optimizes API costs through prompt caching. For providers that support it, stable content (system prompts, tool definitions, conversation history) is cached between turns, giving you up to 90% savings on input tokens.
How It Works
In a multi-turn agent loop, each request sends the full context: system prompt + tools + conversation history. Without caching, you pay full price for all of it every turn. With caching, the provider reuses previously processed prefixes.
Provider Support
| Provider | Caching Type | Savings | Framework Action |
|---|---|---|---|
| Anthropic | Explicit (cache breakpoints) | 90% on hits | ✅ Auto-placed |
| OpenAI | Automatic (>1024 tokens) | 50% on hits | None needed |
| Google Gemini | Implicit (automatic) | Varies | None needed |
| Azure OpenAI | Automatic (same as OpenAI) | 50% on hits | None needed |
| Amazon Bedrock | Automatic (where supported) | Varies | None needed |
What Gets Cached (Anthropic)
yoagent places up to 3 cache breakpoints automatically:
- System prompt — stable across all turns
- Tool definitions — rarely change between turns
- Conversation history — second-to-last message, so the growing prefix is cached
This means on a typical multi-turn conversation, only the latest user message and the new assistant response cost full price.
Configuration
Caching is enabled by default with automatic breakpoint placement. No configuration needed for optimal behavior.
Disable Caching
```rust
use yoagent::{CacheConfig, CacheStrategy};

let agent = Agent::new(provider)
    .with_cache_config(CacheConfig {
        enabled: false,
        ..Default::default()
    });
```
Fine-Grained Control
```rust
let agent = Agent::new(provider)
    .with_cache_config(CacheConfig {
        enabled: true,
        strategy: CacheStrategy::Manual {
            cache_system: true,
            cache_tools: true,
            cache_messages: false, // Don't cache conversation history
        },
    });
```
Monitoring Cache Usage
Every Usage struct includes cache statistics:
```rust
// After a response:
let usage = message.usage(); // from assistant message
println!("Cache read: {} tokens", usage.cache_read);
println!("Cache write: {} tokens", usage.cache_write);
println!("Cache hit rate: {:.1}%", usage.cache_hit_rate() * 100.0);
```
- `cache_read` — tokens served from cache (cheap)
- `cache_write` — tokens written to cache (slightly more than base price)
- `cache_hit_rate()` — fraction of input tokens from cache (0.0–1.0)
Cost Impact
For a typical 10-turn agent conversation with Anthropic Claude:
| Without Caching | With Caching (auto) |
|---|---|
| ~500K input tokens billed at full price | ~50K at full price + ~450K at 10% price |
| $2.50 (Sonnet) | $0.39 (Sonnet) |
That's an 84% cost reduction with zero configuration.
Best Practices
- Keep system prompts stable — changing the system prompt between turns invalidates the cache
- Don't shuffle tools — tool order matters for cache prefix matching
- Let it work automatically — the default `CacheStrategy::Auto` is optimal for most use cases
- Monitor `cache_hit_rate()` — if it's consistently low, check whether your system prompt or tools are changing unexpectedly
Retry with Backoff
When an LLM provider returns a transient error — rate limit (HTTP 429) or network failure — yoagent automatically retries with exponential backoff and jitter. No configuration required; it works out of the box.
How it works
```
Request → Error? → Retryable? → Wait (backoff + jitter) → Retry → ...
                        ↓ No
                 Fail immediately
```
1. The agent loop calls the provider
2. If the provider returns a retryable error:
   - If a `retry-after` delay was provided (rate limits), use that
   - Otherwise, calculate the delay: `initial_delay × multiplier^(attempt-1)` with ±20% jitter
   - Wait, then retry
3. After `max_retries` attempts, the error propagates normally
What gets retried
| Error Type | Retried? | Why |
|---|---|---|
RateLimited (429) | ✅ Yes | Temporary — provider will accept requests again soon |
Network | ✅ Yes | Transient — connection resets, timeouts, DNS failures |
Auth (401/403) | ❌ No | Permanent — wrong API key won't fix itself |
Api (400, etc.) | ❌ No | Permanent — bad request won't change on retry |
Cancelled | ❌ No | User-initiated — respect the cancellation |
Default configuration
```rust
RetryConfig {
    max_retries: 3,          // Up to 3 retry attempts
    initial_delay_ms: 1000,  // 1 second before first retry
    backoff_multiplier: 2.0, // Double the delay each attempt
    max_delay_ms: 30_000,    // Cap at 30 seconds
}
```
With defaults, the retry delays are approximately:
- Attempt 1: ~1s
- Attempt 2: ~2s
- Attempt 3: ~4s
(±20% jitter to avoid thundering herd when multiple agents hit the same provider)
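The deterministic part of that schedule is a one-liner. A sketch of the formula (jitter omitted; the real loop applies ±20% on top):

```rust
/// Backoff schedule: initial × multiplier^(attempt-1), capped at max_delay.
/// Jitter is applied separately in the real retry loop.
fn backoff_ms(attempt: u32, initial_ms: u64, multiplier: f64, max_ms: u64) -> u64 {
    let raw = initial_ms as f64 * multiplier.powi(attempt as i32 - 1);
    (raw as u64).min(max_ms)
}

fn main() {
    // Defaults: 1000ms initial, 2.0 multiplier, 30s cap
    assert_eq!(backoff_ms(1, 1000, 2.0, 30_000), 1000);   // ~1s
    assert_eq!(backoff_ms(2, 1000, 2.0, 30_000), 2000);   // ~2s
    assert_eq!(backoff_ms(3, 1000, 2.0, 30_000), 4000);   // ~4s
    assert_eq!(backoff_ms(10, 1000, 2.0, 30_000), 30_000); // capped
}
```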
Configuration
Using the Agent builder
```rust
use yoagent::agent::Agent;
use yoagent::retry::RetryConfig;

// Default — 3 retries, exponential backoff (recommended)
let agent = Agent::new(provider);

// Custom — more retries, longer initial delay
let agent = Agent::new(provider)
    .with_retry_config(RetryConfig {
        max_retries: 5,
        initial_delay_ms: 2000,
        backoff_multiplier: 2.0,
        max_delay_ms: 60_000,
    });

// Disable retries entirely
let agent = Agent::new(provider)
    .with_retry_config(RetryConfig::none());
```
Using AgentLoopConfig directly
```rust
use yoagent::agent_loop::AgentLoopConfig;
use yoagent::retry::RetryConfig;

let config = AgentLoopConfig {
    // ...other fields...
    retry_config: RetryConfig {
        max_retries: 3,
        initial_delay_ms: 1000,
        backoff_multiplier: 2.0,
        max_delay_ms: 30_000,
    },
};
```
Rate limit headers
When a provider returns ProviderError::RateLimited { retry_after_ms: Some(5000) }, yoagent uses that exact delay instead of the calculated backoff. This respects the provider's guidance — if Anthropic says "retry after 5 seconds", we wait 5 seconds, not our own estimate.
If no retry_after_ms is provided, the exponential backoff kicks in.
Observability
Retry attempts are logged via tracing at the WARN level:
```
WARN Provider error (attempt 1/3), retrying in 1.1s: Rate limited, retry after 1000ms
WARN Provider error (attempt 2/3), retrying in 2.3s: Rate limited, retry after 2000ms
```
Subscribe to tracing events in your application to surface these in your UI:
```rust
use tracing_subscriber;

// Simple stderr logging
tracing_subscriber::fmt::init();

// Or filter to just retries
tracing_subscriber::fmt()
    .with_env_filter("yoagent::retry=warn")
    .init();
```
Design notes
- Retry lives in the agent loop, not inside individual providers. One config controls all retry behavior.
- Jitter prevents thundering herd: when many agents hit a rate limit simultaneously, jitter spreads their retries so they don't all retry at the same instant.
- Cancellation is respected: if the user cancels while waiting for a retry, the loop exits immediately.
- No retry on API errors: a malformed request will fail the same way every time. Retrying wastes time and tokens.
Skills
Skills extend an agent with domain expertise using the AgentSkills open standard. A skill is a directory containing a SKILL.md file with instructions the agent can load on demand.
How it works
Skills use progressive disclosure to manage context efficiently:
- Metadata (~100 tokens/skill) — name + description, always in the system prompt
- Instructions (<5k tokens) — SKILL.md body, loaded when the agent decides the skill is relevant
- Resources (unlimited) — scripts, references, assets — loaded only when needed
The agent decides when to activate a skill based on the description alone. No trigger engine needed.
Skill format
```
my-skill/
├── SKILL.md      # Required: YAML frontmatter + instructions
├── scripts/      # Optional: executable code
├── references/   # Optional: documentation loaded on demand
└── assets/       # Optional: templates, static resources
```
SKILL.md uses YAML frontmatter:
```markdown
---
name: git
description: Git operations — commit, branch, merge, rebase. Use when the user mentions version control.
---

# Git Skill

## Workflow

1. Run `git status` first
2. Stage changes, write conventional commit messages
3. For merges, check for conflicts first

## Scripts

For complex diffs: `bash {baseDir}/scripts/diff_summary.sh`
```
Loading skills
```rust
use yoagent::SkillSet;

// Load from multiple directories (later dirs override earlier on name conflict)
let skills = SkillSet::load(&["./skills", "~/.yoagent/skills"])?;

// Or load from a single directory with a label
let workspace_skills = SkillSet::load_dir("./skills", "workspace")?;
```
Using with Agent
```rust
use yoagent::{Agent, SkillSet};

let skills = SkillSet::load(&["./skills"])?;

let agent = Agent::new(provider)
    .with_system_prompt("You are a coding assistant.")
    .with_skills(skills) // Appends skill index to system prompt
    .with_tools(tools);
```
The agent's system prompt will include:
```xml
<available_skills>
  <skill>
    <name>git</name>
    <description>Git operations — commit, branch, merge, rebase.</description>
    <location>/path/to/skills/git/SKILL.md</location>
  </skill>
</available_skills>
```
When the agent encounters a task matching a skill, it reads the SKILL.md using the read_file tool and follows the instructions. No special infrastructure needed.
Precedence
When loading from multiple directories, later directories take precedence. A skill in ./skills/ overrides the same-named skill in ~/.yoagent/skills/.
You can also merge skill sets explicitly:
```rust
let mut base = SkillSet::load_dir("/usr/share/yoagent/skills", "bundled")?;
let user = SkillSet::load_dir("~/.yoagent/skills", "user")?;
let workspace = SkillSet::load_dir("./skills", "workspace")?;

base.merge(user);
base.merge(workspace); // workspace wins on conflict
```
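The merge semantics are last-writer-wins keyed by skill name. A std-only sketch of that rule (illustrative; `SkillSet::merge` implements the real behavior):

```rust
use std::collections::HashMap;

/// Precedence sketch: entries in `overlay` replace same-named entries
/// in `base`, so whichever set is merged last wins.
fn merge_by_name(base: &mut HashMap<String, String>, overlay: HashMap<String, String>) {
    for (name, path) in overlay {
        base.insert(name, path); // overlay replaces same-named skills
    }
}

fn main() {
    let mut skills = HashMap::from([("git".to_string(), "user/skills/git".to_string())]);
    merge_by_name(
        &mut skills,
        HashMap::from([("git".to_string(), "workspace/skills/git".to_string())]),
    );
    assert_eq!(skills["git"], "workspace/skills/git"); // workspace wins
}
```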
Compatibility
By following the AgentSkills standard, skills written for yoagent work with Claude Code, Codex CLI, Gemini CLI, Cursor, OpenCode, Goose, and any other compatible agent. Write once, use everywhere.
Design philosophy
Skills are deliberately simple:
- No trigger engine — the LLM decides from descriptions
- No compile-time registration — skills use existing tools (read_file, bash)
- No plugin API — skills are just files
- No runtime loading — loaded at startup, that's it
If a skill needs a custom tool, it can provide an MCP server.
Sub-Agents
Sub-agents let a parent agent delegate tasks to child agent loops, each with their own system prompt, tools, and provider. The parent LLM invokes them like any other tool.
Overview
```
Parent Agent
├── prompt("Research X and implement Y")
│   ├── calls SubAgentTool("researcher", task="Research X")
│   │   └── child agent_loop() with read/search tools → returns findings
│   ├── calls SubAgentTool("coder", task="Implement Y based on findings")
│   │   └── child agent_loop() with edit/write tools → returns result
│   └── summarizes both results
```
Each sub-agent invocation starts a fresh conversation — no state leaks between calls.
Creating Sub-Agents
```rust
use std::sync::Arc;
use yoagent::sub_agent::SubAgentTool;
use yoagent::provider::AnthropicProvider;
use yoagent::tools;

let researcher = SubAgentTool::new("researcher", Arc::new(AnthropicProvider))
    .with_description("Searches and reads files to gather information.")
    .with_system_prompt("You are a research assistant. Be thorough and concise.")
    .with_model("claude-sonnet-4-20250514")
    .with_api_key(&api_key)
    .with_tools(vec![
        Arc::new(tools::ReadFileTool::new()),
        Arc::new(tools::SearchTool::new()),
    ])
    .with_max_turns(10);
```
Registering on a Parent Agent
```rust
use yoagent::agent::Agent;

let mut agent = Agent::new(AnthropicProvider)
    .with_system_prompt("You coordinate between sub-agents.")
    .with_model("claude-sonnet-4-20250514")
    .with_api_key(api_key)
    .with_sub_agent(researcher)
    .with_sub_agent(coder);
```
The parent sees sub-agents as regular tools. It decides when to delegate based on its system prompt.
Parallel Execution
When the parent LLM calls multiple sub-agents in a single response, they run concurrently (default Parallel strategy). Two sub-agents each taking 50ms complete in ~50ms total, not 100ms.
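The timing claim is easy to verify with any concurrency primitive. An illustrative std-thread sketch (yoagent actually runs sub-agents as async tasks on tokio, not OS threads):

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Run two 50ms "sub-agents" on separate threads and return total wall time.
/// Plain threads stand in here for yoagent's concurrent async tasks.
fn run_two_parallel() -> Duration {
    let start = Instant::now();
    let a = thread::spawn(|| thread::sleep(Duration::from_millis(50)));
    let b = thread::spawn(|| thread::sleep(Duration::from_millis(50)));
    a.join().unwrap();
    b.join().unwrap();
    start.elapsed()
}

fn main() {
    let elapsed = run_two_parallel();
    // Concurrent: ~50ms total, not the ~100ms a sequential run would take
    assert!(elapsed >= Duration::from_millis(50));
    assert!(elapsed < Duration::from_millis(100));
}
```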
Configuration
| Method | Purpose |
|---|---|
with_description() | What the parent LLM sees (helps it decide when to delegate) |
with_system_prompt() | The sub-agent's own instructions |
with_model() / with_api_key() | Can use a different model than the parent |
with_tools() | Tools available to the sub-agent (accepts Vec<Arc<dyn AgentTool>>) |
with_max_turns(N) | Turn limit (default: 10). Primary guard against runaway execution. |
with_thinking() | Enable extended thinking for the sub-agent |
with_cache_config() | Prompt caching settings |
Event Forwarding
When the parent provides an on_update callback (standard for all tools), sub-agent events are forwarded as ToolExecutionUpdate events. The parent's UI sees real-time progress from the child:
- Text deltas from the sub-agent's LLM responses
- Tool call notifications from the sub-agent's tool usage
Design Decisions
- Context isolation: Each invocation starts fresh. Sub-agents don't accumulate history across calls.
- No nesting: Sub-agents are not given other `SubAgentTool`s. This prevents infinite delegation chains.
- Cancellation propagation: The parent's cancellation token is forwarded. Aborting the parent aborts all sub-agents.
- Turn limiting: The default 10-turn limit prevents runaway execution. The parent's execution limits also apply to total wall-clock time.
Example
See examples/sub_agent.rs for a complete coordinator with researcher and coder sub-agents.
State Persistence
yoagent supports saving and restoring agent conversation state, enabling pause/resume workflows, state transfer between processes, and conversation checkpointing.
Save and Restore
```rust
use yoagent::agent::Agent;

// After running some conversation turns...
let json = agent.save_messages()?;
std::fs::write("conversation.json", &json)?;

// Later, in a new process:
let json = std::fs::read_to_string("conversation.json")?;

let mut agent = Agent::new(provider)
    .with_system_prompt("You are helpful.")
    .with_model("claude-sonnet-4-20250514")
    .with_api_key(api_key);

agent.restore_messages(&json)?;

// Continue the conversation — the agent sees the full history
let rx = agent.prompt("Follow up question").await;
```
Builder Initialization
For constructing an agent with pre-existing history:
```rust
let saved: Vec<AgentMessage> = serde_json::from_str(&json)?;

let agent = Agent::new(provider)
    .with_messages(saved)
    .with_system_prompt("...")
    .with_model("...");
```
JSON Format
Messages serialize as a JSON array. Each message is tagged by role:
```json
[
  {
    "role": "user",
    "content": [{"type": "text", "text": "Hello"}],
    "timestamp": 1700000000000
  },
  {
    "role": "assistant",
    "content": [{"type": "text", "text": "Hi there!"}],
    "stopReason": "stop",
    "model": "claude-sonnet-4-20250514",
    "provider": "anthropic",
    "usage": {"input": 100, "output": 50, "cache_read": 0, "cache_write": 0, "total_tokens": 150},
    "timestamp": 1700000001000
  }
]
```
Extension messages use a nested structure:
```json
{
  "role": "extension",
  "kind": "status_update",
  "data": {"status": "running"}
}
```
Context Tracking
ContextTracker and ExecutionTracker are runtime-only and not persisted. This is by design — both are created fresh each agent_loop() invocation and operate on whatever messages are in context at that point. Restoring messages and calling prompt() works correctly without any special recalculation.
What's Serializable
| Type | Serialize | Deserialize | PartialEq |
|---|---|---|---|
Content | Yes | Yes | Yes |
Message | Yes | Yes | Yes |
AgentMessage | Yes | Yes | Yes |
ExtensionMessage | Yes | Yes | Yes |
Usage | Yes | Yes | Yes |
StopReason | Yes | Yes | Yes |
ToolResult | Yes | Yes | Yes |
CacheConfig | Yes | Yes | Yes |
ToolExecutionStrategy | Yes | Yes | Yes |
ContextConfig | Yes | Yes | No |
ExecutionLimits | Yes | Yes | No |
Lifecycle Callbacks
yoagent provides three lifecycle callbacks that let you observe and control the agent loop without modifying its internals.
Callbacks
before_turn
Called before each LLM call. Receives the current message history and the turn number (0-indexed). Return false to abort the loop.
```rust
let agent = Agent::new(provider)
    .on_before_turn(|messages, turn| {
        println!("Turn {} starting with {} messages", turn, messages.len());
        turn < 10 // Stop after 10 turns
    });
```
after_turn
Called after each LLM response and tool execution. Receives the updated message history and the turn's token usage.
```rust
use std::sync::{Arc, Mutex};

let total_cost = Arc::new(Mutex::new(0u64));
let cost_tracker = total_cost.clone();

let agent = Agent::new(provider)
    .on_after_turn(move |_messages, usage| {
        let mut cost = cost_tracker.lock().unwrap();
        *cost += usage.input + usage.output;
        println!("Cumulative tokens: {}", *cost);
    });
```
on_error
Called when the LLM returns a StopReason::Error. Receives the error message string.
```rust
let agent = Agent::new(provider)
    .on_error(|err| {
        eprintln!("LLM error: {}", err);
        // Log to monitoring, send alert, etc.
    });
```
Combining Callbacks
All callbacks are optional and independent:
```rust
let agent = Agent::new(provider)
    .on_before_turn(|_msgs, turn| turn < 20)
    .on_after_turn(|msgs, usage| {
        println!("Messages: {}, Tokens: {}/{}", msgs.len(), usage.input, usage.output);
    })
    .on_error(|err| eprintln!("Error: {}", err));
```
Using with AgentLoopConfig
For direct loop usage without the Agent wrapper:
```rust
use std::sync::Arc;
use yoagent::agent_loop::AgentLoopConfig;

let config = AgentLoopConfig {
    before_turn: Some(Arc::new(|_msgs, turn| turn < 5)),
    after_turn: Some(Arc::new(|_msgs, _usage| { /* log */ })),
    on_error: Some(Arc::new(|err| eprintln!("{}", err))),
    // ... other fields
};
```
Callback Timing
```
Loop iteration:
 1. Inject pending messages (steering/follow-up)
 2. Check execution limits
 3. before_turn(messages, turn_number)   <-- return false to abort
 4. Compact context
 5. Stream LLM response
 6. Check for error/abort → on_error(message) if StopReason::Error
                          → after_turn(messages, usage) even on error/abort
 7. Execute tool calls
 8. Track turn
 9. after_turn(messages, usage)
10. Emit TurnEnd event
```
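The gating behavior of `before_turn` can be sketched as a stripped-down loop. This is an illustrative skeleton, not yoagent's actual loop:

```rust
/// Skeleton of the callback flow above: before_turn gates each
/// iteration, after_turn observes it. Returns the number of turns run.
fn run_loop(max: usize, before_turn: impl Fn(usize) -> bool, after_turn: impl Fn(usize)) -> usize {
    let mut turns = 0;
    for turn in 0..max {
        if !before_turn(turn) {
            break; // returning false aborts the loop
        }
        // ... LLM call and tool execution would happen here ...
        after_turn(turn);
        turns += 1;
    }
    turns
}

fn main() {
    // before_turn allows turns 0..5, so exactly 5 turns run.
    let ran = run_loop(50, |turn| turn < 5, |_| {});
    assert_eq!(ran, 5);
}
```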
MCP Integration
What is MCP?
The Model Context Protocol (MCP) is a JSON-RPC 2.0 protocol that lets AI agents discover and call tools from external servers. It defines a standard way for agents to connect to tool providers over two transports:
- Stdio — spawn a child process, communicate via stdin/stdout (newline-delimited JSON)
- HTTP — POST JSON-RPC requests to an HTTP endpoint
Connecting to MCP Servers
Stdio Transport
Use with_mcp_server_stdio() to spawn an MCP server process and register its tools:
```rust
use yoagent::Agent;
use yoagent::provider::AnthropicProvider;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut agent = Agent::new(AnthropicProvider)
        .with_system_prompt("You are a helpful assistant with file access.")
        .with_model("claude-sonnet-4-20250514")
        .with_api_key(std::env::var("ANTHROPIC_API_KEY")?)
        .with_mcp_server_stdio(
            "npx",
            &["-y", "@modelcontextprotocol/server-filesystem", "/tmp"],
            None,
        )
        .await?;

    let rx = agent.prompt("List files in /tmp").await;
    // handle events...
    Ok(())
}
```
You can pass environment variables to the server process:
```rust
use std::collections::HashMap;

let mut env = HashMap::new();
env.insert("API_TOKEN".into(), "secret".into());

let agent = Agent::new(provider)
    .with_mcp_server_stdio("my-mcp-server", &["--port", "0"], Some(env))
    .await?;
```
HTTP Transport
For remote MCP servers exposed over HTTP:
```rust
let agent = Agent::new(provider)
    .with_mcp_server_http("http://localhost:8080/mcp")
    .await?;
```
How MCP Tools Work
When you call with_mcp_server_stdio() or with_mcp_server_http(), yoagent:
1. Connects to the MCP server and performs the `initialize` handshake
2. Calls `tools/list` to discover available tools
3. Wraps each MCP tool as an `AgentTool` via `McpToolAdapter`
4. Adds them to the agent's tool list
MCP tools appear alongside built-in tools. The LLM sees them with their original names, descriptions, and JSON Schema parameters — it can call them just like any other tool.
Mixing Built-in and MCP Tools
```rust
use yoagent::tools::default_tools;

let agent = Agent::new(provider)
    .with_tools(default_tools()) // bash, read, write, edit, list, search
    .with_mcp_server_stdio("my-db-server", &[], None)
    .await?;

// Agent now has both built-in coding tools AND MCP database tools
```
Using the MCP Client Directly
For lower-level control, use McpClient directly:
```rust
use std::sync::Arc;
use tokio::sync::Mutex;
use yoagent::mcp::{McpClient, McpToolAdapter};

let client = McpClient::connect_stdio("my-server", &[], None).await?;

let tools = client.list_tools().await?;
for tool in &tools {
    println!("{}: {}", tool.name, tool.description.as_deref().unwrap_or(""));
}

// Call a tool directly
let result = client
    .call_tool("read_file", serde_json::json!({"path": "/tmp/test.txt"}))
    .await?;

// Or wrap as AgentTool adapters
let client = Arc::new(Mutex::new(client));
let adapters = McpToolAdapter::from_client(client).await?;
```
Error Handling
MCP operations return McpError:
- `McpError::Transport` — connection or I/O failure
- `McpError::Protocol` — unexpected response format
- `McpError::JsonRpc` — server returned a JSON-RPC error
- `McpError::ConnectionClosed` — server process exited
When an MCP tool returns isError: true, the adapter converts it to a ToolError::Failed, which the agent loop sends back to the LLM with is_error: true so it can self-correct.
Providers Overview
yoagent supports multiple LLM providers through the StreamProvider trait and ApiProtocol dispatch.
Supported Protocols
| Protocol | Provider Struct | API Format |
|---|---|---|
AnthropicMessages | AnthropicProvider | Anthropic Messages API |
OpenAiCompletions | OpenAiCompatProvider | OpenAI Chat Completions |
OpenAiResponses | OpenAiResponsesProvider | OpenAI Responses API |
AzureOpenAiResponses | AzureOpenAiProvider | Azure OpenAI Responses |
GoogleGenerativeAi | GoogleProvider | Google Gemini API |
GoogleVertex | GoogleVertexProvider | Google Vertex AI |
BedrockConverseStream | BedrockProvider | AWS Bedrock ConverseStream |
ApiProtocol Enum
```rust
pub enum ApiProtocol {
    AnthropicMessages,
    OpenAiCompletions,
    OpenAiResponses,
    AzureOpenAiResponses,
    GoogleGenerativeAi,
    GoogleVertex,
    BedrockConverseStream,
}
```
ModelConfig
Full configuration for a model, including provider routing:
```rust
pub struct ModelConfig {
    pub id: String,          // e.g. "gpt-4o"
    pub name: String,        // e.g. "GPT-4o"
    pub api: ApiProtocol,    // Which provider to use
    pub provider: String,    // e.g. "openai"
    pub base_url: String,    // API endpoint
    pub reasoning: bool,     // Supports thinking/reasoning
    pub context_window: u32, // Context size in tokens
    pub max_tokens: u32,     // Default max output
    pub cost: CostConfig,    // Pricing per million tokens
    pub headers: HashMap<String, String>, // Extra headers
    pub compat: Option<OpenAiCompat>,     // Quirk flags
}
```
Convenience constructors:
```rust
let anthropic = ModelConfig::anthropic("claude-sonnet-4-20250514", "Claude Sonnet 4");
let openai = ModelConfig::openai("gpt-4o", "GPT-4o");
let google = ModelConfig::google("gemini-2.0-flash", "Gemini 2.0 Flash");
```
ProviderRegistry
Maps ApiProtocol → StreamProvider. The default registry includes all built-in providers:
```rust
let registry = ProviderRegistry::default();

// Use it to stream with any model
let result = registry.stream(&model_config, stream_config, tx, cancel).await?;
```
Custom registries:
```rust
let mut registry = ProviderRegistry::new();
registry.register(ApiProtocol::AnthropicMessages, AnthropicProvider);
```
StreamProvider Trait
```rust
#[async_trait]
pub trait StreamProvider: Send + Sync {
    async fn stream(
        &self,
        config: StreamConfig,
        tx: mpsc::UnboundedSender<StreamEvent>,
        cancel: CancellationToken,
    ) -> Result<Message, ProviderError>;
}
```
All providers receive a StreamConfig, emit StreamEvents through the channel, and return the final Message.
Anthropic Provider
AnthropicProvider implements the Anthropic Messages API with SSE streaming.
Usage
```rust
use yoagent::provider::AnthropicProvider;

let agent = Agent::new(AnthropicProvider)
    .with_model("claude-sonnet-4-20250514")
    .with_api_key(std::env::var("ANTHROPIC_API_KEY").unwrap());
```
Features
Streaming SSE
Uses reqwest-eventsource to parse Anthropic's SSE stream. Events handled:
- `message_start` — Input token usage, cache stats
- `content_block_start` — Text, thinking, or tool_use block
- `content_block_delta` — Text, thinking, input JSON, or signature deltas
- `content_block_stop` — Block complete
- `message_delta` — Stop reason, output usage
- `message_stop` — Stream complete
Extended Thinking
Set thinking_level to enable thinking with a token budget:
| Level | Budget Tokens |
|---|---|
Minimal | 128 |
Low | 512 |
Medium | 2,048 |
High | 8,192 |
Thinking content is streamed as Content::Thinking with a cryptographic signature for verification.
Cache Control
Automatic prompt caching via cache_control markers:
- System prompt: Always cached with `{"type": "ephemeral"}`
- Second-to-last message: Gets `cache_control` on its last content block, creating a cache breakpoint
This means on repeated calls, only the latest message is processed at full price.
Configuration
| Setting | Value |
|---|---|
| API URL | https://api.anthropic.com/v1/messages |
| API Version | 2023-06-01 |
| Auth Header | x-api-key |
| Default Max Tokens | 8,192 |
Environment Variables
| Variable | Purpose |
|---|---|
ANTHROPIC_API_KEY | API key |
OpenAI Compatible Provider
OpenAiCompatProvider implements the OpenAI Chat Completions API. One implementation covers OpenAI, xAI, Groq, Cerebras, OpenRouter, Mistral, DeepSeek, and any other compatible API.
Usage
Requires a ModelConfig with compat flags set in StreamConfig.model_config:
```rust
use yoagent::provider::{OpenAiCompatProvider, ModelConfig};

let agent = Agent::new(OpenAiCompatProvider)
    .with_model("gpt-4o")
    .with_api_key(std::env::var("OPENAI_API_KEY").unwrap());
```
OpenAiCompat Quirk Flags
Different providers have behavioral differences even though they share the same API:
```rust
pub struct OpenAiCompat {
    pub supports_store: bool,
    pub supports_developer_role: bool,
    pub supports_reasoning_effort: bool,
    pub supports_usage_in_streaming: bool,
    pub max_tokens_field: MaxTokensField, // MaxTokens or MaxCompletionTokens
    pub requires_tool_result_name: bool,
    pub requires_assistant_after_tool_result: bool,
    pub thinking_format: ThinkingFormat,  // OpenAi, Xai, or Qwen
}
```
Provider Presets
| Provider | Constructor | Key Differences |
|---|---|---|
| OpenAI | OpenAiCompat::openai() | developer role, max_completion_tokens, store, reasoning_effort |
| xAI (Grok) | OpenAiCompat::xai() | reasoning field for thinking (not reasoning_content) |
| Groq | OpenAiCompat::groq() | Standard defaults |
| Cerebras | OpenAiCompat::cerebras() | Standard defaults |
| OpenRouter | OpenAiCompat::openrouter() | max_completion_tokens |
| Mistral | OpenAiCompat::mistral() | max_tokens field |
| DeepSeek | OpenAiCompat::deepseek() | max_completion_tokens |
Adding a New Compatible Provider
1. Add a constructor to `OpenAiCompat`:

```rust
impl OpenAiCompat {
    pub fn my_provider() -> Self {
        Self {
            supports_usage_in_streaming: true,
            // set flags as needed...
            ..Default::default()
        }
    }
}
```
2. Create a `ModelConfig` that uses it:

```rust
let config = ModelConfig {
    id: "my-model".into(),
    name: "My Model".into(),
    api: ApiProtocol::OpenAiCompletions,
    provider: "my-provider".into(),
    base_url: "https://api.myprovider.com/v1".into(),
    compat: Some(OpenAiCompat::my_provider()),
    // ...
};
```
Thinking/Reasoning
The ThinkingFormat enum controls how reasoning content is parsed from streams:
- `ThinkingFormat::OpenAi` — uses the `reasoning_content` field (DeepSeek, default)
- `ThinkingFormat::Xai` — uses the `reasoning` field (Grok)
- `ThinkingFormat::Qwen` — uses the `reasoning_content` field (Qwen)
Auth
Uses Authorization: Bearer {api_key} header. Extra headers can be added via ModelConfig.headers.
Google Gemini Provider
Two providers for Google's Gemini models:
- `GoogleProvider` — Google AI Studio (Generative AI API)
- `GoogleVertexProvider` — Google Cloud Vertex AI
Google AI Studio
```rust
use yoagent::provider::GoogleProvider;

let agent = Agent::new(GoogleProvider)
    .with_model("gemini-2.0-flash")
    .with_api_key(std::env::var("GOOGLE_API_KEY").unwrap());
```
API Details
- Endpoint: `{base_url}/v1beta/models/{model}:streamGenerateContent?alt=sse&key={api_key}`
- Auth: API key as query parameter
- Default base URL: `https://generativelanguage.googleapis.com`
- Default context window: 1,000,000 tokens
Message Format
Google uses a different message format than OpenAI/Anthropic:
| yoagent | Google API |
|---|---|
user role | user role |
assistant role | model role |
Content::Text | {"text": "..."} |
Content::Image | {"inlineData": {...}} |
Content::ToolCall | {"functionCall": {...}} |
Message::ToolResult | {"functionResponse": {...}} |
| System prompt | systemInstruction field |
| Tools | tools[].functionDeclarations[] |
Streaming
Uses SSE format (alt=sse). Each chunk contains candidates with content.parts and optional usageMetadata.
Google Vertex AI
GoogleVertexProvider uses the same message format but with Vertex AI authentication and endpoints.
- Protocol: `ApiProtocol::GoogleVertex`
- Auth: OAuth2 / service account credentials
- Endpoint pattern: `https://{region}-aiplatform.googleapis.com/v1/projects/{project}/locations/{region}/publishers/google/models/{model}:streamGenerateContent`
Amazon Bedrock Provider
BedrockProvider implements the AWS Bedrock ConverseStream API.
Usage
```rust
use yoagent::provider::BedrockProvider;

let agent = Agent::new(BedrockProvider)
    .with_model("anthropic.claude-3-sonnet-20240229-v1:0")
    .with_api_key("ACCESS_KEY:SECRET_KEY"); // or ACCESS_KEY:SECRET_KEY:SESSION_TOKEN
```
Authentication
The api_key field uses a colon-separated format:
```
{access_key_id}:{secret_access_key}
{access_key_id}:{secret_access_key}:{session_token}
```
Alternatively, provide pre-computed auth headers via ModelConfig.headers or use an IAM proxy that handles SigV4 signing.
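Splitting the documented colon-separated format is a small exercise. An illustrative std-only parser (a sketch of the format, not yoagent's actual credential handling):

```rust
/// Parse "access:secret" or "access:secret:session_token" into parts.
/// Illustrative sketch of the documented api_key format.
fn parse_bedrock_key(api_key: &str) -> Option<(&str, &str, Option<&str>)> {
    let mut parts = api_key.splitn(3, ':');
    let access = parts.next()?;
    let secret = parts.next()?; // None here means no colon at all
    Some((access, secret, parts.next()))
}

fn main() {
    assert_eq!(parse_bedrock_key("AKIA:abc"), Some(("AKIA", "abc", None)));
    assert_eq!(parse_bedrock_key("AKIA:abc:tok"), Some(("AKIA", "abc", Some("tok"))));
    assert_eq!(parse_bedrock_key("no-colon"), None);
}
```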
API Details
- Endpoint: `{base_url}/model/{model}/converse-stream`
- Default base URL: `https://bedrock-runtime.us-east-1.amazonaws.com`
- Protocol: `ApiProtocol::BedrockConverseStream`
Message Format
Bedrock uses its own content block format:
| yoagent | Bedrock API |
|---|---|
Content::Text | {"text": "..."} |
Content::Image | {"image": {"format": "...", "source": {"bytes": "..."}}} |
Content::ToolCall | {"toolUse": {"toolUseId": "...", "name": "...", "input": ...}} |
Message::ToolResult | {"toolResult": {"toolUseId": "...", "content": [...], "status": "success"}} |
| System prompt | system array of text blocks |
| Tools | toolConfig.tools[].toolSpec |
| Max tokens | inferenceConfig.maxTokens |
Stream Events
Bedrock's ConverseStream returns these event types:
- `contentBlockStart` — New content block (text or tool use)
- `contentBlockDelta` — Text or tool use input delta
- `contentBlockStop` — Block complete
- `messageStop` — Stop reason (`end_turn`, `max_tokens`, `tool_use`)
- `metadata` — Token usage
Azure OpenAI Provider
`AzureOpenAiProvider` implements the OpenAI Responses API format with Azure-specific authentication and URL patterns.
Usage
```rust
use yoagent::Agent;
use yoagent::provider::AzureOpenAiProvider;

let agent = Agent::new(AzureOpenAiProvider)
    .with_model("gpt-4o")
    .with_api_key(std::env::var("AZURE_OPENAI_API_KEY").unwrap());
```
Authentication
Uses the `api-key` header (not `Authorization: Bearer`):

```
api-key: {your_api_key}
```
Additional headers can be set via `ModelConfig.headers` (e.g., for Azure AD Bearer tokens).
URL Format
```
https://{resource}.openai.azure.com/openai/deployments/{deployment}
```

Set this as `ModelConfig.base_url`. The provider appends `/responses?api-version=2025-01-01-preview`.
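As a sketch of how the pieces combine, the base URL can be assembled with ordinary string formatting. `azure_base_url` is a hypothetical helper, and the resource and deployment names are placeholders.

```rust
/// Hypothetical helper: builds the value to set as ModelConfig.base_url.
/// The provider itself appends "/responses?api-version=...".
fn azure_base_url(resource: &str, deployment: &str) -> String {
    format!("https://{resource}.openai.azure.com/openai/deployments/{deployment}")
}

fn main() {
    assert_eq!(
        azure_base_url("my-resource", "gpt-4o"),
        "https://my-resource.openai.azure.com/openai/deployments/gpt-4o"
    );
}
```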
API Details
- Protocol: `ApiProtocol::AzureOpenAiResponses`
- Format: OpenAI Responses API (not Chat Completions)
- Streaming: SSE with event types:
  - `response.output_text.delta` — Text content
  - `response.function_call_arguments.start` — Tool call start
  - `response.function_call_arguments.delta` — Tool call arguments
  - `response.completed` — Final usage data
Message Format
Uses the Responses API input format:
| yoagent | Azure Responses API |
|---|---|
| User message | `{"role": "user", "content": "..."}` |
| Assistant text | `{"type": "message", "role": "assistant", "content": [{"type": "output_text", ...}]}` |
| Tool call | `{"type": "function_call", "call_id": "...", "name": "...", "arguments": "..."}` |
| Tool result | `{"type": "function_call_output", "call_id": "...", "output": "..."}` |
| System prompt | `instructions` field |
Built-in Tools
yoagent ships with six coding-oriented tools. Get them all with `default_tools()`:
```rust
use yoagent::tools::default_tools;

let tools = default_tools();
```
BashTool
Execute shell commands with timeout and output capture.
- Name: `bash`
- Parameters: `command` (string, required)
Configuration
```rust
pub struct BashTool {
    pub cwd: Option<String>,           // Working directory
    pub timeout: Duration,             // Default: 120s
    pub max_output_bytes: usize,       // Default: 256KB
    pub deny_patterns: Vec<String>,    // Blocked commands
    pub confirm_fn: Option<ConfirmFn>, // Confirmation callback
}
```
Default deny patterns: `rm -rf /`, `rm -rf /*`, `mkfs`, `dd if=`, fork bomb.
Example
```rust
use std::time::Duration;

let bash = BashTool::default();

// Or customize:
let bash = BashTool {
    cwd: Some("/workspace".into()),
    timeout: Duration::from_secs(60),
    ..Default::default()
};
```
ReadFileTool
Read file contents with optional line range.
- Name: `read_file`
- Parameters: `path` (required), `offset` (optional, 1-indexed line), `limit` (optional, number of lines)
Configuration
```rust
pub struct ReadFileTool {
    pub max_bytes: usize,           // Default: 1MB
    pub allowed_paths: Vec<String>, // Path restrictions (empty = no restriction)
}
```
WriteFileTool
Write content to a file. Creates parent directories automatically.
- Name: `write_file`
- Parameters: `path` (required), `content` (required)
EditFileTool
Surgical search/replace edits. The most important tool for coding agents — instead of rewriting entire files, the agent specifies exact text to find and replace.
- Name: `edit_file`
- Parameters: `path` (required), `old_text` (required), `new_text` (required)
The `old_text` must match exactly, including whitespace and indentation.
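The exact-match behavior can be illustrated with a stand-alone sketch. `apply_edit` is a hypothetical helper written for this example; it is not the crate's actual implementation.

```rust
/// Illustration of edit_file's exact-match semantics (hypothetical,
/// not the crate's code): the edit applies only if `old_text` occurs
/// verbatim in the source, and only the first occurrence is replaced.
fn apply_edit(source: &str, old_text: &str, new_text: &str) -> Option<String> {
    if source.contains(old_text) {
        Some(source.replacen(old_text, new_text, 1))
    } else {
        None // whitespace or indentation mismatch: no edit applied
    }
}

fn main() {
    let src = "fn add(a: i32, b: i32) -> i32 {\n    a + b\n}\n";
    // Exact match, including indentation, succeeds:
    assert!(apply_edit(src, "    a + b", "    a - b").is_some());
    // Text that ignores the original whitespace fails:
    assert!(apply_edit(src, "a+b", "a-b").is_none());
}
```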
ListFilesTool
List files and directories with optional glob filtering.
- Name: `list_files`
- Parameters: `path` (optional, default: `.`), `pattern` (optional glob)
Configuration
```rust
pub struct ListFilesTool {
    pub max_results: usize, // Default: 200
    pub timeout: Duration,  // Default: 10s
}
```
Uses `find` or `fd` for efficient traversal.
SearchTool
Search files using `grep` (or `ripgrep` if available).
- Name: `search`
- Parameters: `pattern` (required, regex), `path` (optional root directory)
Configuration
```rust
pub struct SearchTool {
    pub root: Option<String>, // Root directory
    pub max_results: usize,   // Default: 50
    pub timeout: Duration,    // Default: 30s
}
```
Returns matching lines with file paths and line numbers.
Configuration
AgentLoopConfig
The main configuration for the agent loop:
```rust
pub struct AgentLoopConfig<'a> {
    pub provider: &'a dyn StreamProvider,
    pub model: String,
    pub api_key: String,
    pub thinking_level: ThinkingLevel,
    pub max_tokens: Option<u32>,
    pub temperature: Option<f32>,
    pub convert_to_llm: Option<ConvertToLlmFn>,
    pub transform_context: Option<TransformContextFn>,
    pub get_steering_messages: Option<GetMessagesFn>,
    pub get_follow_up_messages: Option<GetMessagesFn>,
    pub context_config: Option<ContextConfig>,
    pub execution_limits: Option<ExecutionLimits>,
    pub cache_config: CacheConfig,
    pub tool_execution: ToolExecutionStrategy,
    pub retry_config: RetryConfig,
    pub before_turn: Option<BeforeTurnFn>,
    pub after_turn: Option<AfterTurnFn>,
    pub on_error: Option<OnErrorFn>,
    pub input_filters: Vec<Arc<dyn InputFilter>>,
}
```
StreamConfig
Passed to `StreamProvider::stream()`:

```rust
pub struct StreamConfig {
    pub model: String,
    pub system_prompt: String,
    pub messages: Vec<Message>,
    pub tools: Vec<ToolDefinition>,
    pub thinking_level: ThinkingLevel,
    pub api_key: String,
    pub max_tokens: Option<u32>,
    pub temperature: Option<f32>,
    pub model_config: Option<ModelConfig>,
    pub cache_config: CacheConfig,
}
```
ContextConfig
Controls context window compaction:
```rust
pub struct ContextConfig {
    pub max_context_tokens: usize,    // Default: 100,000
    pub system_prompt_tokens: usize,  // Default: 4,000
    pub keep_recent: usize,           // Default: 10
    pub keep_first: usize,            // Default: 2
    pub tool_output_max_lines: usize, // Default: 50
}
```
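With the defaults above, the budget arithmetic works out as follows. This is a simplified sketch with locally re-declared fields; the real compaction logic in `context.rs` also accounts for per-message token estimates.

```rust
// Simplified sketch of the budget implied by ContextConfig's defaults;
// struct fields mirror the docs, the logic is illustrative only.
struct ContextConfig {
    max_context_tokens: usize,
    system_prompt_tokens: usize,
}

/// Tokens left for conversation messages after reserving the system prompt.
fn message_budget(cfg: &ContextConfig) -> usize {
    cfg.max_context_tokens.saturating_sub(cfg.system_prompt_tokens)
}

fn main() {
    let cfg = ContextConfig {
        max_context_tokens: 100_000,
        system_prompt_tokens: 4_000,
    };
    // 100,000 - 4,000 = 96,000 tokens available for messages.
    assert_eq!(message_budget(&cfg), 96_000);
}
```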
ExecutionLimits
Prevents runaway agents:
```rust
pub struct ExecutionLimits {
    pub max_turns: usize,        // Default: 50
    pub max_total_tokens: usize, // Default: 1,000,000
    pub max_duration: Duration,  // Default: 600s
}
```
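A sketch of the check these limits imply: the run stops as soon as any one budget is exhausted. `should_stop` is a hypothetical helper; the real enforcement happens inside the agent loop between turns.

```rust
use std::time::Duration;

// Locally re-declared to keep the sketch self-contained.
struct ExecutionLimits {
    max_turns: usize,
    max_total_tokens: usize,
    max_duration: Duration,
}

/// Hypothetical helper: true once any limit is reached.
fn should_stop(turns: usize, tokens: usize, elapsed: Duration, l: &ExecutionLimits) -> bool {
    turns >= l.max_turns || tokens >= l.max_total_tokens || elapsed >= l.max_duration
}

fn main() {
    let limits = ExecutionLimits {
        max_turns: 50,
        max_total_tokens: 1_000_000,
        max_duration: Duration::from_secs(600),
    };
    assert!(!should_stop(10, 50_000, Duration::from_secs(30), &limits));
    assert!(should_stop(50, 50_000, Duration::from_secs(30), &limits)); // turn budget hit
}
```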
ThinkingLevel
```rust
pub enum ThinkingLevel {
    Off,     // No thinking (default)
    Minimal, // 128 tokens (Anthropic budget)
    Low,     // 512 tokens
    Medium,  // 2,048 tokens
    High,    // 8,192 tokens
}
```
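The comments above suggest a mapping from level to an optional token budget. As a sketch (`thinking_budget` is a hypothetical helper, not the crate's API):

```rust
// Re-declared locally so the sketch compiles on its own.
enum ThinkingLevel { Off, Minimal, Low, Medium, High }

/// Hypothetical mapping from level to thinking-token budget,
/// following the per-variant comments in the docs.
fn thinking_budget(level: &ThinkingLevel) -> Option<u32> {
    match level {
        ThinkingLevel::Off => None,
        ThinkingLevel::Minimal => Some(128),
        ThinkingLevel::Low => Some(512),
        ThinkingLevel::Medium => Some(2_048),
        ThinkingLevel::High => Some(8_192),
    }
}

fn main() {
    assert_eq!(thinking_budget(&ThinkingLevel::Off), None);
    assert_eq!(thinking_budget(&ThinkingLevel::High), Some(8_192));
}
```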
CostConfig
Token pricing per million:
```rust
pub struct CostConfig {
    pub input_per_million: f64,
    pub output_per_million: f64,
    pub cache_read_per_million: f64,
    pub cache_write_per_million: f64,
}
```
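A run's dollar cost follows directly from these fields. The sketch below re-declares the struct locally; `run_cost` is a hypothetical helper, and the prices and token counts are made-up example values, not real provider pricing.

```rust
// Re-declared locally so the sketch compiles on its own.
struct CostConfig {
    input_per_million: f64,
    output_per_million: f64,
    cache_read_per_million: f64,
    cache_write_per_million: f64,
}

/// Hypothetical helper: total cost in dollars for a run's token usage.
fn run_cost(c: &CostConfig, input: u64, output: u64, cache_read: u64, cache_write: u64) -> f64 {
    (input as f64 * c.input_per_million
        + output as f64 * c.output_per_million
        + cache_read as f64 * c.cache_read_per_million
        + cache_write as f64 * c.cache_write_per_million)
        / 1_000_000.0
}

fn main() {
    // Example prices: $3/M input, $15/M output, $0.30/M cache read, $3.75/M cache write.
    let c = CostConfig {
        input_per_million: 3.0,
        output_per_million: 15.0,
        cache_read_per_million: 0.3,
        cache_write_per_million: 3.75,
    };
    // 10k input tokens + 2k output tokens = $0.03 + $0.03 = $0.06.
    let cost = run_cost(&c, 10_000, 2_000, 0, 0);
    assert!((cost - 0.06).abs() < 1e-9);
}
```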
API Reference
Top-Level Functions
agent_loop()
```rust
pub async fn agent_loop(
    prompts: Vec<AgentMessage>,
    context: &mut AgentContext,
    config: &AgentLoopConfig<'_>,
    tx: mpsc::UnboundedSender<AgentEvent>,
    cancel: CancellationToken,
) -> Vec<AgentMessage>
```
Start an agent loop with new prompt messages. Returns all messages generated during the run.
agent_loop_continue()
```rust
pub async fn agent_loop_continue(
    context: &mut AgentContext,
    config: &AgentLoopConfig<'_>,
    tx: mpsc::UnboundedSender<AgentEvent>,
    cancel: CancellationToken,
) -> Vec<AgentMessage>
```
Resume from existing context. The last message must not be an assistant message.
default_tools()
```rust
pub fn default_tools() -> Vec<Box<dyn AgentTool>>
```
Returns: `BashTool`, `ReadFileTool`, `WriteFileTool`, `EditFileTool`, `ListFilesTool`, `SearchTool`.
Agent Struct
High-level stateful wrapper around the agent loop.
Construction
```rust
let agent = Agent::new(provider);
```
| Signature | Description |
|---|---|
| `Agent::new(provider: impl StreamProvider + 'static) -> Self` | Create a new agent with the given provider |
Builder Methods
All return `Self` for chaining.
| Method | Description |
|---|---|
| `with_system_prompt(prompt: impl Into<String>) -> Self` | Set the system prompt |
| `with_model(model: impl Into<String>) -> Self` | Set the model identifier |
| `with_api_key(key: impl Into<String>) -> Self` | Set the API key |
| `with_thinking(level: ThinkingLevel) -> Self` | Set thinking level (`Off`, `Minimal`, `Low`, `Medium`, `High`) |
| `with_tools(tools: Vec<Box<dyn AgentTool>>) -> Self` | Set tools |
| `with_max_tokens(max: u32) -> Self` | Set max output tokens |
| `with_context_config(config: ContextConfig) -> Self` | Set context compaction config |
| `with_execution_limits(limits: ExecutionLimits) -> Self` | Set execution limits (max turns, tokens, duration) |
| `without_context_management() -> Self` | Disable automatic context compaction and execution limits |
| `async with_mcp_server_stdio(command, args, env) -> Result<Self, McpError>` | Connect to an MCP server via stdio and add its tools |
| `async with_mcp_server_http(url) -> Result<Self, McpError>` | Connect to an MCP server via HTTP and add its tools |
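Chaining several of the builder methods above looks like this. This is a sketch: the `provider` value stands in for any `StreamProvider` implementation, and the model ID and environment variable name are placeholders.

```rust
use yoagent::{Agent, ThinkingLevel};
use yoagent::tools::default_tools;

// Sketch only; `provider`, the model ID, and the env var are placeholders.
let agent = Agent::new(provider)
    .with_system_prompt("You are a helpful coding agent.")
    .with_model("your-model-id")
    .with_api_key(std::env::var("PROVIDER_API_KEY").unwrap())
    .with_thinking(ThinkingLevel::Low)
    .with_tools(default_tools())
    .with_max_tokens(8_192);
```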
Prompting
| Method | Description |
|---|---|
| `async prompt(text: impl Into<String>) -> UnboundedReceiver<AgentEvent>` | Send a text prompt; returns an event stream |
| `async prompt_messages(messages: Vec<AgentMessage>) -> UnboundedReceiver<AgentEvent>` | Send messages as a prompt |
| `async continue_loop() -> UnboundedReceiver<AgentEvent>` | Resume from the current context (for retries) |
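Consuming the returned stream is a plain receiver loop. A sketch (the prompt text is a placeholder, and logging with `{event:?}` assumes `AgentEvent` derives `Debug`, which these docs do not confirm):

```rust
// Sketch: drain events until the run completes; the receiver closes
// when the agent loop finishes.
let mut rx = agent.prompt("Fix the failing test in src/lib.rs").await;
while let Some(event) = rx.recv().await {
    // Match on AgentEvent variants (text deltas, thinking, tool
    // execution, ...) to drive a UI; here we just log each one.
    println!("{event:?}");
}
```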
State Access
| Method | Description |
|---|---|
| `messages() -> &[AgentMessage]` | Get the full message history |
| `is_streaming() -> bool` | Whether the agent is currently running |
State Mutation
| Method | Description |
|---|---|
| `set_tools(tools: Vec<Box<dyn AgentTool>>)` | Replace the tool set |
| `clear_messages()` | Clear all messages |
| `append_message(msg: AgentMessage)` | Add a message to history |
| `replace_messages(msgs: Vec<AgentMessage>)` | Replace all messages |
Steering & Follow-Up Queues
| Method | Description |
|---|---|
| `steer(msg: AgentMessage)` | Queue a steering message (interrupts mid-tool-execution) |
| `follow_up(msg: AgentMessage)` | Queue a follow-up message (processed after the agent finishes) |
| `clear_steering_queue()` | Clear pending steering messages |
| `clear_follow_up_queue()` | Clear pending follow-up messages |
| `clear_all_queues()` | Clear both queues |
| `set_steering_mode(mode: QueueMode)` | Set delivery mode: `OneAtATime` or `All` |
| `set_follow_up_mode(mode: QueueMode)` | Set delivery mode: `OneAtATime` or `All` |
Control
| Method | Description |
|---|---|
| `abort()` | Cancel the current run via `CancellationToken` |
| `reset()` | Clear all state (messages, queues, streaming flag) |
Re-exports
The crate re-exports key types from `lib.rs`:
```rust
pub use agent::Agent;
pub use agent_loop::{agent_loop, agent_loop_continue};
pub use types::*; // Message, Content, AgentMessage, AgentEvent, etc.
```
Architecture Overview
Layered Design
yoagent is organized as three conceptual layers within a single crate. Dependencies flow strictly downward — upper layers use lower layers, never the reverse.
```
┌─────────────────────────────────────────────┐
│ Layer 3: Orchestration (planned)            │
│ Multi-agent, delegation, work modes         │
├─────────────────────────────────────────────┤
│ Layer 2: Agent + Providers                  │
│ Concrete providers, tools, retry, caching,  │
│ context management, MCP                     │
├─────────────────────────────────────────────┤
│ Layer 1: Core Loop                          │
│ agent_loop, types, traits                   │
│ Provider-agnostic. Tool-agnostic.           │
└─────────────────────────────────────────────┘
```
Layer 1: Core Loop
The pure agent loop. No opinions about LLMs, no built-in tools. Just the control flow.
Modules: `types.rs`, `agent_loop.rs`, `provider/traits.rs`
Owns:
- `agent_loop()` / `agent_loop_continue()` — the loop itself
- `AgentTool` trait — interface tools must implement
- `StreamProvider` trait — interface providers must implement
- `AgentMessage`, `AgentEvent`, `StreamDelta` — message & event types
- `AgentContext` — system prompt + messages + tools
- Tool execution strategies (parallel/sequential/batched)
- Streaming tool output (`ToolUpdateFn`)
- Steering & follow-up message injection
Does not own: Any concrete provider or tool implementation.
Layer 2: Agent + Providers
Batteries-included single-agent layer. Most users interact with this.
Modules: `agent.rs`, `context.rs`, `retry.rs`, `provider/*.rs`, `tools/*.rs`, `mcp/*.rs`
Adds on top of Layer 1:
- Concrete providers — Anthropic, OpenAI-compat, Google, Azure, Bedrock, Vertex
- Provider registry — dispatch by API protocol
- Prompt caching — automatic cache breakpoint placement
- Retry with backoff — exponential, jitter, respects retry-after
- Context management — token estimation, smart truncation, execution limits
- Built-in tools — bash, read_file, write_file, edit_file, list_files, search
- MCP client — stdio + HTTP transports, tool adapter
- `Agent` struct — stateful builder wrapping it all together
Layer 3: Orchestration (planned)
Multi-agent coordination. Not yet implemented — the architecture is designed to support it when needed.
Planned capabilities:
- `Orchestrator` struct — spawn, delegate, and coordinate multiple agents
- Work modes:
  - Interactive — multi-turn, human in the loop (current default)
  - Autonomous — runs to completion without input (background tasks, CI)
  - Pipeline — input → output, chainable (scan → fix → verify)
  - Supervisor — delegates to other agents, synthesizes results
  - Fan-out — same task to multiple agents (different providers for diversity)
- Pipeline chaining — output of agent A feeds input of agent B
- Agent communication through the orchestrator event bus
Why not yet: Multi-agent orchestration adds complexity. The single-agent loop handles 95% of use cases. Layer 3 will be built when a concrete use case drives it, not speculatively.
Module Layout
```
yoagent/
├── src/
│   ├── lib.rs                  # Public re-exports
│   │
│   │── Layer 1: Core Loop ─────────────────────
│   ├── types.rs                # Message, Content, AgentTool, AgentEvent
│   ├── agent_loop.rs           # Core loop: prompt → LLM → tools → repeat
│   │
│   │── Layer 2: Agent + Providers ─────────────
│   ├── agent.rs                # Agent struct (stateful wrapper)
│   ├── context.rs              # Token estimation, compaction, limits
│   ├── retry.rs                # Retry with exponential backoff
│   ├── provider/
│   │   ├── traits.rs           # StreamProvider trait, StreamEvent, ProviderError
│   │   ├── model.rs            # ModelConfig, ApiProtocol, OpenAiCompat
│   │   ├── registry.rs         # ProviderRegistry (protocol → provider)
│   │   ├── anthropic.rs        # Anthropic Messages API
│   │   ├── openai_compat.rs    # OpenAI Chat Completions (15+ providers)
│   │   ├── openai_responses.rs # OpenAI Responses API
│   │   ├── google.rs           # Google Generative AI
│   │   ├── google_vertex.rs    # Google Vertex AI
│   │   ├── bedrock.rs          # AWS Bedrock ConverseStream
│   │   ├── azure_openai.rs     # Azure OpenAI
│   │   ├── mock.rs             # Mock provider for testing
│   │   └── sse.rs              # SSE utilities
│   ├── tools/
│   │   ├── bash.rs             # BashTool
│   │   ├── file.rs             # ReadFileTool, WriteFileTool
│   │   ├── edit.rs             # EditFileTool
│   │   ├── list.rs             # ListFilesTool
│   │   └── search.rs           # SearchTool
│   └── mcp/
│       ├── client.rs           # MCP client (stdio + HTTP)
│       ├── tool_adapter.rs     # McpToolAdapter (MCP tool → AgentTool)
│       ├── transport.rs        # Transport implementations
│       └── types.rs            # MCP protocol types
```
Data Flow
```
┌─────────────┐
│   Caller    │
└──────┬──────┘
       │ prompt / prompt_messages
┌──────▼──────┐
│   Agent     │  Layer 2: stateful wrapper
│ (agent.rs)  │  Manages queues, tools, state
└──────┬──────┘
       │
┌──────▼──────┐
│ agent_loop  │  Layer 1: core loop
│             │  Prompt → LLM → Tools → Repeat
└──┬───────┬──┘
   │       │
┌──▼────────┐ ┌──▼────────┐
│ Provider  │ │  Tools    │  Layer 2: implementations
│ .stream() │ │ .execute()│
└────────┬──┘ └──┬────────┘
         │       │
┌────────▼──┐ ┌──▼────────┐
│  LLM API  │ │  OS / FS  │
│  (HTTP)   │ │  (shell)  │
└───────────┘ └───────────┘
```
Events flow back via `mpsc::UnboundedSender<AgentEvent>`.
How Providers Plug In
- Implement the `StreamProvider` trait (Layer 1 interface)
- Register with `ProviderRegistry` under an `ApiProtocol` (Layer 2)
- Set `ModelConfig.api` to match that protocol
- The registry dispatches `stream()` calls to the right provider
Each provider translates between yoagent's `Message`/`Content` types and the provider's native API format. All providers emit `StreamEvent`s through the channel for real-time updates.
How Tools Plug In
- Implement the `AgentTool` trait (Layer 1 interface)
- Add it to the tools vec (via `default_tools()` or custom)
- The agent loop converts tools to `ToolDefinition` (name, description, schema) for the LLM
- When the LLM returns `Content::ToolCall`, the loop finds the matching tool and calls `execute()`
- Results are wrapped in `Message::ToolResult` and added to context
Tools receive a `CancellationToken` child token — they should check it for cooperative cancellation during long operations.
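The steps above can be sketched as a skeleton of a custom tool. The method names and signatures here are assumptions inferred from this page; consult `types.rs` for the real `AgentTool` definition.

```rust
// Hypothetical skeleton; exact trait signatures live in types.rs.
struct WordCountTool;

#[async_trait::async_trait]
impl AgentTool for WordCountTool {
    // Assumed accessors: name/description feed the ToolDefinition
    // the loop sends to the LLM.
    fn name(&self) -> &str { "word_count" }
    fn description(&self) -> &str { "Count the words in a text file" }

    // The parameter schema and the async execute() method go here.
    // execute() receives the tool-call arguments and a child
    // CancellationToken, and should poll the token during long work.
}
```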
Design Principles
- Layers are conceptual, not physical. One crate, clean module boundaries, no feature flags needed.
- Dependencies flow down. Layer 1 never imports from Layer 2. Layer 2 never imports from Layer 3.
- Layer 1 is stable. The core loop and traits change rarely. New features are added in Layer 2 or 3.
- Build what's needed. Layer 3 is designed but not implemented. It will be built when a use case demands it, not speculatively.
- Simple over clever. A straightforward loop with good defaults beats an elegant abstraction nobody can debug.