# yoagent

Simple, effective agent loop in Rust.

yoagent is a library for building LLM-powered agents that can use tools. It provides the core loop — prompt the model, execute tool calls, feed results back — and gets out of your way.

## Philosophy

The loop is the product. An agent is just a loop: send messages to an LLM, get back text and tool calls, execute the tools, repeat until the model stops. yoagent implements this loop with streaming, cancellation, context management, and multi-provider support — so you don't have to.
## Features

- **Streaming events** — Real-time `AgentEvent` stream for UI updates (text deltas, thinking, tool execution)
- **Multi-provider** — Anthropic, OpenAI, Google Gemini, Amazon Bedrock, Azure OpenAI, and any OpenAI-compatible API
- **Tool system** — `AgentTool` trait with built-in coding tools (bash, file read/write/edit, search)
- **Context management** — Automatic token estimation, tiered compaction (truncate tool outputs → summarize → drop old messages)
- **Execution limits** — Max turns, tokens, and wall-clock time
- **Steering & follow-ups** — Interrupt the agent mid-run or queue work for after it finishes
- **Cancellation** — `CancellationToken`-based abort at any point
- **Builder pattern** — Ergonomic `Agent` struct with chainable configuration
## Ecosystem

yoagent is part of the Yolog ecosystem. It powers the agent backend for Yolog applications.

- Repository: github.com/yologdev/yoagent
- License: MIT
## Installation

### Requirements

- Rust 2021 edition (1.56+, recommended 1.75+)
- Tokio async runtime

### Add to Cargo.toml

```toml
[dependencies]
yoagent = "0.5"
```
### Dependencies

yoagent brings in these key dependencies automatically:

| Crate | Purpose |
|---|---|
| `tokio` | Async runtime (full features) |
| `serde` / `serde_json` | Serialization |
| `reqwest` | HTTP client for provider APIs |
| `reqwest-eventsource` | SSE streaming |
| `async-trait` | Async trait support |
| `tokio-util` | `CancellationToken` |
| `thiserror` | Error types |
| `tracing` | Logging |
### Feature Flags

yoagent currently has no optional feature flags — all providers and tools are included by default.
## Quick Start

### Basic Example with Anthropic

```rust
use yoagent::{Agent, AgentEvent, StreamDelta};
use yoagent::provider::AnthropicProvider;
use yoagent::tools::default_tools;

#[tokio::main]
async fn main() {
    let mut agent = Agent::new(AnthropicProvider)
        .with_system_prompt("You are a helpful coding assistant.")
        .with_model("claude-sonnet-4-20250514")
        .with_api_key(std::env::var("ANTHROPIC_API_KEY").unwrap())
        .with_tools(default_tools());

    let mut rx = agent.prompt("List the files in the current directory").await;

    while let Some(event) = rx.recv().await {
        match event {
            AgentEvent::MessageUpdate { delta, .. } => match delta {
                StreamDelta::Text { delta } => print!("{}", delta),
                StreamDelta::Thinking { delta } => print!("[thinking] {}", delta),
                _ => {}
            },
            AgentEvent::ToolExecutionStart { tool_name, .. } => {
                println!("\n→ Running tool: {}", tool_name);
            }
            AgentEvent::ToolExecutionEnd { tool_name, is_error, .. } => {
                if is_error {
                    println!("  ✗ {} failed", tool_name);
                } else {
                    println!("  ✓ {} done", tool_name);
                }
            }
            AgentEvent::AgentEnd { .. } => {
                println!("\n\nDone.");
            }
            _ => {}
        }
    }
}
```
### Example with OpenAI-Compatible Provider

For OpenAI, xAI, Groq, or any compatible API, use `OpenAiCompatProvider` with a `ModelConfig`:

```rust
use yoagent::{Agent, AgentEvent};
use yoagent::provider::OpenAiCompatProvider;
use yoagent::tools::default_tools;

#[tokio::main]
async fn main() {
    let mut agent = Agent::new(OpenAiCompatProvider)
        .with_system_prompt("You are a helpful assistant.")
        .with_model("gpt-4o")
        .with_api_key(std::env::var("OPENAI_API_KEY").unwrap())
        .with_tools(default_tools());

    let mut rx = agent.prompt("What is 2 + 2?").await;

    while let Some(event) = rx.recv().await {
        match event {
            AgentEvent::MessageUpdate { delta, .. } => {
                if let yoagent::StreamDelta::Text { delta } = delta {
                    print!("{}", delta);
                }
            }
            AgentEvent::AgentEnd { .. } => println!(),
            _ => {}
        }
    }
}
```
### Using the Low-Level API

For more control, use `agent_loop()` directly:

```rust
use yoagent::agent_loop::{agent_loop, AgentLoopConfig};
use yoagent::provider::AnthropicProvider;
use yoagent::types::*;
use tokio::sync::mpsc;
use tokio_util::sync::CancellationToken;

#[tokio::main]
async fn main() {
    let (tx, mut rx) = mpsc::unbounded_channel();
    let cancel = CancellationToken::new();

    let mut context = AgentContext {
        system_prompt: "You are helpful.".into(),
        messages: Vec::new(),
        tools: yoagent::tools::default_tools(),
    };

    let config = AgentLoopConfig {
        provider: &AnthropicProvider,
        model: "claude-sonnet-4-20250514".into(),
        api_key: std::env::var("ANTHROPIC_API_KEY").unwrap(),
        thinking_level: ThinkingLevel::Off,
        max_tokens: None,
        temperature: None,
        convert_to_llm: None,
        transform_context: None,
        get_steering_messages: None,
        get_follow_up_messages: None,
        context_config: None,
        execution_limits: None,
        cache_config: CacheConfig::default(),
        tool_execution: ToolExecutionStrategy::default(),
        retry_config: yoagent::RetryConfig::default(),
        before_turn: None,
        after_turn: None,
        on_error: None,
        input_filters: vec![],
    };

    let prompts = vec![AgentMessage::Llm(Message::user("Hello!"))];
    let new_messages = agent_loop(prompts, &mut context, &config, tx, cancel).await;

    // Drain any remaining events
    while let Ok(_event) = rx.try_recv() {
        // handle events...
    }

    println!("Got {} new messages", new_messages.len());
}
```
## The Agent Loop

The agent loop is the core of yoagent. It implements the fundamental cycle:

```text
User prompt → LLM call → Tool execution → LLM call → ... → Final response
```
### How It Works

```text
┌──────────────────────────────────────────────┐
│              agent_loop()                    │
│                                              │
│  1. Add prompts to context                   │
│  2. Emit AgentStart + TurnStart              │
│                                              │
│  ┌─────────── Inner Loop ──────────────┐     │
│  │ • Check steering messages           │     │
│  │ • Check execution limits            │     │
│  │ • Compact context (if configured)   │     │
│  │ • Stream LLM response               │     │
│  │ • Extract tool calls                │     │
│  │ • Execute tools (with steering)     │     │
│  │ • Emit TurnEnd                      │     │
│  │ • Continue if tool_calls or steer   │     │
│  └─────────────────────────────────────┘     │
│                                              │
│  3. Check follow-up messages                 │
│  4. If follow-ups exist, loop again          │
│  5. Emit AgentEnd                            │
└──────────────────────────────────────────────┘
```
### Entry Points

#### agent_loop()

Starts a new agent run with prompt messages:

```rust
pub async fn agent_loop(
    prompts: Vec<AgentMessage>,
    context: &mut AgentContext,
    config: &AgentLoopConfig<'_>,
    tx: mpsc::UnboundedSender<AgentEvent>,
    cancel: CancellationToken,
) -> Vec<AgentMessage>
```

The prompts are added to context, then the loop runs. Returns all new messages generated during the run.
#### agent_loop_continue()

Resumes from existing context (e.g., after an error or retry):

```rust
pub async fn agent_loop_continue(
    context: &mut AgentContext,
    config: &AgentLoopConfig<'_>,
    tx: mpsc::UnboundedSender<AgentEvent>,
    cancel: CancellationToken,
) -> Vec<AgentMessage>
```

Requires that the last message in context is not an assistant message.
### AgentLoopConfig

```rust
pub struct AgentLoopConfig<'a> {
    pub provider: &'a dyn StreamProvider,
    pub model: String,
    pub api_key: String,
    pub thinking_level: ThinkingLevel,
    pub max_tokens: Option<u32>,
    pub temperature: Option<f32>,
    pub convert_to_llm: Option<ConvertToLlmFn>,
    pub transform_context: Option<TransformContextFn>,
    pub get_steering_messages: Option<GetMessagesFn>,
    pub get_follow_up_messages: Option<GetMessagesFn>,
    pub context_config: Option<ContextConfig>,
    pub execution_limits: Option<ExecutionLimits>,
    pub cache_config: CacheConfig,
    pub tool_execution: ToolExecutionStrategy,
    pub retry_config: RetryConfig,
    pub before_turn: Option<BeforeTurnFn>,
    pub after_turn: Option<AfterTurnFn>,
    pub on_error: Option<OnErrorFn>,
    pub input_filters: Vec<Arc<dyn InputFilter>>,
}
```
| Field | Purpose |
|---|---|
| `provider` | The `StreamProvider` implementation to use |
| `model` | Model identifier (e.g., `"claude-sonnet-4-20250514"`) |
| `api_key` | API key for the provider |
| `thinking_level` | `Off`, `Minimal`, `Low`, `Medium`, `High` |
| `convert_to_llm` | Custom `AgentMessage[]` → `Message[]` conversion |
| `transform_context` | Pre-processing hook for context pruning |
| `get_steering_messages` | Returns user interruptions during tool execution |
| `get_follow_up_messages` | Returns queued work after the agent would stop |
| `context_config` | Token budget and compaction settings |
| `execution_limits` | Max turns, tokens, duration |
| `cache_config` | Prompt caching behavior (see Prompt Caching) |
| `tool_execution` | Parallel, Sequential, or Batched (see Tools) |
| `retry_config` | Retry behavior for transient errors (see Retry) |
| `before_turn` | Called before each LLM call; return `false` to abort (see Callbacks) |
| `after_turn` | Called after each turn with messages and usage (see Callbacks) |
| `on_error` | Called on `StopReason::Error` with the error string (see Callbacks) |
| `input_filters` | Input filters applied to user messages before the LLM call (see Tools) |
## Steering & Follow-Ups

### Steering

Steering messages interrupt the agent between tool executions. When the agent is executing multiple tool calls from a single LLM response, steering is checked after each tool completes. If a steering message is found:

- The current tool finishes normally
- All remaining tool calls are skipped with `is_error: true` and "Skipped due to queued user message"
- The steering message is injected into context
- The loop continues with a new LLM call that sees the interruption

```rust
// While the agent is running tools, redirect it:
agent.steer(AgentMessage::Llm(Message::user(
    "Stop that. Instead, explain what you found.",
)));
```
### Follow-Ups

Follow-up messages are checked after the agent would normally stop (no more tool calls, no steering). If follow-ups exist, the loop continues with them as new input — the agent doesn't need to be re-prompted.

```rust
// Queue work for after the agent finishes its current task:
agent.follow_up(AgentMessage::Llm(Message::user("Now run the tests.")));
agent.follow_up(AgentMessage::Llm(Message::user("Then commit the changes.")));
```
### Queue Modes

Both queues support two delivery modes:

| Mode | Behavior |
|---|---|
| `QueueMode::OneAtATime` | Delivers one message per turn (default) |
| `QueueMode::All` | Delivers all queued messages at once |

```rust
agent.set_steering_mode(QueueMode::All);
agent.set_follow_up_mode(QueueMode::OneAtATime);
```
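The two delivery modes can be pictured with a plain queue. This is a stand-alone sketch using local stand-in types for illustration, not yoagent's internals:

```rust
use std::collections::VecDeque;

// Local stand-in for illustration only, not yoagent's actual type.
enum QueueMode {
    OneAtATime,
    All,
}

/// Drain a message queue according to the mode: one message per turn,
/// or everything that is currently queued.
fn drain_queue(queue: &mut VecDeque<String>, mode: &QueueMode) -> Vec<String> {
    match mode {
        QueueMode::OneAtATime => queue.pop_front().into_iter().collect(),
        QueueMode::All => queue.drain(..).collect(),
    }
}

fn main() {
    let mut q: VecDeque<String> =
        ["run tests", "commit"].iter().map(|s| s.to_string()).collect();

    // OneAtATime: a single message is delivered; the rest stay queued.
    let first = drain_queue(&mut q, &QueueMode::OneAtATime);
    assert_eq!(first, vec!["run tests".to_string()]);

    // All: everything remaining is delivered at once.
    let rest = drain_queue(&mut q, &QueueMode::All);
    assert_eq!(rest, vec!["commit".to_string()]);
    assert!(q.is_empty());
}
```

With `OneAtATime` the agent gets one interruption per checkpoint, so each queued message sees a fresh LLM response before the next is delivered.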
### Queue Management

```rust
agent.clear_steering_queue();   // Drop all pending steers
agent.clear_follow_up_queue();  // Drop all pending follow-ups
agent.clear_all_queues();       // Drop everything
```
### Low-Level API

When using `agent_loop()` directly, steering and follow-ups are provided via callback functions:

```rust
let config = AgentLoopConfig {
    get_steering_messages: Some(Box::new(|| {
        // Return Vec<AgentMessage> — checked between tool calls
        vec![]
    })),
    get_follow_up_messages: Some(Box::new(|| {
        // Return Vec<AgentMessage> — checked when the agent would stop
        vec![]
    })),
    // ...
};
```
## Messages & Events

### Message Types

#### Message

The core LLM message type, tagged by role:

```rust
pub enum Message {
    User {
        content: Vec<Content>,
        timestamp: u64,
    },
    Assistant {
        content: Vec<Content>,
        stop_reason: StopReason,
        model: String,
        provider: String,
        usage: Usage,
        timestamp: u64,
        error_message: Option<String>,
    },
    ToolResult {
        tool_call_id: String,
        tool_name: String,
        content: Vec<Content>,
        is_error: bool,
        timestamp: u64,
    },
}
```

Create user messages easily:

```rust
let msg = Message::user("Hello, world!");
```
#### AgentMessage

Wraps `Message` with support for extension messages (UI-only, notifications, etc.):

```rust
pub enum AgentMessage {
    Llm(Message),
    Extension(ExtensionMessage),
}

pub struct ExtensionMessage {
    pub role: String,
    pub kind: String,
    pub data: serde_json::Value,
}
```

Create extension messages with the convenience constructor:

```rust
let ext = ExtensionMessage::new("status_update", serde_json::json!({"status": "running"}));
let msg = AgentMessage::Extension(ext);
```

The `kind` field categorizes the extension (e.g., `"status_update"`, `"ui_event"`, `"notification"`). Use `as_llm()` to extract the `Message` if it's an LLM message. The default `convert_to_llm` function filters out `Extension` messages before sending to the provider.

All core message types implement `Serialize`, `Deserialize`, `Clone`, and `PartialEq`, enabling state persistence and test assertions.
#### Content

Each message contains `Vec<Content>`:

```rust
pub enum Content {
    Text { text: String },
    Image { data: String, mime_type: String },
    Thinking { thinking: String, signature: Option<String> },
    ToolCall { id: String, name: String, arguments: serde_json::Value },
}
```

An assistant message can contain multiple content blocks — e.g., thinking + text + tool calls.
#### StopReason

```rust
pub enum StopReason {
    Stop,    // Natural completion
    Length,  // Hit max tokens
    ToolUse, // Wants to call tools
    Error,   // Provider error
    Aborted, // Cancelled by user
}
```
#### Usage

Token usage from the provider:

```rust
pub struct Usage {
    pub input: u64,
    pub output: u64,
    pub cache_read: u64,
    pub cache_write: u64,
    pub total_tokens: u64,
}
```
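As a quick illustration of how these fields combine, here is a hypothetical helper (not part of yoagent) that computes a cache hit rate from a local mirror of the struct:

```rust
// Local mirror of the Usage fields above, for this example only.
#[allow(dead_code)]
struct Usage {
    input: u64,
    output: u64,
    cache_read: u64,
    cache_write: u64,
    total_tokens: u64,
}

/// Hypothetical helper: fraction of input-side tokens that were served
/// from the provider's prompt cache (see the Prompt Caching chapter).
fn cache_hit_rate(u: &Usage) -> f64 {
    let input_side = u.input + u.cache_read + u.cache_write;
    if input_side == 0 {
        0.0
    } else {
        u.cache_read as f64 / input_side as f64
    }
}

fn main() {
    let u = Usage {
        input: 500,
        output: 200,
        cache_read: 4_000,
        cache_write: 500,
        total_tokens: 5_200,
    };
    // 4,000 of 5,000 input-side tokens came from cache.
    assert!((cache_hit_rate(&u) - 0.8).abs() < 1e-9);
}
```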
### AgentEvent

Events emitted during the agent loop for real-time UI updates:

| Event | When |
|---|---|
| `AgentStart` | Loop begins |
| `AgentEnd { messages }` | Loop finishes, with all new messages |
| `TurnStart` | New LLM call starting |
| `TurnEnd { message, tool_results }` | LLM call + tool execution complete |
| `MessageStart { message }` | A message is available |
| `MessageUpdate { message, delta }` | Streaming delta arrived |
| `MessageEnd { message }` | Message finalized |
| `ToolExecutionStart { tool_call_id, tool_name, args }` | Tool about to run |
| `ToolExecutionUpdate { tool_call_id, tool_name, partial_result }` | Tool progress |
| `ToolExecutionEnd { tool_call_id, tool_name, result, is_error }` | Tool finished |
| `ProgressMessage { tool_call_id, tool_name, text }` | User-facing progress text from a tool |
| `InputRejected { reason }` | Input filter rejected the user's message |
### StreamDelta

Deltas within `MessageUpdate`:

```rust
pub enum StreamDelta {
    Text { delta: String },
    Thinking { delta: String },
    ToolCallDelta { delta: String },
}
```
### Agent State

The `Agent` struct provides access to its current state:

```rust
// Check if the agent is currently streaming a response
if agent.is_streaming() {
    // Use steer() or follow_up() instead of prompt()
    agent.steer(AgentMessage::Llm(Message::user("New instruction")));
}

// Access the full message history
let messages: &[AgentMessage] = agent.messages();

// Check the last message
if let Some(last) = messages.last() {
    println!("Last message role: {}", last.role());
}
```

The `is_streaming()` flag is true between a `prompt()`/`continue_loop()` call and completion. While streaming, calling `prompt()` will panic — use `steer()` or `follow_up()` instead.
## Tools

### The AgentTool Trait

Every tool implements `AgentTool`:

```rust
#[async_trait]
pub trait AgentTool: Send + Sync {
    fn name(&self) -> &str;
    fn label(&self) -> &str;
    fn description(&self) -> &str;
    fn parameters_schema(&self) -> serde_json::Value;

    async fn execute(
        &self,
        params: serde_json::Value,
        ctx: ToolContext,
    ) -> Result<ToolResult, ToolError>;
}
```

| Method | Purpose |
|---|---|
| `name()` | Unique ID sent to the LLM (e.g., `"bash"`) |
| `label()` | Human-readable name for UI (e.g., `"Run Command"`) |
| `description()` | Tells the LLM what the tool does |
| `parameters_schema()` | JSON Schema for the tool's parameters |
| `execute()` | Runs the tool, returns `ToolResult` or `ToolError`. Receives a `ToolContext` with cancellation, update, and progress callbacks. |
### ToolContext

All execution context is bundled into a single struct, making the trait easier to extend in the future:

```rust
pub struct ToolContext {
    pub tool_call_id: String,
    pub tool_name: String,
    pub cancel: CancellationToken,
    pub on_update: Option<ToolUpdateFn>,
    pub on_progress: Option<ProgressFn>,
}
```

| Field | Purpose |
|---|---|
| `tool_call_id` | Unique ID for this tool call (for correlating events) |
| `tool_name` | Name of the tool being executed |
| `cancel` | Cancellation token — check `ctx.cancel.is_cancelled()` in long-running tools |
| `on_update` | Callback for streaming partial `ToolResult` updates to the UI (emits `ToolExecutionUpdate`) |
| `on_progress` | Callback for emitting user-facing progress messages (emits `ProgressMessage`) |

`ToolContext` implements `Clone` and `Debug`.
### ToolResult

```rust
pub struct ToolResult {
    pub content: Vec<Content>,
    pub details: serde_json::Value,
}
```

The `content` is sent back to the LLM. The `details` field holds metadata (not sent to the LLM) for UI/logging.
### ToolError

```rust
pub enum ToolError {
    Failed(String),
    NotFound(String),
    InvalidArgs(String),
    Cancelled,
}
```

Errors are converted to `ToolResult` with `is_error: true` and sent back to the LLM so it can recover.
### Implementing a Custom Tool

```rust
use yoagent::types::*;
use async_trait::async_trait;

pub struct WeatherTool;

#[async_trait]
impl AgentTool for WeatherTool {
    fn name(&self) -> &str { "get_weather" }
    fn label(&self) -> &str { "Weather" }
    fn description(&self) -> &str { "Get current weather for a city." }

    fn parameters_schema(&self) -> serde_json::Value {
        serde_json::json!({
            "type": "object",
            "properties": {
                "city": { "type": "string", "description": "City name" }
            },
            "required": ["city"]
        })
    }

    async fn execute(
        &self,
        params: serde_json::Value,
        _ctx: ToolContext,
    ) -> Result<ToolResult, ToolError> {
        let city = params["city"].as_str()
            .ok_or(ToolError::InvalidArgs("missing city".into()))?;

        // Call weather API...

        Ok(ToolResult {
            content: vec![Content::Text {
                text: format!("Weather in {}: 72°F, sunny", city),
            }],
            details: serde_json::Value::Null,
        })
    }
}
```

Register custom tools alongside defaults:

```rust
use yoagent::tools::default_tools;

let mut tools = default_tools();
tools.push(Box::new(WeatherTool));

let agent = Agent::new(provider).with_tools(tools);
```
### Error Handling

Return `Err(ToolError)` on failure, not `Ok` with error text. When a tool returns `Err`, the agent loop converts it to a `Message::ToolResult` with `is_error: true` and sends it to the LLM. The LLM sees the error and can self-correct — retry with different arguments, try a different approach, or explain the failure to the user.

```rust
async fn execute(&self, params: serde_json::Value, _ctx: ToolContext) -> Result<ToolResult, ToolError> {
    let path = params["path"].as_str()
        .ok_or(ToolError::InvalidArgs("missing 'path'".into()))?;

    let content = std::fs::read_to_string(path)
        .map_err(|e| ToolError::Failed(format!("Cannot read {}: {}", path, e)))?;

    Ok(ToolResult {
        content: vec![Content::Text { text: content }],
        details: serde_json::Value::Null,
    })
}
```

**Exception: BashTool.** The built-in `BashTool` returns `Ok` even on non-zero exit codes, with both stdout and stderr in the result. This is intentional — the LLM needs to see the actual error output (compilation errors, test failures, etc.) to diagnose and fix issues. Only truly exceptional failures (e.g., command not found, cancellation) return `Err`.
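The convention can be sketched with a plain `std::process` call. This is a stand-alone illustration of the idea, not BashTool's actual implementation:

```rust
use std::process::Command;

/// Sketch of the BashTool convention: a non-zero exit code is still Ok,
/// because the model needs to see the failure output. Only failures to
/// spawn the shell at all are treated as Err.
fn run_shell(cmd: &str) -> Result<String, String> {
    let out = Command::new("sh")
        .arg("-c")
        .arg(cmd)
        .output()
        .map_err(|e| format!("could not spawn shell: {}", e))?; // truly exceptional

    // Exit code, stdout, and stderr are all folded into the Ok result.
    Ok(format!(
        "exit code: {}\n--- stdout ---\n{}--- stderr ---\n{}",
        out.status.code().unwrap_or(-1),
        String::from_utf8_lossy(&out.stdout),
        String::from_utf8_lossy(&out.stderr)
    ))
}

fn main() {
    // A failing command is still Ok: the caller (and the LLM) sees why it failed.
    let report = run_shell("echo oops >&2; exit 3").unwrap();
    assert!(report.contains("exit code: 3"));
    assert!(report.contains("oops"));
}
```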
Tool Execution Flow
- LLM returns
Content::ToolCallblocks in its response - Agent loop emits
ToolExecutionStartfor each - Tool's
execute()is called with parsed arguments - Result (or error) is wrapped in
Message::ToolResult ToolExecutionEndis emitted- All tool results are added to context
- Loop continues with another LLM call
### Streaming Tool Output

Long-running tools can stream progress updates to the UI via the `on_update` callback. Each call emits a `ToolExecutionUpdate` event. Partial results are for UI/logging only — they are not sent to the LLM. Only the final `ToolResult` returned from `execute()` becomes part of the conversation.

#### The ToolUpdateFn type

```rust
pub type ToolUpdateFn = Arc<dyn Fn(ToolResult) + Send + Sync>;
```
#### Basic usage

Call `on_update` whenever you have progress to report:

```rust
use yoagent::types::*;

struct DataProcessorTool;

#[async_trait]
impl AgentTool for DataProcessorTool {
    // ... name, label, description, parameters_schema ...

    async fn execute(
        &self,
        params: serde_json::Value,
        ctx: ToolContext,
    ) -> Result<ToolResult, ToolError> {
        let rows = fetch_rows(&params)?;
        let total = rows.len();

        for (i, row) in rows.iter().enumerate() {
            // Check for cancellation
            if ctx.cancel.is_cancelled() {
                return Err(ToolError::Cancelled);
            }

            process_row(row);

            // Stream progress every 100 rows
            if i % 100 == 0 {
                if let Some(cb) = &ctx.on_update {
                    cb(ToolResult {
                        content: vec![Content::Text {
                            text: format!("Processed {}/{} rows", i, total),
                        }],
                        details: serde_json::json!({"progress": i as f64 / total as f64}),
                    });
                }
            }
        }

        Ok(ToolResult {
            content: vec![Content::Text {
                text: format!("Processed all {} rows", total),
            }],
            details: serde_json::Value::Null,
        })
    }
}
```
#### Consuming updates in your UI

Updates arrive as `AgentEvent::ToolExecutionUpdate` events on the same event stream as all other agent events:

```rust
while let Some(event) = rx.recv().await {
    match event {
        AgentEvent::ToolExecutionStart { tool_name, .. } => {
            println!("⏳ {} started", tool_name);
        }
        AgentEvent::ToolExecutionUpdate { tool_name, partial_result, .. } => {
            // Show progress in your UI
            if let Some(Content::Text { text }) = partial_result.content.first() {
                println!("  📊 {}: {}", tool_name, text);
            }
        }
        AgentEvent::ToolExecutionEnd { tool_name, is_error, .. } => {
            println!("{} {}", if is_error { "❌" } else { "✅" }, tool_name);
        }
        AgentEvent::ProgressMessage { tool_name, text, .. } => {
            println!("  💬 {}: {}", tool_name, text);
        }
        _ => {}
    }
}
```
#### Progress Messages

In addition to `on_update` (which streams partial `ToolResult` values), tools can emit lightweight text-only progress messages via `ctx.on_progress`. These appear as `AgentEvent::ProgressMessage` events:

```rust
async fn execute(&self, params: serde_json::Value, ctx: ToolContext) -> Result<ToolResult, ToolError> {
    if let Some(progress) = &ctx.on_progress {
        progress("Starting analysis...".into());
    }

    // ... do work ...

    if let Some(progress) = &ctx.on_progress {
        progress("Almost done...".into());
    }

    Ok(ToolResult { /* ... */ })
}
```

Use `on_progress` for simple status text. Use `on_update` when you need structured data (progress percentages, partial results).
#### Guidelines

- Call `on_update` as often as useful — there's no rate limit. The callback is synchronous and cheap.
- Always check `ctx.on_update.is_some()` before building the `ToolResult`. If `None`, the loop isn't interested in updates (e.g., in tests).
- Use `details` for structured data — `content` is for human-readable text; `details` can carry progress percentages, byte counts, etc.
- Don't rely on updates reaching the LLM — they won't. Only the final return value is added to context.
- Simple tools don't need it — if your tool completes in under a second, just ignore `ctx` (prefix with `_ctx` to suppress the warning).
#### End-to-end example

Here's a complete example: a CLI agent with a deploy tool that streams progress. The human sees real-time output while the LLM only gets the final result.

```rust
use yoagent::agent::Agent;
use yoagent::provider::AnthropicProvider;
use yoagent::types::*;
use async_trait::async_trait;

/// A tool that deploys an app and streams each step.
struct DeployTool;

#[async_trait]
impl AgentTool for DeployTool {
    fn name(&self) -> &str { "deploy" }
    fn label(&self) -> &str { "Deploy App" }
    fn description(&self) -> &str { "Deploy the application to production." }

    fn parameters_schema(&self) -> serde_json::Value {
        serde_json::json!({
            "type": "object",
            "properties": {
                "env": { "type": "string", "description": "Target environment" }
            },
            "required": ["env"]
        })
    }

    async fn execute(
        &self,
        params: serde_json::Value,
        ctx: ToolContext,
    ) -> Result<ToolResult, ToolError> {
        let env = params["env"].as_str().unwrap_or("staging");
        let steps = ["Building image", "Running tests", "Pushing to registry", "Rolling out"];

        for (i, step) in steps.iter().enumerate() {
            if ctx.cancel.is_cancelled() {
                return Err(ToolError::Cancelled);
            }

            // Stream each step to the UI
            if let Some(cb) = &ctx.on_update {
                cb(ToolResult {
                    content: vec![Content::Text {
                        text: format!("[{}/{}] {}...", i + 1, steps.len(), step),
                    }],
                    details: serde_json::json!({
                        "step": i + 1,
                        "total": steps.len(),
                        "phase": step,
                    }),
                });
            }

            // Simulate work
            tokio::time::sleep(std::time::Duration::from_secs(2)).await;
        }

        // Only this final result is sent to the LLM
        Ok(ToolResult {
            content: vec![Content::Text {
                text: format!("Successfully deployed to {}", env),
            }],
            details: serde_json::json!({"env": env, "status": "success"}),
        })
    }
}

#[tokio::main]
async fn main() {
    let mut agent = Agent::new(AnthropicProvider)
        .with_system_prompt("You are a deployment assistant.")
        .with_model("claude-sonnet-4-20250514")
        .with_api_key(std::env::var("ANTHROPIC_API_KEY").unwrap())
        .with_tools(vec![Box::new(DeployTool)]);

    let mut rx = agent.prompt("Deploy to production").await;

    while let Some(event) = rx.recv().await {
        match event {
            // LLM text streaming
            AgentEvent::MessageUpdate { delta: StreamDelta::Text { delta }, .. } => {
                print!("{}", delta);
            }
            // Tool progress streaming
            AgentEvent::ToolExecutionStart { tool_name, .. } => {
                println!("\n🚀 Starting {}...", tool_name);
            }
            AgentEvent::ToolExecutionUpdate { partial_result, .. } => {
                if let Some(Content::Text { text }) = partial_result.content.first() {
                    println!("   {}", text);
                }
            }
            AgentEvent::ToolExecutionEnd { tool_name, is_error, .. } => {
                if is_error {
                    println!("   ❌ {} failed", tool_name);
                } else {
                    println!("   ✅ {} complete", tool_name);
                }
            }
            AgentEvent::ProgressMessage { text, .. } => {
                println!("   💬 {}", text);
            }
            AgentEvent::AgentEnd { .. } => break,
            _ => {}
        }
    }
}
```
Running this produces:

```text
🚀 Starting deploy...
   [1/4] Building image...
   [2/4] Running tests...
   [3/4] Pushing to registry...
   [4/4] Rolling out...
   ✅ deploy complete
Successfully deployed to production. The deployment completed all 4 stages.
```
The human sees each step as it happens. The LLM only sees "Successfully deployed to production" and can continue the conversation from there.
#### How agents benefit

When an AI agent (like a coding assistant) uses yoagent, streaming tool output helps in two ways:

- **Human oversight** — The human watching the agent work sees real-time progress instead of waiting for a tool to finish. A bash command running `cargo build` can stream compiler output as it happens, so the human can interrupt early if something is wrong.
- **Agent UIs** — Web dashboards, IDE extensions, and chat interfaces can render live progress bars, log tails, or status indicators. The `details` field in `ToolResult` carries structured data (progress percentages, byte counts, etc.) that UIs can render however they want.

The LLM itself doesn't see updates — it works with final results only. This is intentional: partial output would waste context tokens and confuse the model. Streaming is purely a human-facing feature.
### Execution Strategies

When the LLM returns multiple tool calls in a single response (e.g., "read file A, read file B, run bash C"), `ToolExecutionStrategy` controls how they run:

| Strategy | Behavior |
|---|---|
| `Sequential` | One at a time. Steering checked between each tool. Use for debugging or tools with shared mutable state. |
| `Parallel` (default) | All tool calls run concurrently via `futures::join_all`. Steering checked after all complete. Best latency for independent tools. |
| `Batched { size }` | Run in groups of N. Steering checked between batches. Balances speed with human-in-the-loop control. |
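The three strategies differ only in how the tool-call list is partitioned into execution units between steering checks. A synchronous sketch with local stand-in types (not yoagent's actual implementation, which runs the units concurrently):

```rust
// Local stand-in mirroring the strategy shapes described above.
enum Strategy {
    Sequential,
    Parallel,
    Batched { size: usize },
}

/// Partition tool calls into the units between which steering is checked:
/// every call alone (Sequential), all at once (Parallel), or in groups (Batched).
fn execution_units<'a>(calls: &[&'a str], strategy: &Strategy) -> Vec<Vec<&'a str>> {
    match strategy {
        Strategy::Sequential => calls.iter().map(|c| vec![*c]).collect(),
        Strategy::Parallel => vec![calls.to_vec()],
        Strategy::Batched { size } => calls.chunks(*size).map(|c| c.to_vec()).collect(),
    }
}

fn main() {
    let calls = ["read_a", "read_b", "bash_c", "read_d"];

    // Sequential: a steering checkpoint after each of the 4 tools.
    assert_eq!(execution_units(&calls, &Strategy::Sequential).len(), 4);

    // Parallel: one unit; steering is checked only after all 4 complete.
    assert_eq!(execution_units(&calls, &Strategy::Parallel).len(), 1);

    // Batched { size: 3 }: units of 3 and 1, with a checkpoint in between.
    assert_eq!(execution_units(&calls, &Strategy::Batched { size: 3 }).len(), 2);
}
```

More units means more steering checkpoints but less concurrency, which is exactly the trade-off the table above describes.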
#### Configuration

```rust
use yoagent::agent::Agent;
use yoagent::types::ToolExecutionStrategy;

// Default — parallel (fastest)
let agent = Agent::new(provider);

// Sequential (debug / shared state)
let agent = Agent::new(provider)
    .with_tool_execution(ToolExecutionStrategy::Sequential);

// Batched — 3 at a time
let agent = Agent::new(provider)
    .with_tool_execution(ToolExecutionStrategy::Batched { size: 3 });
```
#### When to use each

- **Parallel (default)** — Most tool calls are independent: file reads, searches, API calls. Running them concurrently can cut latency dramatically (3 tools × 50ms is ~50ms instead of ~150ms).
- **Sequential** — When tools have side effects that depend on order, or when you need fine-grained steering control between each tool.
- **Batched** — When you want parallelism but also want steering checkpoints. For example, `Batched { size: 3 }` runs 3 tools concurrently, checks for user interrupts, then runs the next 3.

Steering messages are always checked between execution units (between each tool in Sequential, after all tools in Parallel, between batches in Batched). If a user interrupts, remaining tools are skipped.
## Context Management

Long-running agents accumulate messages that exceed the model's context window. yoagent provides token tracking, overflow detection, tiered compaction, and execution limits.

### Token Estimation

Fast estimation without external tokenizer dependencies:

```rust
use yoagent::context::{estimate_tokens, message_tokens, total_tokens};

estimate_tokens("Hello world");  // ~3 tokens (chars / 4)
message_tokens(&agent_message);  // estimate for a single message
total_tokens(&messages);         // estimate for all messages
```
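The heuristic is simple enough to restate in a few lines. A sketch of the chars/4 rule, assuming ceiling division (yoagent's exact rounding may differ):

```rust
/// chars / 4 heuristic, rounding up: an approximation of the rule
/// described above, not yoagent's exact implementation.
fn estimate_tokens(text: &str) -> usize {
    (text.chars().count() + 3) / 4
}

fn main() {
    assert_eq!(estimate_tokens("Hello world"), 3); // 11 chars → ~3 tokens
    assert_eq!(estimate_tokens(""), 0);
}
```

The rule is rough (code and non-English text tokenize differently), which is why the `ContextTracker` below prefers real usage numbers from the provider when they are available.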
### Context Tracking

`ContextTracker` combines real token counts from provider responses with estimation for new messages — more accurate than pure estimation:

```rust
use yoagent::context::ContextTracker;

let mut tracker = ContextTracker::new();

// After each assistant response, record the real usage:
tracker.record_usage(&assistant_usage, message_index);

// Get the current context size (real usage + estimated trailing messages):
let tokens = tracker.estimate_context_tokens(agent.messages());

// After compaction, reset the tracker:
tracker.reset();
```

When no usage data is available, it falls back to chars/4 estimation.
### Context Overflow Detection

When the context exceeds a model's window, providers return overflow errors. yoagent detects these automatically across all major providers.

#### HTTP-level detection

Providers that check before streaming (Google, Bedrock, Vertex) return `ProviderError::ContextOverflow`:

```rust
use yoagent::provider::ProviderError;

match agent.prompt("...").await {
    // The loop already handles this — but you can also match it:
    Err(ProviderError::ContextOverflow { message }) => {
        // Compact and retry
    }
    _ => {}
}
```

`ProviderError::classify()` auto-detects overflow from error messages covering Anthropic, OpenAI, Google, AWS Bedrock, xAI, Groq, OpenRouter, llama.cpp, LM Studio, MiniMax, Kimi, GitHub Copilot, and generic patterns.
#### Message-level detection

SSE-based providers (Anthropic, OpenAI) return overflow as a `StopReason::Error` message. Check with:

```rust
if message.is_context_overflow() {
    // Compact and retry
}
```
#### Handling overflow in your application

yoagent provides the detection and the building blocks. Your application wires up the compaction strategy:

```rust
// Proactive: check before each prompt
let tokens = tracker.estimate_context_tokens(agent.messages());
if tokens > context_window - reserve {
    let compacted = compact_messages(agent.messages().to_vec(), &config);
    agent.replace_messages(compacted);
}

// Reactive: catch overflow errors
// ... on ContextOverflow or message.is_context_overflow():
// compact, then retry with agent.continue_loop()
```

For LLM-based summarization (asking the model to summarize old messages), implement that in your application layer — yoagent provides `replace_messages()` and `compact_messages()` as building blocks.
### ContextConfig

```rust
pub struct ContextConfig {
    pub max_context_tokens: usize,    // Default: 100,000
    pub system_prompt_tokens: usize,  // Default: 4,000
    pub keep_recent: usize,           // Default: 10
    pub keep_first: usize,            // Default: 2
    pub tool_output_max_lines: usize, // Default: 50
}
```
### Tiered Compaction

`compact_messages()` tries each level in order, stopping as soon as the messages fit the budget.

#### Level 1: Truncate Tool Outputs

Replaces long tool outputs with head + tail (keeping the first N/2 and last N/2 lines). This is the cheapest level — it preserves conversation structure and typically saves 50-70% in coding sessions.
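Head-plus-tail truncation can be sketched like this (a stand-alone approximation of the behavior described, not the library's code):

```rust
/// Keep the first and last `max_lines / 2` lines of a long tool output,
/// replacing the middle with a marker: an approximation of Level 1.
fn truncate_output(text: &str, max_lines: usize) -> String {
    let lines: Vec<&str> = text.lines().collect();
    if lines.len() <= max_lines {
        return text.to_string();
    }
    let half = max_lines / 2;
    let dropped = lines.len() - 2 * half;

    let mut out: Vec<String> = lines[..half].iter().map(|s| s.to_string()).collect();
    out.push(format!("... [{} lines truncated] ...", dropped));
    out.extend(lines[lines.len() - half..].iter().map(|s| s.to_string()));
    out.join("\n")
}

fn main() {
    let long: String = (1..=100).map(|i| format!("line {}\n", i)).collect();
    let short = truncate_output(&long, 10);

    // First 5 and last 5 lines survive; the middle 90 are summarized.
    assert!(short.starts_with("line 1"));
    assert!(short.ends_with("line 100"));
    assert!(short.contains("[90 lines truncated]"));
}
```

Heads and tails tend to carry the signal in tool output (the command invocation and the final error or summary), which is why this level alone often recovers most of the budget.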
Level 2: Summarize Old Turns
Keeps the last keep_recent messages in full detail. Older assistant messages are replaced with one-line summaries like "[Summary] [Assistant used 3 tool(s)]", and their tool results are dropped.
Level 3: Drop Middle Messages
Keeps keep_first messages from the start and keep_recent from the end, dropping everything in between. A marker message notes how many were removed.
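The cheapest level, head + tail truncation, can be sketched in a few lines. This is an illustrative stand-alone version; yoagent's actual implementation lives inside `compact_messages()`:

```rust
/// Level 1 sketch: keep the first and last max_lines/2 lines of a tool
/// output, replacing the middle with a marker. Illustrative only.
fn truncate_head_tail(output: &str, max_lines: usize) -> String {
    let lines: Vec<&str> = output.lines().collect();
    if lines.len() <= max_lines {
        return output.to_string();
    }
    let keep = max_lines / 2;
    let head = lines[..keep].join("\n");
    let tail = lines[lines.len() - keep..].join("\n");
    let omitted = lines.len() - 2 * keep;
    format!("{}\n[... {} lines truncated ...]\n{}", head, omitted, tail)
}

fn main() {
    let big: String = (1..=100).map(|i| format!("line {}\n", i)).collect();
    let small = truncate_head_tail(&big, 10);
    assert!(small.contains("line 1"));
    assert!(small.contains("line 100"));
    assert!(small.contains("90 lines truncated"));
}
```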
ExecutionLimits
Prevents runaway agents:
```rust
pub struct ExecutionLimits {
    pub max_turns: usize,        // Default: 50
    pub max_total_tokens: usize, // Default: 1,000,000
    pub max_duration: Duration,  // Default: 600s (10 min)
}
```
When a limit is reached, the agent stops with a message like "[Agent stopped: Max turns reached (50/50)]".
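The limit check itself is straightforward. A sketch of the semantics — the stop messages other than the max-turns one are assumptions for illustration, and the real check lives inside the agent loop:

```rust
use std::time::Duration;

/// Illustrative mirror of ExecutionLimits (not yoagent's actual type).
struct Limits {
    max_turns: usize,
    max_total_tokens: usize,
    max_duration: Duration,
}

/// Returns a stop message once any limit is exceeded, else None.
fn stop_reason(l: &Limits, turns: usize, tokens: usize, elapsed: Duration) -> Option<String> {
    if turns >= l.max_turns {
        return Some(format!("[Agent stopped: Max turns reached ({}/{})]", turns, l.max_turns));
    }
    if tokens >= l.max_total_tokens {
        return Some("[Agent stopped: Max total tokens reached]".into()); // assumed wording
    }
    if elapsed >= l.max_duration {
        return Some("[Agent stopped: Max duration reached]".into()); // assumed wording
    }
    None
}

fn main() {
    let l = Limits { max_turns: 50, max_total_tokens: 1_000_000, max_duration: Duration::from_secs(600) };
    assert!(stop_reason(&l, 10, 5_000, Duration::from_secs(30)).is_none());
    assert_eq!(
        stop_reason(&l, 50, 5_000, Duration::from_secs(30)).unwrap(),
        "[Agent stopped: Max turns reached (50/50)]"
    );
}
```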
Disabling Context Management
```rust
let agent = Agent::new(provider)
    .without_context_management();
```
This sets both context_config and execution_limits to None.
Prompt Caching
yoagent automatically optimizes API costs through prompt caching. For providers that support it, stable content (system prompts, tool definitions, conversation history) is cached between turns, giving you up to 90% savings on input tokens.
How It Works
In a multi-turn agent loop, each request sends the full context: system prompt + tools + conversation history. Without caching, you pay full price for all of it every turn. With caching, the provider reuses previously processed prefixes.
Provider Support
| Provider | Caching Type | Savings | Framework Action |
|---|---|---|---|
| Anthropic | Explicit (cache breakpoints) | 90% on hits | ✅ Auto-placed |
| OpenAI | Automatic (>1024 tokens) | 50% on hits | None needed |
| Google Gemini | Implicit (automatic) | Varies | None needed |
| Azure OpenAI | Automatic (same as OpenAI) | 50% on hits | None needed |
| Amazon Bedrock | Automatic (where supported) | Varies | None needed |
What Gets Cached (Anthropic)
yoagent places up to 3 cache breakpoints automatically:
- System prompt — stable across all turns
- Tool definitions — rarely change between turns
- Conversation history — second-to-last message, so the growing prefix is cached
This means on a typical multi-turn conversation, only the latest user message and the new assistant response cost full price.
Configuration
Caching is enabled by default with automatic breakpoint placement. No configuration needed for optimal behavior.
Disable Caching
```rust
use yoagent::{CacheConfig, CacheStrategy};

let agent = Agent::new(provider)
    .with_cache_config(CacheConfig {
        enabled: false,
        ..Default::default()
    });
```
Fine-Grained Control
```rust
let agent = Agent::new(provider)
    .with_cache_config(CacheConfig {
        enabled: true,
        strategy: CacheStrategy::Manual {
            cache_system: true,
            cache_tools: true,
            cache_messages: false, // Don't cache conversation history
        },
    });
```
Monitoring Cache Usage
Every Usage struct includes cache statistics:
```rust
// After a response:
let usage = message.usage(); // from assistant message
println!("Cache read: {} tokens", usage.cache_read);
println!("Cache write: {} tokens", usage.cache_write);
println!("Cache hit rate: {:.1}%", usage.cache_hit_rate() * 100.0);
```
- `cache_read` — tokens served from cache (cheap)
- `cache_write` — tokens written to cache (slightly more than base price)
- `cache_hit_rate()` — fraction of input tokens from cache (0.0–1.0)
Cost Impact
For a typical 10-turn agent conversation with Anthropic Claude:
| Without Caching | With Caching (auto) |
|---|---|
| ~500K input tokens billed at full price | ~50K at full price + ~450K at 10% price |
| $2.50 (Sonnet) | $0.39 (Sonnet) |
That's an 84% cost reduction with zero configuration.
Best Practices
- Keep system prompts stable — changing the system prompt between turns invalidates the cache
- Don't shuffle tools — tool order matters for cache prefix matching
- Let it work automatically — the default `CacheStrategy::Auto` is optimal for most use cases
- Monitor `cache_hit_rate()` — if it's consistently low, check whether your system prompt or tools are changing unexpectedly
Retry with Backoff
When an LLM provider returns a transient error — rate limit (HTTP 429) or network failure — yoagent automatically retries with exponential backoff and jitter. No configuration required; it works out of the box.
How it works
```
Request → Error? → Retryable? → Wait (backoff + jitter) → Retry → ...
                        ↓ No
                 Fail immediately
```
1. The agent loop calls the provider
2. If the provider returns a retryable error:
   - If a `retry-after` delay was provided (rate limits), use that
   - Otherwise, calculate the delay: `initial_delay × multiplier^(attempt-1)` with ±20% jitter
   - Wait, then retry
3. After `max_retries` attempts, the error propagates normally
What gets retried
| Error Type | Retried? | Why |
|---|---|---|
RateLimited (429) | ✅ Yes | Temporary — provider will accept requests again soon |
Network | ✅ Yes | Transient — connection resets, timeouts, DNS failures |
Auth (401/403) | ❌ No | Permanent — wrong API key won't fix itself |
Api (400, etc.) | ❌ No | Permanent — bad request won't change on retry |
Cancelled | ❌ No | User-initiated — respect the cancellation |
Default configuration
```rust
RetryConfig {
    max_retries: 3,          // Up to 3 retry attempts
    initial_delay_ms: 1000,  // 1 second before first retry
    backoff_multiplier: 2.0, // Double the delay each attempt
    max_delay_ms: 30_000,    // Cap at 30 seconds
}
```
With defaults, the retry delays are approximately:
- Attempt 1: ~1s
- Attempt 2: ~2s
- Attempt 3: ~4s
(±20% jitter to avoid thundering herd when multiple agents hit the same provider)
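The deterministic part of that schedule is a one-liner. A sketch of the formula (jitter omitted; the real loop applies ±20% on top):

```rust
/// Backoff schedule: initial × multiplier^(attempt-1), capped at max_delay.
/// Jitter is applied separately in the real retry loop.
fn backoff_ms(attempt: u32, initial_ms: u64, multiplier: f64, max_ms: u64) -> u64 {
    let raw = initial_ms as f64 * multiplier.powi(attempt as i32 - 1);
    (raw as u64).min(max_ms)
}

fn main() {
    // Defaults: 1000ms initial, 2.0 multiplier, 30s cap
    assert_eq!(backoff_ms(1, 1000, 2.0, 30_000), 1000);   // ~1s
    assert_eq!(backoff_ms(2, 1000, 2.0, 30_000), 2000);   // ~2s
    assert_eq!(backoff_ms(3, 1000, 2.0, 30_000), 4000);   // ~4s
    assert_eq!(backoff_ms(10, 1000, 2.0, 30_000), 30_000); // capped
}
```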
Configuration
Using the Agent builder
```rust
use yoagent::agent::Agent;
use yoagent::retry::RetryConfig;

// Default — 3 retries, exponential backoff (recommended)
let agent = Agent::new(provider);

// Custom — more retries, longer initial delay
let agent = Agent::new(provider)
    .with_retry_config(RetryConfig {
        max_retries: 5,
        initial_delay_ms: 2000,
        backoff_multiplier: 2.0,
        max_delay_ms: 60_000,
    });

// Disable retries entirely
let agent = Agent::new(provider)
    .with_retry_config(RetryConfig::none());
```
Using AgentLoopConfig directly
```rust
use yoagent::agent_loop::AgentLoopConfig;
use yoagent::retry::RetryConfig;

let config = AgentLoopConfig {
    // ...other fields...
    retry_config: RetryConfig {
        max_retries: 3,
        initial_delay_ms: 1000,
        backoff_multiplier: 2.0,
        max_delay_ms: 30_000,
    },
};
```
Rate limit headers
When a provider returns ProviderError::RateLimited { retry_after_ms: Some(5000) }, yoagent uses that exact delay instead of the calculated backoff. This respects the provider's guidance — if Anthropic says "retry after 5 seconds", we wait 5 seconds, not our own estimate.
If no retry_after_ms is provided, the exponential backoff kicks in.
Observability
Retry attempts are logged via tracing at the WARN level:
```
WARN Provider error (attempt 1/3), retrying in 1.1s: Rate limited, retry after 1000ms
WARN Provider error (attempt 2/3), retrying in 2.3s: Rate limited, retry after 2000ms
```
Subscribe to tracing events in your application to surface these in your UI:
```rust
use tracing_subscriber;

// Simple stderr logging
tracing_subscriber::fmt::init();

// Or filter to just retries
tracing_subscriber::fmt()
    .with_env_filter("yoagent::retry=warn")
    .init();
```
Design notes
- Retry lives in the agent loop, not inside individual providers. One config controls all retry behavior.
- Jitter prevents thundering herd: when many agents hit a rate limit simultaneously, jitter spreads their retries so they don't all retry at the same instant.
- Cancellation is respected: if the user cancels while waiting for a retry, the loop exits immediately.
- No retry on API errors: a malformed request will fail the same way every time. Retrying wastes time and tokens.
Skills
Skills extend an agent with domain expertise using the AgentSkills open standard. A skill is a directory containing a SKILL.md file with instructions the agent can load on demand.
How it works
Skills use progressive disclosure to manage context efficiently:
- Metadata (~100 tokens/skill) — name + description, always in the system prompt
- Instructions (<5k tokens) — SKILL.md body, loaded when the agent decides the skill is relevant
- Resources (unlimited) — scripts, references, assets — loaded only when needed
The agent decides when to activate a skill based on the description alone. No trigger engine needed.
Skill format
```
my-skill/
├── SKILL.md      # Required: YAML frontmatter + instructions
├── scripts/      # Optional: executable code
├── references/   # Optional: documentation loaded on demand
└── assets/       # Optional: templates, static resources
```
SKILL.md uses YAML frontmatter:
```markdown
---
name: git
description: Git operations — commit, branch, merge, rebase. Use when the user mentions version control.
---

# Git Skill

## Workflow

1. Run `git status` first
2. Stage changes, write conventional commit messages
3. For merges, check for conflicts first

## Scripts

For complex diffs: `bash {baseDir}/scripts/diff_summary.sh`
```
Loading skills
```rust
use yoagent::SkillSet;

// Load from multiple directories (later dirs override earlier on name conflict)
let skills = SkillSet::load(&["./skills", "~/.yoagent/skills"])?;

// Or load from a single directory with a label
let workspace_skills = SkillSet::load_dir("./skills", "workspace")?;
```
Using with Agent
```rust
use yoagent::{Agent, SkillSet};

let skills = SkillSet::load(&["./skills"])?;

let agent = Agent::new(provider)
    .with_system_prompt("You are a coding assistant.")
    .with_skills(skills) // Appends skill index to system prompt
    .with_tools(tools);
```
The agent's system prompt will include:
```xml
<available_skills>
  <skill>
    <name>git</name>
    <description>Git operations — commit, branch, merge, rebase.</description>
    <location>/path/to/skills/git/SKILL.md</location>
  </skill>
</available_skills>
```
When the agent encounters a task matching a skill, it reads the SKILL.md using the read_file tool and follows the instructions. No special infrastructure needed.
Precedence
When loading from multiple directories, later directories take precedence. A skill in ./skills/ overrides the same-named skill in ~/.yoagent/skills/.
You can also merge skill sets explicitly:
```rust
let mut base = SkillSet::load_dir("/usr/share/yoagent/skills", "bundled")?;
let user = SkillSet::load_dir("~/.yoagent/skills", "user")?;
let workspace = SkillSet::load_dir("./skills", "workspace")?;

base.merge(user);
base.merge(workspace); // workspace wins on conflict
```
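The merge semantics are last-writer-wins keyed by skill name. A std-only sketch of that rule (illustrative; `SkillSet::merge` implements the real behavior):

```rust
use std::collections::HashMap;

/// Precedence sketch: entries in `overlay` replace same-named entries
/// in `base`, so whichever set is merged last wins.
fn merge_by_name(base: &mut HashMap<String, String>, overlay: HashMap<String, String>) {
    for (name, path) in overlay {
        base.insert(name, path); // overlay replaces same-named skills
    }
}

fn main() {
    let mut skills = HashMap::from([("git".to_string(), "user/skills/git".to_string())]);
    merge_by_name(
        &mut skills,
        HashMap::from([("git".to_string(), "workspace/skills/git".to_string())]),
    );
    assert_eq!(skills["git"], "workspace/skills/git"); // workspace wins
}
```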
Compatibility
By following the AgentSkills standard, skills written for yoagent work with Claude Code, Codex CLI, Gemini CLI, Cursor, OpenCode, Goose, and any other compatible agent. Write once, use everywhere.
Design philosophy
Skills are deliberately simple:
- No trigger engine — the LLM decides from descriptions
- No compile-time registration — skills use existing tools (read_file, bash)
- No plugin API — skills are just files
- No runtime loading — loaded at startup, that's it
If a skill needs a custom tool, it can provide an MCP server.
Sub-Agents
Sub-agents let a parent agent delegate tasks to child agent loops, each with their own system prompt, tools, and provider. The parent LLM invokes them like any other tool.
Overview
```
Parent Agent
├── prompt("Research X and implement Y")
│   ├── calls SubAgentTool("researcher", task="Research X")
│   │   └── child agent_loop() with read/search tools → returns findings
│   ├── calls SubAgentTool("coder", task="Implement Y based on findings")
│   │   └── child agent_loop() with edit/write tools → returns result
│   └── summarizes both results
```
Each sub-agent invocation starts a fresh conversation — no state leaks between calls.
Creating Sub-Agents
```rust
use std::sync::Arc;
use yoagent::sub_agent::SubAgentTool;
use yoagent::provider::AnthropicProvider;
use yoagent::tools;

let researcher = SubAgentTool::new("researcher", Arc::new(AnthropicProvider))
    .with_description("Searches and reads files to gather information.")
    .with_system_prompt("You are a research assistant. Be thorough and concise.")
    .with_model("claude-sonnet-4-20250514")
    .with_api_key(&api_key)
    .with_tools(vec![
        Arc::new(tools::ReadFileTool::new()),
        Arc::new(tools::SearchTool::new()),
    ])
    .with_max_turns(10);
```
Registering on a Parent Agent
```rust
use yoagent::agent::Agent;

let mut agent = Agent::new(AnthropicProvider)
    .with_system_prompt("You coordinate between sub-agents.")
    .with_model("claude-sonnet-4-20250514")
    .with_api_key(api_key)
    .with_sub_agent(researcher)
    .with_sub_agent(coder);
```
The parent sees sub-agents as regular tools. It decides when to delegate based on its system prompt.
Parallel Execution
When the parent LLM calls multiple sub-agents in a single response, they run concurrently (default Parallel strategy). Two sub-agents each taking 50ms complete in ~50ms total, not 100ms.
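The timing claim is easy to verify with any concurrency primitive. An illustrative std-thread sketch (yoagent actually runs sub-agents as async tasks on tokio, not OS threads):

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Run two 50ms "sub-agents" on separate threads and return total wall time.
/// Plain threads stand in here for yoagent's concurrent async tasks.
fn run_two_parallel() -> Duration {
    let start = Instant::now();
    let a = thread::spawn(|| thread::sleep(Duration::from_millis(50)));
    let b = thread::spawn(|| thread::sleep(Duration::from_millis(50)));
    a.join().unwrap();
    b.join().unwrap();
    start.elapsed()
}

fn main() {
    let elapsed = run_two_parallel();
    // Concurrent: ~50ms total, not the ~100ms a sequential run would take
    assert!(elapsed >= Duration::from_millis(50));
    assert!(elapsed < Duration::from_millis(100));
}
```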
Configuration
| Method | Purpose |
|---|---|
with_description() | What the parent LLM sees (helps it decide when to delegate) |
with_system_prompt() | The sub-agent's own instructions |
with_model() / with_api_key() | Can use a different model than the parent |
with_tools() | Tools available to the sub-agent (accepts Vec<Arc<dyn AgentTool>>) |
with_max_turns(N) | Turn limit (default: 10). Primary guard against runaway execution. |
with_thinking() | Enable extended thinking for the sub-agent |
with_cache_config() | Prompt caching settings |
Event Forwarding
When the parent provides an on_update callback (standard for all tools), sub-agent events are forwarded as ToolExecutionUpdate events. The parent's UI sees real-time progress from the child:
- Text deltas from the sub-agent's LLM responses
- Tool call notifications from the sub-agent's tool usage
Design Decisions
- Context isolation: Each invocation starts fresh. Sub-agents don't accumulate history across calls.
- No nesting: Sub-agents are not given other `SubAgentTool`s. This prevents infinite delegation chains.
- Cancellation propagation: The parent's cancellation token is forwarded. Aborting the parent aborts all sub-agents.
- Turn limiting: The default 10-turn limit prevents runaway execution. The parent's execution limits also apply to total wall-clock time.
Example
See examples/sub_agent.rs for a complete coordinator with researcher and coder sub-agents.
State Persistence
yoagent supports saving and restoring agent conversation state, enabling pause/resume workflows, state transfer between processes, and conversation checkpointing.
Save and Restore
```rust
use yoagent::agent::Agent;

// After running some conversation turns...
let json = agent.save_messages()?;
std::fs::write("conversation.json", &json)?;

// Later, in a new process:
let json = std::fs::read_to_string("conversation.json")?;

let mut agent = Agent::new(provider)
    .with_system_prompt("You are helpful.")
    .with_model("claude-sonnet-4-20250514")
    .with_api_key(api_key);

agent.restore_messages(&json)?;

// Continue the conversation — the agent sees the full history
let rx = agent.prompt("Follow up question").await;
```
Builder Initialization
For constructing an agent with pre-existing history:
```rust
let saved: Vec<AgentMessage> = serde_json::from_str(&json)?;

let agent = Agent::new(provider)
    .with_messages(saved)
    .with_system_prompt("...")
    .with_model("...");
```
JSON Format
Messages serialize as a JSON array. Each message is tagged by role:
```json
[
  {
    "role": "user",
    "content": [{"type": "text", "text": "Hello"}],
    "timestamp": 1700000000000
  },
  {
    "role": "assistant",
    "content": [{"type": "text", "text": "Hi there!"}],
    "stopReason": "stop",
    "model": "claude-sonnet-4-20250514",
    "provider": "anthropic",
    "usage": {"input": 100, "output": 50, "cache_read": 0, "cache_write": 0, "total_tokens": 150},
    "timestamp": 1700000001000
  }
]
```
Extension messages use a nested structure:
```json
{
  "role": "extension",
  "kind": "status_update",
  "data": {"status": "running"}
}
```
Context Tracking
ContextTracker and ExecutionTracker are runtime-only and not persisted. This is by design — both are created fresh each agent_loop() invocation and operate on whatever messages are in context at that point. Restoring messages and calling prompt() works correctly without any special recalculation.
What's Serializable
| Type | Serialize | Deserialize | PartialEq |
|---|---|---|---|
Content | Yes | Yes | Yes |
Message | Yes | Yes | Yes |
AgentMessage | Yes | Yes | Yes |
ExtensionMessage | Yes | Yes | Yes |
Usage | Yes | Yes | Yes |
StopReason | Yes | Yes | Yes |
ToolResult | Yes | Yes | Yes |
CacheConfig | Yes | Yes | Yes |
ToolExecutionStrategy | Yes | Yes | Yes |
ContextConfig | Yes | Yes | No |
ExecutionLimits | Yes | Yes | No |
Lifecycle Callbacks
yoagent provides three lifecycle callbacks that let you observe and control the agent loop without modifying its internals.
Callbacks
before_turn
Called before each LLM call. Receives the current message history and the turn number (0-indexed). Return false to abort the loop.
```rust
let agent = Agent::new(provider)
    .on_before_turn(|messages, turn| {
        println!("Turn {} starting with {} messages", turn, messages.len());
        turn < 10 // Stop after 10 turns
    });
```
after_turn
Called after each LLM response and tool execution. Receives the updated message history and the turn's token usage.
```rust
use std::sync::{Arc, Mutex};

let total_cost = Arc::new(Mutex::new(0u64));
let cost_tracker = total_cost.clone();

let agent = Agent::new(provider)
    .on_after_turn(move |_messages, usage| {
        let mut cost = cost_tracker.lock().unwrap();
        *cost += usage.input + usage.output;
        println!("Cumulative tokens: {}", *cost);
    });
```
on_error
Called when the LLM returns a StopReason::Error. Receives the error message string.
```rust
let agent = Agent::new(provider)
    .on_error(|err| {
        eprintln!("LLM error: {}", err);
        // Log to monitoring, send alert, etc.
    });
```
Combining Callbacks
All callbacks are optional and independent:
```rust
let agent = Agent::new(provider)
    .on_before_turn(|_msgs, turn| turn < 20)
    .on_after_turn(|msgs, usage| {
        println!("Messages: {}, Tokens: {}/{}", msgs.len(), usage.input, usage.output);
    })
    .on_error(|err| eprintln!("Error: {}", err));
```
Using with AgentLoopConfig
For direct loop usage without the Agent wrapper:
```rust
use std::sync::Arc;
use yoagent::agent_loop::AgentLoopConfig;

let config = AgentLoopConfig {
    before_turn: Some(Arc::new(|_msgs, turn| turn < 5)),
    after_turn: Some(Arc::new(|_msgs, _usage| { /* log */ })),
    on_error: Some(Arc::new(|err| eprintln!("{}", err))),
    // ... other fields
};
```
Callback Timing
```
Loop iteration:
 1. Inject pending messages (steering/follow-up)
 2. Check execution limits
 3. before_turn(messages, turn_number)   <-- return false to abort
 4. Compact context
 5. Stream LLM response
 6. Check for error/abort → on_error(message) if StopReason::Error
                          → after_turn(messages, usage) even on error/abort
 7. Execute tool calls
 8. Track turn
 9. after_turn(messages, usage)
10. Emit TurnEnd event
```
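The gating behavior of `before_turn` can be sketched as a stripped-down loop. This is an illustrative skeleton, not yoagent's actual loop:

```rust
/// Skeleton of the callback flow above: before_turn gates each
/// iteration, after_turn observes it. Returns the number of turns run.
fn run_loop(max: usize, before_turn: impl Fn(usize) -> bool, after_turn: impl Fn(usize)) -> usize {
    let mut turns = 0;
    for turn in 0..max {
        if !before_turn(turn) {
            break; // returning false aborts the loop
        }
        // ... LLM call and tool execution would happen here ...
        after_turn(turn);
        turns += 1;
    }
    turns
}

fn main() {
    // before_turn allows turns 0..5, so exactly 5 turns run.
    let ran = run_loop(50, |turn| turn < 5, |_| {});
    assert_eq!(ran, 5);
}
```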
MCP Integration
What is MCP?
The Model Context Protocol (MCP) is a JSON-RPC 2.0 protocol that lets AI agents discover and call tools from external servers. It defines a standard way for agents to connect to tool providers over two transports:
- Stdio — spawn a child process, communicate via stdin/stdout (newline-delimited JSON)
- HTTP — POST JSON-RPC requests to an HTTP endpoint
Connecting to MCP Servers
Stdio Transport
Use with_mcp_server_stdio() to spawn an MCP server process and register its tools:
```rust
use yoagent::Agent;
use yoagent::provider::AnthropicProvider;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut agent = Agent::new(AnthropicProvider)
        .with_system_prompt("You are a helpful assistant with file access.")
        .with_model("claude-sonnet-4-20250514")
        .with_api_key(std::env::var("ANTHROPIC_API_KEY")?)
        .with_mcp_server_stdio(
            "npx",
            &["-y", "@modelcontextprotocol/server-filesystem", "/tmp"],
            None,
        )
        .await?;

    let rx = agent.prompt("List files in /tmp").await;
    // handle events...
    Ok(())
}
```
You can pass environment variables to the server process:
```rust
use std::collections::HashMap;

let mut env = HashMap::new();
env.insert("API_TOKEN".into(), "secret".into());

let agent = Agent::new(provider)
    .with_mcp_server_stdio("my-mcp-server", &["--port", "0"], Some(env))
    .await?;
```
HTTP Transport
For remote MCP servers exposed over HTTP:
```rust
let agent = Agent::new(provider)
    .with_mcp_server_http("http://localhost:8080/mcp")
    .await?;
```
How MCP Tools Work
When you call with_mcp_server_stdio() or with_mcp_server_http(), yoagent:
1. Connects to the MCP server and performs the `initialize` handshake
2. Calls `tools/list` to discover available tools
3. Wraps each MCP tool as an `AgentTool` via `McpToolAdapter`
4. Adds them to the agent's tool list
MCP tools appear alongside built-in tools. The LLM sees them with their original names, descriptions, and JSON Schema parameters — it can call them just like any other tool.
Mixing Built-in and MCP Tools
```rust
use yoagent::tools::default_tools;

let agent = Agent::new(provider)
    .with_tools(default_tools()) // bash, read, write, edit, list, search
    .with_mcp_server_stdio("my-db-server", &[], None)
    .await?;

// Agent now has both built-in coding tools AND MCP database tools
```
Using the MCP Client Directly
For lower-level control, use McpClient directly:
```rust
use std::sync::Arc;
use tokio::sync::Mutex;
use yoagent::mcp::{McpClient, McpToolAdapter};

let client = McpClient::connect_stdio("my-server", &[], None).await?;

let tools = client.list_tools().await?;
for tool in &tools {
    println!("{}: {}", tool.name, tool.description.as_deref().unwrap_or(""));
}

// Call a tool directly
let result = client
    .call_tool("read_file", serde_json::json!({"path": "/tmp/test.txt"}))
    .await?;

// Or wrap as AgentTool adapters
let client = Arc::new(Mutex::new(client));
let adapters = McpToolAdapter::from_client(client).await?;
```
Error Handling
MCP operations return McpError:
- `McpError::Transport` — connection or I/O failure
- `McpError::Protocol` — unexpected response format
- `McpError::JsonRpc` — server returned a JSON-RPC error
- `McpError::ConnectionClosed` — server process exited
When an MCP tool returns isError: true, the adapter converts it to a ToolError::Failed, which the agent loop sends back to the LLM with is_error: true so it can self-correct.
Providers Overview
yoagent supports multiple LLM providers through the StreamProvider trait and ApiProtocol dispatch.
Supported Protocols
| Protocol | Provider Struct | API Format |
|---|---|---|
AnthropicMessages | AnthropicProvider | Anthropic Messages API |
OpenAiCompletions | OpenAiCompatProvider | OpenAI Chat Completions |
OpenAiResponses | OpenAiResponsesProvider | OpenAI Responses API |
AzureOpenAiResponses | AzureOpenAiProvider | Azure OpenAI Responses |
GoogleGenerativeAi | GoogleProvider | Google Gemini API |
GoogleVertex | GoogleVertexProvider | Google Vertex AI |
BedrockConverseStream | BedrockProvider | AWS Bedrock ConverseStream |
ApiProtocol Enum
```rust
pub enum ApiProtocol {
    AnthropicMessages,
    OpenAiCompletions,
    OpenAiResponses,
    AzureOpenAiResponses,
    GoogleGenerativeAi,
    GoogleVertex,
    BedrockConverseStream,
}
```
ModelConfig
Full configuration for a model, including provider routing:
```rust
pub struct ModelConfig {
    pub id: String,          // e.g. "gpt-4o"
    pub name: String,        // e.g. "GPT-4o"
    pub api: ApiProtocol,    // Which provider to use
    pub provider: String,    // e.g. "openai"
    pub base_url: String,    // API endpoint
    pub reasoning: bool,     // Supports thinking/reasoning
    pub context_window: u32, // Context size in tokens
    pub max_tokens: u32,     // Default max output
    pub cost: CostConfig,    // Pricing per million tokens
    pub headers: HashMap<String, String>, // Extra headers
    pub compat: Option<OpenAiCompat>,     // Quirk flags
}
```
Convenience constructors:
```rust
let anthropic = ModelConfig::anthropic("claude-sonnet-4-20250514", "Claude Sonnet 4");
let openai = ModelConfig::openai("gpt-4o", "GPT-4o");
let google = ModelConfig::google("gemini-2.0-flash", "Gemini 2.0 Flash");
```
ProviderRegistry
Maps ApiProtocol → StreamProvider. The default registry includes all built-in providers:
```rust
let registry = ProviderRegistry::default();

// Use it to stream with any model
let result = registry.stream(&model_config, stream_config, tx, cancel).await?;
```
Custom registries:
```rust
let mut registry = ProviderRegistry::new();
registry.register(ApiProtocol::AnthropicMessages, AnthropicProvider);
```
StreamProvider Trait
```rust
#[async_trait]
pub trait StreamProvider: Send + Sync {
    async fn stream(
        &self,
        config: StreamConfig,
        tx: mpsc::UnboundedSender<StreamEvent>,
        cancel: CancellationToken,
    ) -> Result<Message, ProviderError>;
}
```
All providers receive a StreamConfig, emit StreamEvents through the channel, and return the final Message.
Anthropic Provider
AnthropicProvider implements the Anthropic Messages API with SSE streaming.
Usage
```rust
use yoagent::provider::AnthropicProvider;

let agent = Agent::new(AnthropicProvider)
    .with_model("claude-sonnet-4-20250514")
    .with_api_key(std::env::var("ANTHROPIC_API_KEY").unwrap());
```
Features
Streaming SSE
Uses reqwest-eventsource to parse Anthropic's SSE stream. Events handled:
- `message_start` — Input token usage, cache stats
- `content_block_start` — Text, thinking, or tool_use block
- `content_block_delta` — Text, thinking, input JSON, or signature deltas
- `content_block_stop` — Block complete
- `message_delta` — Stop reason, output usage
- `message_stop` — Stream complete
Extended Thinking
Set thinking_level to enable thinking with a token budget:
| Level | Budget Tokens |
|---|---|
Minimal | 128 |
Low | 512 |
Medium | 2,048 |
High | 8,192 |
Thinking content is streamed as Content::Thinking with a cryptographic signature for verification.
Cache Control
Automatic prompt caching via cache_control markers:
- System prompt: Always cached with `{"type": "ephemeral"}`
- Second-to-last message: Gets `cache_control` on its last content block, creating a cache breakpoint
This means on repeated calls, only the latest message is processed at full price.
Configuration
| Setting | Value |
|---|---|
| API URL | https://api.anthropic.com/v1/messages |
| API Version | 2023-06-01 |
| Auth Header | x-api-key |
| Default Max Tokens | 8,192 |
Environment Variables
| Variable | Purpose |
|---|---|
ANTHROPIC_API_KEY | API key |
OpenAI Compatible Provider
OpenAiCompatProvider implements the OpenAI Chat Completions API. One implementation covers OpenAI, xAI, Groq, Cerebras, OpenRouter, Mistral, DeepSeek, and any other compatible API.
Usage
Requires a ModelConfig with compat flags set in StreamConfig.model_config:
```rust
use yoagent::provider::{OpenAiCompatProvider, ModelConfig};

let agent = Agent::new(OpenAiCompatProvider)
    .with_model("gpt-4o")
    .with_api_key(std::env::var("OPENAI_API_KEY").unwrap());
```
OpenAiCompat Quirk Flags
Different providers have behavioral differences even though they share the same API:
```rust
pub struct OpenAiCompat {
    pub supports_store: bool,
    pub supports_developer_role: bool,
    pub supports_reasoning_effort: bool,
    pub supports_usage_in_streaming: bool,
    pub max_tokens_field: MaxTokensField, // MaxTokens or MaxCompletionTokens
    pub requires_tool_result_name: bool,
    pub requires_assistant_after_tool_result: bool,
    pub thinking_format: ThinkingFormat,  // OpenAi, Xai, or Qwen
}
```
Provider Presets
| Provider | Constructor | Key Differences |
|---|---|---|
| OpenAI | OpenAiCompat::openai() | developer role, max_completion_tokens, store, reasoning_effort |
| xAI (Grok) | OpenAiCompat::xai() | reasoning field for thinking (not reasoning_content) |
| Groq | OpenAiCompat::groq() | Standard defaults |
| Cerebras | OpenAiCompat::cerebras() | Standard defaults |
| OpenRouter | OpenAiCompat::openrouter() | max_completion_tokens |
| Mistral | OpenAiCompat::mistral() | max_tokens field |
| DeepSeek | OpenAiCompat::deepseek() | max_completion_tokens |
Adding a New Compatible Provider
1. Add a constructor to `OpenAiCompat`:

```rust
impl OpenAiCompat {
    pub fn my_provider() -> Self {
        Self {
            supports_usage_in_streaming: true,
            // set flags as needed...
            ..Default::default()
        }
    }
}
```
2. Create a `ModelConfig` that uses it:

```rust
let config = ModelConfig {
    id: "my-model".into(),
    name: "My Model".into(),
    api: ApiProtocol::OpenAiCompletions,
    provider: "my-provider".into(),
    base_url: "https://api.myprovider.com/v1".into(),
    compat: Some(OpenAiCompat::my_provider()),
    // ...
};
```
Thinking/Reasoning
The ThinkingFormat enum controls how reasoning content is parsed from streams:
- `ThinkingFormat::OpenAi` — uses the `reasoning_content` field (DeepSeek, default)
- `ThinkingFormat::Xai` — uses the `reasoning` field (Grok)
- `ThinkingFormat::Qwen` — uses the `reasoning_content` field (Qwen)
Auth
Uses Authorization: Bearer {api_key} header. Extra headers can be added via ModelConfig.headers.
Google Gemini Provider
Two providers for Google's Gemini models:
- `GoogleProvider` — Google AI Studio (Generative AI API)
- `GoogleVertexProvider` — Google Cloud Vertex AI
Google AI Studio
```rust
use yoagent::provider::GoogleProvider;

let agent = Agent::new(GoogleProvider)
    .with_model("gemini-2.0-flash")
    .with_api_key(std::env::var("GOOGLE_API_KEY").unwrap());
```
API Details
- Endpoint: `{base_url}/v1beta/models/{model}:streamGenerateContent?alt=sse&key={api_key}`
- Auth: API key as query parameter
- Default base URL: `https://generativelanguage.googleapis.com`
- Default context window: 1,000,000 tokens
Message Format
Google uses a different message format than OpenAI/Anthropic:
| yoagent | Google API |
|---|---|
user role | user role |
assistant role | model role |
Content::Text | {"text": "..."} |
Content::Image | {"inlineData": {...}} |
Content::ToolCall | {"functionCall": {...}} |
Message::ToolResult | {"functionResponse": {...}} |
| System prompt | systemInstruction field |
| Tools | tools[].functionDeclarations[] |
Streaming
Uses SSE format (alt=sse). Each chunk contains candidates with content.parts and optional usageMetadata.
Google Vertex AI
GoogleVertexProvider uses the same message format but with Vertex AI authentication and endpoints.
- Protocol: `ApiProtocol::GoogleVertex`
- Auth: OAuth2 / service account credentials
- Endpoint pattern: `https://{region}-aiplatform.googleapis.com/v1/projects/{project}/locations/{region}/publishers/google/models/{model}:streamGenerateContent`
Amazon Bedrock Provider
BedrockProvider implements the AWS Bedrock ConverseStream API.
Usage
```rust
use yoagent::provider::BedrockProvider;

let agent = Agent::new(BedrockProvider)
    .with_model("anthropic.claude-3-sonnet-20240229-v1:0")
    .with_api_key("ACCESS_KEY:SECRET_KEY"); // or ACCESS_KEY:SECRET_KEY:SESSION_TOKEN
```
Authentication
The api_key field uses a colon-separated format:
```
{access_key_id}:{secret_access_key}
{access_key_id}:{secret_access_key}:{session_token}
```
Alternatively, provide pre-computed auth headers via ModelConfig.headers or use an IAM proxy that handles SigV4 signing.
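Splitting the documented colon-separated format is a small exercise. An illustrative std-only parser (a sketch of the format, not yoagent's actual credential handling):

```rust
/// Parse "access:secret" or "access:secret:session_token" into parts.
/// Illustrative sketch of the documented api_key format.
fn parse_bedrock_key(api_key: &str) -> Option<(&str, &str, Option<&str>)> {
    let mut parts = api_key.splitn(3, ':');
    let access = parts.next()?;
    let secret = parts.next()?; // None here means no colon at all
    Some((access, secret, parts.next()))
}

fn main() {
    assert_eq!(parse_bedrock_key("AKIA:abc"), Some(("AKIA", "abc", None)));
    assert_eq!(parse_bedrock_key("AKIA:abc:tok"), Some(("AKIA", "abc", Some("tok"))));
    assert_eq!(parse_bedrock_key("no-colon"), None);
}
```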
API Details
- Endpoint: `{base_url}/model/{model}/converse-stream`
- Default base URL: `https://bedrock-runtime.us-east-1.amazonaws.com`
- Protocol: `ApiProtocol::BedrockConverseStream`
Message Format
Bedrock uses its own content block format:
| yoagent | Bedrock API |
|---|---|
Content::Text | {"text": "..."} |
Content::Image | {"image": {"format": "...", "source": {"bytes": "..."}}} |
Content::ToolCall | {"toolUse": {"toolUseId": "...", "name": "...", "input": ...}} |
Message::ToolResult | {"toolResult": {"toolUseId": "...", "content": [...], "status": "success"}} |
| System prompt | system array of text blocks |
| Tools | toolConfig.tools[].toolSpec |
| Max tokens | inferenceConfig.maxTokens |
Stream Events
Bedrock's ConverseStream returns these event types:
- `contentBlockStart` — New content block (text or tool use)
- `contentBlockDelta` — Text or tool use input delta
- `contentBlockStop` — Block complete
- `messageStop` — Stop reason (`end_turn`, `max_tokens`, `tool_use`)
- `metadata` — Token usage
Azure OpenAI Provider
`AzureOpenAiProvider` implements the OpenAI Responses API format with Azure-specific authentication and URL patterns.
Usage
```rust
use yoagent::Agent;
use yoagent::provider::AzureOpenAiProvider;

let agent = Agent::new(AzureOpenAiProvider)
    .with_model("gpt-4o")
    .with_api_key(std::env::var("AZURE_OPENAI_API_KEY").unwrap());
```
Authentication
Uses the `api-key` header (not `Authorization: Bearer`):

```
api-key: {your_api_key}
```
Additional headers can be set via `ModelConfig.headers` (e.g., for Azure AD Bearer tokens).
URL Format
```
https://{resource}.openai.azure.com/openai/deployments/{deployment}
```

Set this as `ModelConfig.base_url`. The provider appends `/responses?api-version=2025-01-01-preview`.
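As a sketch of how the pieces combine, the base URL can be assembled with ordinary string formatting. `azure_base_url` is a hypothetical helper, and the resource and deployment names are placeholders.

```rust
/// Hypothetical helper: builds the value to set as ModelConfig.base_url.
/// The provider itself appends "/responses?api-version=...".
fn azure_base_url(resource: &str, deployment: &str) -> String {
    format!("https://{resource}.openai.azure.com/openai/deployments/{deployment}")
}

fn main() {
    assert_eq!(
        azure_base_url("my-resource", "gpt-4o"),
        "https://my-resource.openai.azure.com/openai/deployments/gpt-4o"
    );
}
```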
API Details
- Protocol: `ApiProtocol::AzureOpenAiResponses`
- Format: OpenAI Responses API (not Chat Completions)
- Streaming: SSE with event types:
  - `response.output_text.delta` — Text content
  - `response.function_call_arguments.start` — Tool call start
  - `response.function_call_arguments.delta` — Tool call arguments
  - `response.completed` — Final usage data
Message Format
Uses the Responses API input format:
| yoagent | Azure Responses API |
|---|---|
| User message | `{"role": "user", "content": "..."}` |
| Assistant text | `{"type": "message", "role": "assistant", "content": [{"type": "output_text", ...}]}` |
| Tool call | `{"type": "function_call", "call_id": "...", "name": "...", "arguments": "..."}` |
| Tool result | `{"type": "function_call_output", "call_id": "...", "output": "..."}` |
| System prompt | `instructions` field |
Built-in Tools
yoagent ships with six coding-oriented tools. Get them all with `default_tools()`:
```rust
use yoagent::tools::default_tools;

let tools = default_tools();
```
BashTool
Execute shell commands with timeout and output capture.
- Name: `bash`
- Parameters: `command` (string, required)
Configuration
```rust
pub struct BashTool {
    pub cwd: Option<String>,           // Working directory
    pub timeout: Duration,             // Default: 120s
    pub max_output_bytes: usize,       // Default: 256KB
    pub deny_patterns: Vec<String>,    // Blocked commands
    pub confirm_fn: Option<ConfirmFn>, // Confirmation callback
}
```
Default deny patterns: `rm -rf /`, `rm -rf /*`, `mkfs`, `dd if=`, fork bomb.
Example
```rust
use std::time::Duration;

let bash = BashTool::default();

// Or customize:
let bash = BashTool {
    cwd: Some("/workspace".into()),
    timeout: Duration::from_secs(60),
    ..Default::default()
};
```
ReadFileTool
Read file contents with optional line range.
- Name: `read_file`
- Parameters: `path` (required), `offset` (optional, 1-indexed line), `limit` (optional, number of lines)
Configuration
```rust
pub struct ReadFileTool {
    pub max_bytes: usize,           // Default: 1MB
    pub allowed_paths: Vec<String>, // Path restrictions (empty = no restriction)
}
```
WriteFileTool
Write content to a file. Creates parent directories automatically.
- Name: `write_file`
- Parameters: `path` (required), `content` (required)
EditFileTool
Surgical search/replace edits. The most important tool for coding agents — instead of rewriting entire files, the agent specifies exact text to find and replace.
- Name: `edit_file`
- Parameters: `path` (required), `old_text` (required), `new_text` (required)
The `old_text` must match exactly, including whitespace and indentation.
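The exact-match behavior can be illustrated with a stand-alone sketch. `apply_edit` is a hypothetical helper written for this example; it is not the crate's actual implementation.

```rust
/// Illustration of edit_file's exact-match semantics (hypothetical,
/// not the crate's code): the edit applies only if `old_text` occurs
/// verbatim in the source, and only the first occurrence is replaced.
fn apply_edit(source: &str, old_text: &str, new_text: &str) -> Option<String> {
    if source.contains(old_text) {
        Some(source.replacen(old_text, new_text, 1))
    } else {
        None // whitespace or indentation mismatch: no edit applied
    }
}

fn main() {
    let src = "fn add(a: i32, b: i32) -> i32 {\n    a + b\n}\n";
    // Exact match, including indentation, succeeds:
    assert!(apply_edit(src, "    a + b", "    a - b").is_some());
    // Text that ignores the original whitespace fails:
    assert!(apply_edit(src, "a+b", "a-b").is_none());
}
```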
ListFilesTool
List files and directories with optional glob filtering.
- Name: `list_files`
- Parameters: `path` (optional, default: `.`), `pattern` (optional glob)
Configuration
```rust
pub struct ListFilesTool {
    pub max_results: usize, // Default: 200
    pub timeout: Duration,  // Default: 10s
}
```
Uses `find` or `fd` for efficient traversal.
SearchTool
Search files using `grep` (or `ripgrep` if available).
- Name: `search`
- Parameters: `pattern` (required, regex), `path` (optional root directory)
Configuration
```rust
pub struct SearchTool {
    pub root: Option<String>, // Root directory
    pub max_results: usize,   // Default: 50
    pub timeout: Duration,    // Default: 30s
}
```
Returns matching lines with file paths and line numbers.
Configuration
AgentLoopConfig
The main configuration for the agent loop:
```rust
pub struct AgentLoopConfig<'a> {
    pub provider: &'a dyn StreamProvider,
    pub model: String,
    pub api_key: String,
    pub thinking_level: ThinkingLevel,
    pub max_tokens: Option<u32>,
    pub temperature: Option<f32>,
    pub convert_to_llm: Option<ConvertToLlmFn>,
    pub transform_context: Option<TransformContextFn>,
    pub get_steering_messages: Option<GetMessagesFn>,
    pub get_follow_up_messages: Option<GetMessagesFn>,
    pub context_config: Option<ContextConfig>,
    pub execution_limits: Option<ExecutionLimits>,
    pub cache_config: CacheConfig,
    pub tool_execution: ToolExecutionStrategy,
    pub retry_config: RetryConfig,
    pub before_turn: Option<BeforeTurnFn>,
    pub after_turn: Option<AfterTurnFn>,
    pub on_error: Option<OnErrorFn>,
    pub input_filters: Vec<Arc<dyn InputFilter>>,
}
```
StreamConfig
Passed to `StreamProvider::stream()`:

```rust
pub struct StreamConfig {
    pub model: String,
    pub system_prompt: String,
    pub messages: Vec<Message>,
    pub tools: Vec<ToolDefinition>,
    pub thinking_level: ThinkingLevel,
    pub api_key: String,
    pub max_tokens: Option<u32>,
    pub temperature: Option<f32>,
    pub model_config: Option<ModelConfig>,
    pub cache_config: CacheConfig,
}
```
ContextConfig
Controls context window compaction:
```rust
pub struct ContextConfig {
    pub max_context_tokens: usize,    // Default: 100,000
    pub system_prompt_tokens: usize,  // Default: 4,000
    pub keep_recent: usize,           // Default: 10
    pub keep_first: usize,            // Default: 2
    pub tool_output_max_lines: usize, // Default: 50
}
```
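With the defaults above, the budget arithmetic works out as follows. This is a simplified sketch with locally re-declared fields; the real compaction logic in `context.rs` also accounts for per-message token estimates.

```rust
// Simplified sketch of the budget implied by ContextConfig's defaults;
// struct fields mirror the docs, the logic is illustrative only.
struct ContextConfig {
    max_context_tokens: usize,
    system_prompt_tokens: usize,
}

/// Tokens left for conversation messages after reserving the system prompt.
fn message_budget(cfg: &ContextConfig) -> usize {
    cfg.max_context_tokens.saturating_sub(cfg.system_prompt_tokens)
}

fn main() {
    let cfg = ContextConfig {
        max_context_tokens: 100_000,
        system_prompt_tokens: 4_000,
    };
    // 100,000 - 4,000 = 96,000 tokens available for messages.
    assert_eq!(message_budget(&cfg), 96_000);
}
```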
ExecutionLimits
Prevents runaway agents:
```rust
pub struct ExecutionLimits {
    pub max_turns: usize,        // Default: 50
    pub max_total_tokens: usize, // Default: 1,000,000
    pub max_duration: Duration,  // Default: 600s
}
```
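A sketch of the check these limits imply: the run stops as soon as any one budget is exhausted. `should_stop` is a hypothetical helper; the real enforcement happens inside the agent loop between turns.

```rust
use std::time::Duration;

// Locally re-declared to keep the sketch self-contained.
struct ExecutionLimits {
    max_turns: usize,
    max_total_tokens: usize,
    max_duration: Duration,
}

/// Hypothetical helper: true once any limit is reached.
fn should_stop(turns: usize, tokens: usize, elapsed: Duration, l: &ExecutionLimits) -> bool {
    turns >= l.max_turns || tokens >= l.max_total_tokens || elapsed >= l.max_duration
}

fn main() {
    let limits = ExecutionLimits {
        max_turns: 50,
        max_total_tokens: 1_000_000,
        max_duration: Duration::from_secs(600),
    };
    assert!(!should_stop(10, 50_000, Duration::from_secs(30), &limits));
    assert!(should_stop(50, 50_000, Duration::from_secs(30), &limits)); // turn budget hit
}
```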
ThinkingLevel
```rust
pub enum ThinkingLevel {
    Off,     // No thinking (default)
    Minimal, // 128 tokens (Anthropic budget)
    Low,     // 512 tokens
    Medium,  // 2,048 tokens
    High,    // 8,192 tokens
}
```
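The comments above suggest a mapping from level to an optional token budget. As a sketch (`thinking_budget` is a hypothetical helper, not the crate's API):

```rust
// Re-declared locally so the sketch compiles on its own.
enum ThinkingLevel { Off, Minimal, Low, Medium, High }

/// Hypothetical mapping from level to thinking-token budget,
/// following the per-variant comments in the docs.
fn thinking_budget(level: &ThinkingLevel) -> Option<u32> {
    match level {
        ThinkingLevel::Off => None,
        ThinkingLevel::Minimal => Some(128),
        ThinkingLevel::Low => Some(512),
        ThinkingLevel::Medium => Some(2_048),
        ThinkingLevel::High => Some(8_192),
    }
}

fn main() {
    assert_eq!(thinking_budget(&ThinkingLevel::Off), None);
    assert_eq!(thinking_budget(&ThinkingLevel::High), Some(8_192));
}
```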
CostConfig
Token pricing per million:
```rust
pub struct CostConfig {
    pub input_per_million: f64,
    pub output_per_million: f64,
    pub cache_read_per_million: f64,
    pub cache_write_per_million: f64,
}
```
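A run's dollar cost follows directly from these fields. The sketch below re-declares the struct locally; `run_cost` is a hypothetical helper, and the prices and token counts are made-up example values, not real provider pricing.

```rust
// Re-declared locally so the sketch compiles on its own.
struct CostConfig {
    input_per_million: f64,
    output_per_million: f64,
    cache_read_per_million: f64,
    cache_write_per_million: f64,
}

/// Hypothetical helper: total cost in dollars for a run's token usage.
fn run_cost(c: &CostConfig, input: u64, output: u64, cache_read: u64, cache_write: u64) -> f64 {
    (input as f64 * c.input_per_million
        + output as f64 * c.output_per_million
        + cache_read as f64 * c.cache_read_per_million
        + cache_write as f64 * c.cache_write_per_million)
        / 1_000_000.0
}

fn main() {
    // Example prices: $3/M input, $15/M output, $0.30/M cache read, $3.75/M cache write.
    let c = CostConfig {
        input_per_million: 3.0,
        output_per_million: 15.0,
        cache_read_per_million: 0.3,
        cache_write_per_million: 3.75,
    };
    // 10k input tokens + 2k output tokens = $0.03 + $0.03 = $0.06.
    let cost = run_cost(&c, 10_000, 2_000, 0, 0);
    assert!((cost - 0.06).abs() < 1e-9);
}
```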
API Reference
Top-Level Functions
agent_loop()
```rust
pub async fn agent_loop(
    prompts: Vec<AgentMessage>,
    context: &mut AgentContext,
    config: &AgentLoopConfig<'_>,
    tx: mpsc::UnboundedSender<AgentEvent>,
    cancel: CancellationToken,
) -> Vec<AgentMessage>
```
Start an agent loop with new prompt messages. Returns all messages generated during the run.
agent_loop_continue()
```rust
pub async fn agent_loop_continue(
    context: &mut AgentContext,
    config: &AgentLoopConfig<'_>,
    tx: mpsc::UnboundedSender<AgentEvent>,
    cancel: CancellationToken,
) -> Vec<AgentMessage>
```
Resume from existing context. The last message must not be an assistant message.
default_tools()
```rust
pub fn default_tools() -> Vec<Box<dyn AgentTool>>
```
Returns: `BashTool`, `ReadFileTool`, `WriteFileTool`, `EditFileTool`, `ListFilesTool`, `SearchTool`.
Agent Struct
High-level stateful wrapper around the agent loop.
Construction
```rust
let agent = Agent::new(provider);
```
| Signature | Description |
|---|---|
| `Agent::new(provider: impl StreamProvider + 'static) -> Self` | Create a new agent with the given provider |
Builder Methods
All return `Self` for chaining.
| Method | Description |
|---|---|
| `with_system_prompt(prompt: impl Into<String>) -> Self` | Set the system prompt |
| `with_model(model: impl Into<String>) -> Self` | Set the model identifier |
| `with_api_key(key: impl Into<String>) -> Self` | Set the API key |
| `with_thinking(level: ThinkingLevel) -> Self` | Set thinking level (`Off`, `Minimal`, `Low`, `Medium`, `High`) |
| `with_tools(tools: Vec<Box<dyn AgentTool>>) -> Self` | Set tools |
| `with_max_tokens(max: u32) -> Self` | Set max output tokens |
| `with_context_config(config: ContextConfig) -> Self` | Set context compaction config |
| `with_execution_limits(limits: ExecutionLimits) -> Self` | Set execution limits (max turns, tokens, duration) |
| `without_context_management() -> Self` | Disable automatic context compaction and execution limits |
| `async with_mcp_server_stdio(command, args, env) -> Result<Self, McpError>` | Connect to an MCP server via stdio and add its tools |
| `async with_mcp_server_http(url) -> Result<Self, McpError>` | Connect to an MCP server via HTTP and add its tools |
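Chaining several of the builder methods above looks like this. This is a sketch: the `provider` value stands in for any `StreamProvider` implementation, and the model ID and environment variable name are placeholders.

```rust
use yoagent::{Agent, ThinkingLevel};
use yoagent::tools::default_tools;

// Sketch only; `provider`, the model ID, and the env var are placeholders.
let agent = Agent::new(provider)
    .with_system_prompt("You are a helpful coding agent.")
    .with_model("your-model-id")
    .with_api_key(std::env::var("PROVIDER_API_KEY").unwrap())
    .with_thinking(ThinkingLevel::Low)
    .with_tools(default_tools())
    .with_max_tokens(8_192);
```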
Prompting
| Method | Description |
|---|---|
| `async prompt(text: impl Into<String>) -> UnboundedReceiver<AgentEvent>` | Send a text prompt; returns an event stream |
| `async prompt_messages(messages: Vec<AgentMessage>) -> UnboundedReceiver<AgentEvent>` | Send messages as a prompt |
| `async continue_loop() -> UnboundedReceiver<AgentEvent>` | Resume from the current context (for retries) |
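Consuming the returned stream is a plain receiver loop. A sketch (the prompt text is a placeholder, and logging with `{event:?}` assumes `AgentEvent` derives `Debug`, which these docs do not confirm):

```rust
// Sketch: drain events until the run completes; the receiver closes
// when the agent loop finishes.
let mut rx = agent.prompt("Fix the failing test in src/lib.rs").await;
while let Some(event) = rx.recv().await {
    // Match on AgentEvent variants (text deltas, thinking, tool
    // execution, ...) to drive a UI; here we just log each one.
    println!("{event:?}");
}
```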
State Access
| Method | Description |
|---|---|
| `messages() -> &[AgentMessage]` | Get the full message history |
| `is_streaming() -> bool` | Whether the agent is currently running |
State Mutation
| Method | Description |
|---|---|
| `set_tools(tools: Vec<Box<dyn AgentTool>>)` | Replace the tool set |
| `clear_messages()` | Clear all messages |
| `append_message(msg: AgentMessage)` | Add a message to history |
| `replace_messages(msgs: Vec<AgentMessage>)` | Replace all messages |
Steering & Follow-Up Queues
| Method | Description |
|---|---|
| `steer(msg: AgentMessage)` | Queue a steering message (interrupts mid-tool-execution) |
| `follow_up(msg: AgentMessage)` | Queue a follow-up message (processed after the agent finishes) |
| `clear_steering_queue()` | Clear pending steering messages |
| `clear_follow_up_queue()` | Clear pending follow-up messages |
| `clear_all_queues()` | Clear both queues |
| `set_steering_mode(mode: QueueMode)` | Set delivery mode: `OneAtATime` or `All` |
| `set_follow_up_mode(mode: QueueMode)` | Set delivery mode: `OneAtATime` or `All` |
Control
| Method | Description |
|---|---|
| `abort()` | Cancel the current run via `CancellationToken` |
| `reset()` | Clear all state (messages, queues, streaming flag) |
Re-exports
The crate re-exports key types from `lib.rs`:
```rust
pub use agent::Agent;
pub use agent_loop::{agent_loop, agent_loop_continue};
pub use types::*; // Message, Content, AgentMessage, AgentEvent, etc.
```
Architecture Overview
Layered Design
yoagent is organized as three conceptual layers within a single crate. Dependencies flow strictly downward — upper layers use lower layers, never the reverse.
```
┌─────────────────────────────────────────────┐
│ Layer 3: Orchestration (planned)            │
│ Multi-agent, delegation, work modes         │
├─────────────────────────────────────────────┤
│ Layer 2: Agent + Providers                  │
│ Concrete providers, tools, retry, caching,  │
│ context management, MCP                     │
├─────────────────────────────────────────────┤
│ Layer 1: Core Loop                          │
│ agent_loop, types, traits                   │
│ Provider-agnostic. Tool-agnostic.           │
└─────────────────────────────────────────────┘
```
Layer 1: Core Loop
The pure agent loop. No opinions about LLMs, no built-in tools. Just the control flow.
Modules: `types.rs`, `agent_loop.rs`, `provider/traits.rs`
Owns:
- `agent_loop()` / `agent_loop_continue()` — the loop itself
- `AgentTool` trait — interface tools must implement
- `StreamProvider` trait — interface providers must implement
- `AgentMessage`, `AgentEvent`, `StreamDelta` — message & event types
- `AgentContext` — system prompt + messages + tools
- Tool execution strategies (parallel/sequential/batched)
- Streaming tool output (`ToolUpdateFn`)
- Steering & follow-up message injection
Does not own: Any concrete provider or tool implementation.
Layer 2: Agent + Providers
Batteries-included single-agent layer. Most users interact with this.
Modules: `agent.rs`, `context.rs`, `retry.rs`, `provider/*.rs`, `tools/*.rs`, `mcp/*.rs`
Adds on top of Layer 1:
- Concrete providers — Anthropic, OpenAI-compat, Google, Azure, Bedrock, Vertex
- Provider registry — dispatch by API protocol
- Prompt caching — automatic cache breakpoint placement
- Retry with backoff — exponential, jitter, respects retry-after
- Context management — token estimation, smart truncation, execution limits
- Built-in tools — bash, read_file, write_file, edit_file, list_files, search
- MCP client — stdio + HTTP transports, tool adapter
- `Agent` struct — stateful builder wrapping it all together
Layer 3: Orchestration (planned)
Multi-agent coordination. Not yet implemented — the architecture is designed to support it when needed.
Planned capabilities:
- `Orchestrator` struct — spawn, delegate, and coordinate multiple agents
- Work modes:
  - Interactive — multi-turn, human in the loop (current default)
  - Autonomous — runs to completion without input (background tasks, CI)
  - Pipeline — input → output, chainable (scan → fix → verify)
  - Supervisor — delegates to other agents, synthesizes results
  - Fan-out — same task to multiple agents (different providers for diversity)
- Pipeline chaining — output of agent A feeds input of agent B
- Agent communication through the orchestrator event bus
Why not yet: Multi-agent orchestration adds complexity. The single-agent loop handles 95% of use cases. Layer 3 will be built when a concrete use case drives it, not speculatively.
Module Layout
```
yoagent/
├── src/
│   ├── lib.rs                  # Public re-exports
│   │
│   │── Layer 1: Core Loop ─────────────────────
│   ├── types.rs                # Message, Content, AgentTool, AgentEvent
│   ├── agent_loop.rs           # Core loop: prompt → LLM → tools → repeat
│   │
│   │── Layer 2: Agent + Providers ─────────────
│   ├── agent.rs                # Agent struct (stateful wrapper)
│   ├── context.rs              # Token estimation, compaction, limits
│   ├── retry.rs                # Retry with exponential backoff
│   ├── provider/
│   │   ├── traits.rs           # StreamProvider trait, StreamEvent, ProviderError
│   │   ├── model.rs            # ModelConfig, ApiProtocol, OpenAiCompat
│   │   ├── registry.rs         # ProviderRegistry (protocol → provider)
│   │   ├── anthropic.rs        # Anthropic Messages API
│   │   ├── openai_compat.rs    # OpenAI Chat Completions (15+ providers)
│   │   ├── openai_responses.rs # OpenAI Responses API
│   │   ├── google.rs           # Google Generative AI
│   │   ├── google_vertex.rs    # Google Vertex AI
│   │   ├── bedrock.rs          # AWS Bedrock ConverseStream
│   │   ├── azure_openai.rs     # Azure OpenAI
│   │   ├── mock.rs             # Mock provider for testing
│   │   └── sse.rs              # SSE utilities
│   ├── tools/
│   │   ├── bash.rs             # BashTool
│   │   ├── file.rs             # ReadFileTool, WriteFileTool
│   │   ├── edit.rs             # EditFileTool
│   │   ├── list.rs             # ListFilesTool
│   │   └── search.rs           # SearchTool
│   └── mcp/
│       ├── client.rs           # MCP client (stdio + HTTP)
│       ├── tool_adapter.rs     # McpToolAdapter (MCP tool → AgentTool)
│       ├── transport.rs        # Transport implementations
│       └── types.rs            # MCP protocol types
```
Data Flow
```
┌─────────────┐
│   Caller    │
└──────┬──────┘
       │ prompt / prompt_messages
┌──────▼──────┐
│   Agent     │  Layer 2: stateful wrapper
│ (agent.rs)  │  Manages queues, tools, state
└──────┬──────┘
       │
┌──────▼──────┐
│ agent_loop  │  Layer 1: core loop
│             │  Prompt → LLM → Tools → Repeat
└──┬───────┬──┘
   │       │
┌──▼────────┐ ┌──▼────────┐
│ Provider  │ │  Tools    │  Layer 2: implementations
│ .stream() │ │ .execute()│
└────────┬──┘ └──┬────────┘
         │       │
┌────────▼──┐ ┌──▼────────┐
│  LLM API  │ │  OS / FS  │
│  (HTTP)   │ │  (shell)  │
└───────────┘ └───────────┘
```
Events flow back via `mpsc::UnboundedSender<AgentEvent>`.
How Providers Plug In
- Implement the `StreamProvider` trait (Layer 1 interface)
- Register with `ProviderRegistry` under an `ApiProtocol` (Layer 2)
- Set `ModelConfig.api` to match that protocol
- The registry dispatches `stream()` calls to the right provider
Each provider translates between yoagent's `Message`/`Content` types and the provider's native API format. All providers emit `StreamEvent`s through the channel for real-time updates.
How Tools Plug In
- Implement the `AgentTool` trait (Layer 1 interface)
- Add it to the tools vec (via `default_tools()` or custom)
- The agent loop converts tools to `ToolDefinition` (name, description, schema) for the LLM
- When the LLM returns `Content::ToolCall`, the loop finds the matching tool and calls `execute()`
- Results are wrapped in `Message::ToolResult` and added to context
Tools receive a `CancellationToken` child token — they should check it for cooperative cancellation during long operations.
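The steps above can be sketched as a skeleton of a custom tool. The method names and signatures here are assumptions inferred from this page; consult `types.rs` for the real `AgentTool` definition.

```rust
// Hypothetical skeleton; exact trait signatures live in types.rs.
struct WordCountTool;

#[async_trait::async_trait]
impl AgentTool for WordCountTool {
    // Assumed accessors: name/description feed the ToolDefinition
    // the loop sends to the LLM.
    fn name(&self) -> &str { "word_count" }
    fn description(&self) -> &str { "Count the words in a text file" }

    // The parameter schema and the async execute() method go here.
    // execute() receives the tool-call arguments and a child
    // CancellationToken, and should poll the token during long work.
}
```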
Design Principles
- Layers are conceptual, not physical. One crate, clean module boundaries, no feature flags needed.
- Dependencies flow down. Layer 1 never imports from Layer 2. Layer 2 never imports from Layer 3.
- Layer 1 is stable. The core loop and traits change rarely. New features are added in Layer 2 or 3.
- Build what's needed. Layer 3 is designed but not implemented. It will be built when a use case demands it, not speculatively.
- Simple over clever. A straightforward loop with good defaults beats an elegant abstraction nobody can debug.