yoyo
yoyo is a coding agent that runs in your terminal. It can read and edit files, execute shell commands, search codebases, and manage git workflows — all through natural language.
yoyo is open-source, written in Rust, and built on yoagent. It started as ~200 lines and evolves itself one commit at a time.
What yoyo can do
- Read and edit files — view file contents, make surgical edits, or write new files
- Run shell commands — execute anything you'd type in a terminal
- Search codebases — grep across files with regex support
- Navigate projects — list directories, understand project structure
- Track context — monitor token usage, auto-compact when the context window fills up
- Persist sessions — save and resume conversations across sessions
- Estimate costs — see per-turn and session-total cost estimates
Quick example
export ANTHROPIC_API_KEY=sk-ant-...
cargo install yoyo-agent # or: cargo run from source
yoyo
Then just talk to it:
> read src/main.rs and find any unwrap() calls that could panic
> fix the bug in parse_config and run the tests
> explain what this codebase does
What makes yoyo different
yoyo is not a product — it's a process. It evolves itself in public. Every improvement is a git commit. Every session is journaled. You can read its source code, its journal, and its identity.
Current version: v0.1.4
Installation
Requirements
- Rust toolchain — install from rustup.rs
- An API key — from any supported provider (see Providers below)
Install from crates.io
cargo install yoyo-agent
This installs the binary as yoyo in your PATH.
Install from source
git clone https://github.com/yologdev/yoyo-evolve.git
cd yoyo-evolve
cargo build --release
The binary will be at target/release/yoyo.
Run directly with Cargo
If you just want to try it:
cd yoyo-evolve
ANTHROPIC_API_KEY=sk-ant-... cargo run
Providers
yoyo supports multiple AI providers out of the box. Use the --provider flag to select one:
| Provider | Flag | Default Model | Env Var |
|---|---|---|---|
| Anthropic (default) | --provider anthropic | claude-opus-4-6 | ANTHROPIC_API_KEY |
| OpenAI | --provider openai | gpt-4o | OPENAI_API_KEY |
| Google/Gemini | --provider google | gemini-2.0-flash | GOOGLE_API_KEY |
| OpenRouter | --provider openrouter | anthropic/claude-sonnet-4-20250514 | OPENROUTER_API_KEY |
| xAI | --provider xai | grok-3 | XAI_API_KEY |
| Groq | --provider groq | llama-3.3-70b-versatile | GROQ_API_KEY |
| DeepSeek | --provider deepseek | deepseek-chat | DEEPSEEK_API_KEY |
| Mistral | --provider mistral | mistral-large-latest | MISTRAL_API_KEY |
| Cerebras | --provider cerebras | llama-3.3-70b | CEREBRAS_API_KEY |
| Ollama | --provider ollama | llama3.2 | (none needed) |
| Custom | --provider custom | (none) | (none needed) |
Ollama and custom providers don't require an API key. yoyo will automatically connect to http://localhost:11434/v1 for Ollama or http://localhost:8080/v1 for custom providers. Override the endpoint with --base-url.
Examples:
# Anthropic (default)
ANTHROPIC_API_KEY=sk-ant-... yoyo
# OpenAI
OPENAI_API_KEY=sk-... yoyo --provider openai
# Google Gemini
GOOGLE_API_KEY=... yoyo --provider google
# Local Ollama (no API key needed)
yoyo --provider ollama --model llama3.2
# Custom OpenAI-compatible endpoint
yoyo --provider custom --base-url http://localhost:8080/v1 --model my-model
Set your API key
yoyo resolves your API key in this order:
1. --api-key CLI flag (highest priority)
2. Provider-specific environment variable (e.g., OPENAI_API_KEY for --provider openai)
3. ANTHROPIC_API_KEY environment variable (fallback)
4. API_KEY environment variable (generic fallback)
5. api_key in config file (see below)
Set one of them:
# Via environment variable (recommended)
export ANTHROPIC_API_KEY=sk-ant-api03-...
# Or pass directly
yoyo --api-key sk-ant-api03-...
If no key is found via any method (and the provider requires one), yoyo will exit with an error message explaining what to do.
Config file
yoyo supports a TOML-style config file so you don't have to pass flags every time. Config files are checked in this order (first found wins):
1. .yoyo.toml in the current directory (project-level)
2. ~/.yoyo.toml (home directory shorthand)
3. ~/.config/yoyo/config.toml (XDG user-level)
Example .yoyo.toml:
# Model and provider
model = "claude-sonnet-4-20250514"
provider = "anthropic"
thinking = "medium"
# API key (env vars take priority over this)
api_key = "sk-ant-api03-..."
# Generation settings
max_tokens = 8192
max_turns = 50
temperature = 0.7
# Custom endpoint (for ollama, proxies, etc.)
# base_url = "http://localhost:11434/v1"
# Permission rules for bash commands
[permissions]
allow = ["git *", "cargo *", "echo *"]
deny = ["rm -rf *", "sudo *"]
# Directory restrictions for file tools
[directories]
allow = ["./src", "./tests"]
deny = ["~/.ssh", "/etc"]
CLI flags always override config file values. For example, --model gpt-4o overrides model = "claude-sonnet-4-20250514" from the config file.
For more details on model configuration, see Models. For thinking levels, see Thinking.
Quick Start
Once installed, start yoyo:
export ANTHROPIC_API_KEY=sk-ant-...
yoyo
Or pass the API key directly:
yoyo --api-key sk-ant-...
First time? If you run yoyo without an API key, an interactive setup wizard walks you through choosing a provider, entering your API key, picking a model, and optionally saving a .yoyo.toml config file. After setup, you go straight into the REPL — no restart needed. You can also run the wizard anytime with yoyo setup. If you prefer to skip it, set your API key environment variable first or press Ctrl+C to cancel.
You'll see a banner like this:
yoyo v0.1.4 — a coding agent growing up in public
Type /help for commands, /quit to exit
model: claude-opus-4-6
git: main
cwd: /home/user/project
Your first prompt
Type a natural language request:
main > explain what this project does
yoyo will read files, run commands, and respond. You'll see tool executions as they happen:
▶ read README.md ✓
▶ ls src/ ✓
▶ read src/main.rs ✓
This project is a...
Common tasks
Read and explain code:
> read src/main.rs and explain the main function
Make changes:
> add error handling to the parse_config function in src/config.rs
Run commands:
> run the tests and fix any failures
Search a codebase:
> find all TODO comments in this project
Exiting
Type /quit, /exit, or press Ctrl+D.
Interactive Mode (REPL)
Interactive mode is the default when you run yoyo in a terminal. It gives you a read-eval-print loop where you can have a multi-turn conversation with the agent.
Starting
yoyo
# or
cargo run
The prompt
The prompt shows your current git branch (if you're in a git repo):
main 🐙 › _
If you're not in a git repo, you get a plain prompt:
🐙 › _
Line editing & history
yoyo uses rustyline for a full readline experience:
- Arrow keys: Navigate within the current line (← →) and through command history (↑ ↓)
- Inline hints: As you type a slash command, a dimmed suggestion appears after the cursor showing the completion and a short description — e.g. typing /he shows lp — Show help for commands. Press Tab or → to accept.
- Tab completion: Type / and press Tab to see available slash commands with descriptions — each command is shown alongside a short summary of what it does. Partial matches work too — /he<Tab> suggests /help and /health. After typing a command + space, argument-aware completions kick in:
  - /model <Tab> — suggests known model names (Claude, GPT, Gemini, etc.)
  - /provider <Tab> — suggests known provider names (anthropic, openai, google, etc.)
  - /think <Tab> — suggests thinking levels (off, minimal, low, medium, high)
  - /git <Tab> — suggests git subcommands (status, log, add, diff, branch, stash)
  - /pr <Tab> — suggests PR subcommands (list, view, diff, comment, create, checkout)
  - /save <Tab> and /load <Tab> — suggest .json session files in the current directory
- File paths also complete — type src/ma<Tab> to get src/main.rs, or Cargo<Tab> to get Cargo.toml. Directories complete with a trailing / for easy continued navigation.
- History recall: Previous inputs are saved across sessions
- Keyboard shortcuts: Ctrl-A (start of line), Ctrl-E (end of line), Ctrl-K (kill to end), Ctrl-W (delete word back)
- History file: Stored at $XDG_DATA_HOME/yoyo/history (defaults to ~/.local/share/yoyo/history)
How it works
- You type a message
- yoyo sends it to the LLM along with conversation history
- The LLM may call tools (read files, run commands, etc.)
- Tool results are streamed back — you see each tool as it executes
- The final text response is printed
- Token usage and cost are shown after each turn
Tool output
When yoyo uses tools, you'll see status indicators:
▶ $ cargo test ✓ (2.1s)
▶ read src/main.rs ✓ (42ms)
▶ edit src/lib.rs ✓ (15ms)
▶ $ cargo test ✗ (1.8s)
- ✓ means the tool succeeded
- ✗ means the tool returned an error
- The duration shows how long the tool took
Token usage
After each response, you'll see a compact token summary:
↳ 3.2s · 1523→842 tokens · $0.0234
This shows:
- Wall-clock time for the response
- Input→output tokens for this turn
- Estimated cost for this turn
Use --verbose (or -v) for the full breakdown including session totals and cache info.
Interrupting
Press Ctrl+C to cancel the current response. The agent will stop and you can type a new prompt. Press Ctrl+C again to exit.
Inline @file mentions
You can reference files directly in your prompts using @path syntax. The file content is automatically read and injected into the conversation — no need for a separate /add command.
> explain @src/main.rs
✓ added src/main.rs (250 lines)
(1 file inlined from @mentions)
> refactor @src/cli.rs:50-100
✓ added src/cli.rs (lines 50-100) (51 lines)
(1 file inlined from @mentions)
> compare @Cargo.toml and @README.md
✓ added Cargo.toml (35 lines)
✓ added README.md (120 lines)
(2 files inlined from @mentions)
How it works:
- @path — injects the entire file
- @path:start-end — injects a specific line range
- If the path doesn't exist, the @mention is left as-is (it might be a username)
- Email-like patterns (user@example.com) are not treated as file mentions
- Images work too: @screenshot.png inlines the image into the conversation
Single-Prompt Mode
Use --prompt or -p to run a single prompt without entering the REPL. yoyo will process the prompt, print the response, and exit.
Usage
yoyo --prompt "explain this codebase"
yoyo -p "find all TODO comments"
When to use it
Single-prompt mode is useful for:
- Scripting — run yoyo as part of a larger workflow
- Quick questions — get an answer without starting a session
- CI/CD pipelines — automate code review or analysis
Example
$ yoyo -p "count the lines of Rust code in this project"
▶ $ find . -name '*.rs' | xargs wc -l ✓ (0.1s)
There are 1,475 lines of Rust code across 1 file (src/main.rs).
Combining with other flags
You can combine -p with other flags:
yoyo -p "review this diff" --model claude-sonnet-4-20250514
yoyo -p "explain the architecture" --thinking high
yoyo -p "analyze the code" --system "You are a security auditor."
Piped Mode
When stdin is not a terminal (i.e., input is piped), yoyo reads all of stdin as a single prompt, processes it, and exits. This works like single-prompt mode but takes input from a pipe instead of a flag.
Usage
echo "explain this code" | yoyo
cat prompt.txt | yoyo
git diff | yoyo
When to use it
Piped mode is useful for:
- Passing file contents as part of the prompt
- Chaining with other commands in a pipeline
- Feeding structured input from scripts
Examples
Review a git diff:
git diff HEAD~1 | yoyo --system "Review this diff for bugs."
Analyze a file:
cat src/main.rs | yoyo --system "Find all potential panics in this Rust code."
Process command output:
cargo test 2>&1 | yoyo --system "Explain these test failures and suggest fixes."
Detection
yoyo detects piped mode automatically by checking if stdin is a terminal. If it is not, piped mode activates. If stdin is a terminal, interactive REPL mode starts instead.
If piped input is empty, yoyo exits with an error: No input on stdin.
Slash commands aren't dispatched in piped mode
Slash commands (/doctor, /status, /help, etc.) belong to the interactive REPL — they depend on REPL state that piped mode doesn't have. If you pipe a slash command into yoyo, it isn't dispatched: it would only be sent to the model as a literal string, wasting a turn of tokens.
Instead, yoyo detects this case, prints a one-line warning to stderr, and exits with status code 2. Use one of these alternatives:
yoyo doctor # run the subcommand directly
yoyo --prompt "/doctor" # send the literal text to the agent
yoyo # interactive REPL
REPL Commands
All commands start with /. Type /help inside yoyo to see the full list.
Note: A few commands are also available as shell subcommands — run them directly without entering the REPL:
| Subcommand | Description |
|---|---|
| yoyo help | Show help message (same as --help) |
| yoyo version | Show version (same as --version) |
| yoyo setup | Run the interactive setup wizard |
| yoyo init | Generate a YOYO.md project context file |
| yoyo doctor | Diagnose yoyo setup (config file, API key, provider, tool availability) |
| yoyo health | Run project health checks (build, test, clippy, fmt — auto-detects project type) |
| yoyo lint | Run project linter (e.g. yoyo lint --strict, yoyo lint unsafe) |
| yoyo test | Run project test suite |
| yoyo tree | Show project directory tree |
| yoyo map | Show project symbol map |
| yoyo run | Run a shell command (e.g. yoyo run cargo clippy) |
| yoyo diff | Show git diff (e.g. yoyo diff --staged) |
| yoyo commit | Commit staged changes (e.g. yoyo commit "fix typo") |
| yoyo review | Show review prompt for staged changes or a file |
| yoyo blame | Show git blame (e.g. yoyo blame src/main.rs:1-20) |
| yoyo grep | Search files for a pattern (e.g. yoyo grep TODO src/) |
| yoyo find | Find files by name (e.g. yoyo find main) |
| yoyo index | Build and display project index |
| yoyo update | Check for and install the latest yoyo release |
| yoyo docs | Look up docs.rs documentation (e.g. yoyo docs serde) |
| yoyo watch | Toggle watch mode (e.g. yoyo watch cargo test) |
| yoyo status | Show version, git branch, and working directory |
| yoyo undo | Undo changes (e.g. yoyo undo --last-commit) |
The doctor subcommand honors --provider and --model if you want to point it at a non-default setup (e.g. yoyo doctor --provider openai). Inside the REPL, the same checks are available as /doctor and /health.
Navigation
| Command | Description |
|---|---|
/quit, /exit | Exit yoyo |
/help | Show available commands |
/help <command> | Show detailed help for a specific command |
Conversation
| Command | Description |
|---|---|
/clear | Clear conversation history and start fresh |
/compact | Compress conversation to save context space (see Context Management) |
/retry | Re-send your last input — useful when a response gets cut off or you want to try again |
/history | Show a summary of all messages in the conversation |
/search <query> | Search conversation history for messages containing the query (case-insensitive) |
/mark <name> | Bookmark the current conversation state |
/jump <name> | Restore conversation to a bookmark (discards messages after it) |
/marks | List all saved bookmarks |
Conversation bookmarks
The /mark and /jump commands let you bookmark points in your conversation and return to them later. This is useful when exploring different approaches — bookmark a good state, try something, and jump back if it doesn't work out.
> /mark before-refactor
✓ bookmark 'before-refactor' saved (12 messages)
> ... try something risky ...
> /jump before-refactor
✓ jumped to bookmark 'before-refactor' (12 messages)
> /marks
Saved bookmarks:
• before-refactor
Bookmarks are stored in memory for the current session. Overwriting a bookmark with the same name updates it. Jumping to a bookmark restores the conversation to exactly that point — any messages added after the bookmark are discarded.
Model, Provider & Thinking
| Command | Description |
|---|---|
/model <name> | Switch to a different model (preserves conversation) |
/provider <name> | Switch provider and reset model to the provider's default |
/think [level] | Show or change thinking level: off, minimal, low, medium, high |
/teach [on|off] | Toggle teach mode — yoyo explains its reasoning as it works |
Examples:
/model claude-sonnet-4-20250514
/provider openai
/provider google
/think high
/think off
The /model command preserves conversation when switching models. The /provider command switches to a different API provider (e.g., anthropic, openai, google, openrouter, ollama, xai, groq, deepseek, mistral, cerebras, custom) and automatically sets the model to the provider's default. Use /provider without arguments to see the current provider and available options. The /think command adjusts the thinking level.
The /teach command toggles teach mode on or off. When teach mode is active, yoyo explains why it's making each change before showing code, uses clear and readable patterns, adds comments on non-obvious lines, and summarizes what you should learn after completing a task. Great for learning while the agent codes. This is a session-only toggle — it resets when you exit.
Session
| Command | Description |
|---|---|
/save [path] | Save conversation to a file (default: yoyo-session.json) |
/load [path] | Load conversation from a file (default: yoyo-session.json) |
See Session Persistence for details.
Information
| Command | Description |
|---|---|
/status | Show current model, git branch, working directory, and session token totals |
/tokens | Show detailed token usage: context window fill level, session totals, and estimated cost |
/cost | Show estimated session cost |
/changelog [N] | Show recent git commit history (default: 15, max: 100) |
/config | Show all current settings |
/config show | Show loaded config file path and merged key-value pairs (secrets masked) |
/config edit | Open config file in $EDITOR |
/hooks | Show active hooks (pre/post tool execution) |
/permissions | Show active security and permission configuration |
/version | Show yoyo version |
The /tokens command shows a visual progress bar of your active context:
Active context:
messages: 12
current: 45.2k / 200.0k tokens
█████████░░░░░░░░░░░ 23%
Documentation
| Command | Description |
|---|---|
/docs <crate> | Look up docs.rs documentation for a Rust crate |
/docs <crate> <item> | Look up a specific module/item within a crate |
The /docs command fetches the docs.rs page for a given crate and shows a quick summary — confirming the crate exists, displaying its description, and listing the crate's API items (modules, structs, traits, enums, functions, macros). No tokens used, no AI involved.
Each category is capped at 10 items with a "+N more" suffix for large crates.
/docs serde
✓ serde
📦 https://docs.rs/serde/latest/serde/
📝 A generic serialization/deserialization framework
Modules: de, ser
Traits: Deserialize, Deserializer, Serialize, Serializer
Macros: forward_to_deserialize_any
/docs tokio task
✓ tokio::task
📦 https://docs.rs/tokio/latest/tokio/task/
📝 Asynchronous green-threads...
Shell
| Command | Description |
|---|---|
/run <cmd> | Run a shell command directly — no AI, no tokens used |
!<cmd> | Shortcut for /run |
/bg [subcmd] | Manage background shell processes |
/web <url> | Fetch a web page and display clean readable text content |
The /run command (or ! shortcut) lets you execute shell commands without going through the AI model. Useful for quick checks (e.g., !git log --oneline -5) without burning API tokens.
/run ls -la src/
/run cargo test
/run git status
/bg — Background process management
The /bg command lets you launch shell commands in the background, monitor their output, and kill them when done. Useful for long-running tasks like builds, test suites, or dev servers.
| Subcommand | Description |
|---|---|
/bg run <cmd> | Launch a command in the background |
/bg list | Show all background jobs (default when no subcommand) |
/bg output <id> | Show last 50 lines of a job's output |
/bg output <id> --all | Show all captured output |
/bg kill <id> | Kill a running job |
/bg run cargo build --release
⚡ Background job [1] started: cargo build --release
/bg list
Background Jobs
[1] ● running 12s cargo build --release
/bg output 1
... (last 50 lines of build output)
/bg kill 1
Killed job [1]
Output is capped at 256KB per job to prevent memory issues. Jobs display colored status: green for success, red for failure, yellow for running.
/web — Fetch and read web pages
The /web command fetches a URL and extracts readable text content, stripping away HTML tags, scripts, styles, and navigation. This is useful for quickly pulling in documentation, error explanations, API references, or any web content without getting raw HTML.
/web https://doc.rust-lang.org/book/ch01-01-installation.html
/web docs.rs/serde
/web https://stackoverflow.com/questions/12345
Features:
- Auto-prepends https:// if you omit the protocol — /web docs.rs/serde works
- Strips noise — removes <script>, <style>, <nav>, <footer>, <header>, and <svg> blocks
- Converts structure — headings become prominent, list items get bullets, block elements get newlines
- Decodes entities — &amp;, &lt;, &gt;, numeric character references like &#NNN;, etc.
- Truncates — caps output at ~5,000 characters to keep it readable
- No AI tokens used — pure curl + text extraction
Subagent & Planning
| Command | Description |
|---|---|
/plan <task> | Create a step-by-step plan for a task without executing anything (architect mode) |
/spawn <task> | Spawn a subagent with a fresh context to handle a task |
/plan — Architect mode
The /plan command asks the AI to create a detailed, structured plan for a task without executing any tools. This is the "architect mode" equivalent — you see exactly what the agent intends to do before it does anything.
> /plan add caching to the database layer
📋 Planning: add caching to the database layer
## Files to examine
- src/db.rs — current database implementation
- src/config.rs — configuration for cache TTL
## Files to modify
- src/db.rs — add cache layer
- src/cache.rs — new file for cache implementation
- tests/cache_test.rs — new tests
## Step-by-step approach
1. Read src/db.rs to understand current query patterns
2. Create src/cache.rs with an LRU cache struct
3. Wrap database queries with cache lookups
4. Add cache invalidation on writes
5. Add configuration for cache size and TTL
## Tests to write
- Cache hit returns cached value
- Cache miss falls through to database
- Write invalidates relevant cache entries
## Potential risks
- Cache invalidation on complex queries
- Memory pressure with large result sets
## Verification
- Run existing tests to ensure no regressions
- Run new cache tests
- Benchmark query latency before/after
💡 Review the plan above. Say "go ahead" to execute it, or refine it.
After reviewing the plan, you can:
- Say "go ahead" to have the agent execute the plan
- Ask the agent to refine specific parts ("make the cache configurable")
- Modify the approach ("use Redis instead of in-memory")
- Say "no" or change direction entirely
This is especially useful for:
- Large refactors where you want to understand the scope before committing
- Unfamiliar codebases where you want the agent to map things out first
- Trust and transparency — see the full plan before any files are modified
- Teaching moments — the plan itself teaches you about the codebase structure
/spawn — Subagent
The /spawn command creates a fresh AI agent with its own independent context window, sends it your task, runs it to completion, and injects the result back into your main conversation.
This is useful for tasks that would consume a lot of context in your main session — reading large files, multi-step analysis, exploring unfamiliar code — without polluting your primary conversation history.
/spawn read all files in src/ and summarize the architecture
/spawn find all TODO comments in the codebase and list them
/spawn analyze the test coverage and suggest gaps
The subagent has access to the same tools (bash, file operations, etc.) and uses the same model. Its token usage counts toward your session total, but its context is completely separate from your main conversation. When it finishes, a summary of the task and result is injected into your main conversation so you have awareness of what was done.
Automatic sub-agent delegation: In addition to /spawn, the model can autonomously delegate subtasks to a built-in sub_agent tool. This happens transparently — the model decides when a subtask benefits from a fresh context window (e.g., researching a codebase section, running a series of tests). You'll see a 🐙 indicator when delegation occurs.
Git
| Command | Description |
|---|---|
/git status | Show working tree status (git status --short) — quick shortcut |
/git log [n] | Show last n commits (default: 5) via git log --oneline |
/git add <path> | Stage files for commit |
/git stash | Stash uncommitted changes |
/git stash pop | Restore stashed changes |
/git stash list | List all stash entries with colored output |
/git stash show [n] | Show diff of stash entry (default: latest) |
/git stash drop [n] | Drop a stash entry (default: latest) |
/commit [msg] | Commit staged changes — generates a conventional commit message if no msg provided |
/diff | Show colored file summary, change stats, and full diff of uncommitted changes |
/blame <file> | Show colorized git blame output (/blame file:10-20 for line ranges) |
/undo | Revert all uncommitted changes (git checkout -- . and git clean -fd) |
/pr [number] | List open PRs (gh pr list), or view a specific PR (gh pr view <number>) |
/pr create [--draft] | Create a PR with an AI-generated title and description |
/pr <number> diff | Show the diff of a PR (gh pr diff <number>) |
/pr <number> comment <text> | Add a comment to a PR (gh pr comment <number>) |
/pr <number> checkout | Checkout a PR branch locally (gh pr checkout <number>) |
/health | Run project health checks — auto-detects project type, reports pass/fail with timing |
/test | Auto-detect and run project tests — shows output with timing |
/lint | Auto-detect and run project linter — shows output with timing, feeds failures to agent context |
/lint pedantic | Run with pedantic clippy lints (Rust only) |
/lint strict | Run with pedantic + nursery clippy lints (Rust only) |
/lint fix | Run linter and auto-send failures to AI for fixing |
/lint unsafe | Scan for unsafe code blocks and suggest safety attributes (Rust only) |
/fix | Auto-fix build/lint errors — runs health checks, sends failures to the AI agent for fixing |
/update | Self-update yoyo to the latest GitHub release — detects platform, downloads, replaces the binary |
The /git command is a convenience wrapper for common git operations without burning AI tokens or using /run git .... For example:
/git status # instead of /run git status --short
/git log 10 # instead of /run git log --oneline -10
/git add src/main.rs # stage a file
/git stash # stash changes
/git stash pop # restore stash
/git stash list # see all stash entries
/git stash show 1 # view diff of stash@{1}
/git stash drop 0 # drop the latest stash
The /commit command helps you commit staged changes quickly:
- /commit (no arguments): reads your staged diff, generates a conventional commit message (e.g., feat(main): add changes), and asks for confirmation — press y to accept, n to cancel, or e to edit
- /commit fix: typo in README: commits directly with your provided message
- If nothing is staged, it reminds you to git add first
The /undo command shows you what will be reverted before doing it.
The /pr command is a quick wrapper around the GitHub CLI:
- /pr — list the 10 most recent open pull requests
- /pr create — create a PR with an AI-generated title and description from your branch's diff and commits
- /pr create --draft — same, but as a draft PR
- /pr 42 — view details of PR #42
- /pr 42 diff — show the diff for PR #42
- /pr 42 comment looks good! — add a comment to PR #42
- /pr 42 checkout — checkout PR #42's branch locally
For merging or closing PRs, use /run gh pr ... or ask the agent directly — it has full bash access.
The /health command auto-detects your project type by looking for marker files and runs the appropriate checks:
- Rust (Cargo.toml): cargo build, cargo test, cargo clippy, cargo fmt --check
- Node.js (package.json): npm test, npx eslint .
- Python (pyproject.toml, setup.py, setup.cfg): pytest, flake8, mypy
- Go (go.mod): go build, go test, go vet
- Makefile (Makefile): make test
If no recognized project type is found, it shows a helpful message listing the marker files it looked for.
The /test command is a focused shortcut that only runs the test suite for your project (e.g., cargo test, npm test, python -m pytest, go test ./..., make test). It auto-detects the project type the same way /health does, but runs just the tests — with full output and timing. This is handy for a quick test run without the full suite of lint/build checks that /health performs.
The /lint command is similar to /test but runs only the linter for your project. It auto-detects the project type and runs the appropriate linter:
- Rust: cargo clippy --all-targets -- -D warnings
- Node.js: npx eslint .
- Python: ruff check .
- Go: golangci-lint run
For Rust projects, you can increase clippy's strictness:
- /lint pedantic — adds -W clippy::pedantic for stricter style checks
- /lint strict — adds -W clippy::pedantic -W clippy::nursery for maximum analysis
Strictness levels only affect Rust projects; other languages use their default linter regardless.
When lint fails, the error output is automatically fed into the agent context so you can ask the AI about the errors in your next message. For fully automated fixing, use /lint fix — this runs the linter and, if there are failures, sends them directly to the AI agent for correction (similar to /fix but lint-only).
The /fix command goes one step further than /health — it runs the same health checks, but when any check fails, it sends the full error output to the AI agent with a prompt to fix the issues. The AI reads the relevant files, understands the errors, and applies fixes using its tools. After fixing, it re-runs the checks to verify. This is particularly useful for quickly resolving lint warnings, format issues, or build errors.
/fix
Detected project: Rust (Cargo)
Running health checks...
✓ build: ok
✗ clippy: FAIL
✓ fmt: ok
Sending 1 failure(s) to AI for fixing...
/update — Self-update to latest release
The /update command checks GitHub for the latest release and downloads the new binary in-place.
/update
Update available: v0.1.5 → v0.2.0
This will download and replace the current binary.
Continue? [y/N] y
Downloading yoyo-x86_64-unknown-linux-gnu.tar.gz...
✓ Updated to v0.2.0! Please restart yoyo to use the new version.
The command:
- Detects your platform (Linux x86_64, macOS Intel/ARM, Windows x86_64)
- Creates a backup of the current binary before replacing
- Restores the backup if anything goes wrong
- Suggests manual install instructions as a fallback
If you're running a development build (from cargo build), it will suggest using cargo install yoyo-agent instead.
Code Review
| Command | Description |
|---|---|
/review | AI-powered review of staged changes (falls back to unstaged if nothing staged) |
/review <path> | AI-powered review of a specific file |
The /review command sends your code to the AI for a thorough review covering:
- Bugs — logic errors, off-by-one errors, null handling, race conditions
- Security — injection vulnerabilities, unsafe operations, credential exposure
- Style — naming, idiomatic patterns, unnecessary complexity, dead code
- Performance — obvious inefficiencies, unnecessary allocations
- Suggestions — improvements, missing error handling, better approaches
/review # review staged changes (or unstaged if nothing staged)
/review src/main.rs # review a specific file
/review Cargo.toml # review any file
This is one of the most common workflows for developers using coding agents — getting a second pair of eyes on your changes before committing.
Refactoring
| Command | Description |
|---|---|
/refactor | Show all refactoring tools with examples |
/rename <old> <new> | Cross-file symbol renaming with word-boundary matching |
/extract <symbol> <source> <target> | Move a symbol (fn, struct, enum, trait, type, const, static) between files |
/move <Src>::<method> [file::]<Dst> | Move a method between impl blocks (same file or cross-file) |
/refactor — Refactoring tools overview
The /refactor command is an umbrella that shows all available refactoring tools at a glance. Run it with no arguments to see a summary with examples:
/refactor
You can also use it as a dispatch to any refactoring subcommand:
/refactor rename MyOldStruct MyNewStruct
/refactor extract parse_config src/lib.rs src/config.rs
/refactor move Parser::validate Validator
These are equivalent to calling /rename, /extract, or /move directly — use whichever form you prefer.
/rename — Cross-file symbol renaming
The /rename command does a smart find-and-replace across all git-tracked files, respecting word boundaries (renaming foo won't change foobar or my_foo). Shows a preview of all matches, then asks for confirmation.
/rename my_func new_func
/rename OldStruct NewStruct
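The word-boundary rule can be sketched roughly as below. This is a simplified, hypothetical version (function names are illustrative, and the real implementation also builds the cross-file preview): a hit counts only when the characters on either side are not identifier characters.

```rust
// Hypothetical sketch of the word-boundary check behind /rename.
fn is_word_char(c: char) -> bool {
    c.is_alphanumeric() || c == '_'
}

// Replace `old` with `new` in one line, but only at word boundaries,
// so renaming `foo` leaves `foobar` and `my_foo` untouched.
fn rename_line(line: &str, old: &str, new: &str) -> String {
    let mut out = String::new();
    let mut rest = line;
    let mut offset = 0; // byte offset of `rest` within `line`
    while let Some(pos) = rest.find(old) {
        let abs = offset + pos;
        let before_ok = line[..abs]
            .chars()
            .next_back()
            .map_or(true, |c| !is_word_char(c));
        let after_ok = line[abs + old.len()..]
            .chars()
            .next()
            .map_or(true, |c| !is_word_char(c));
        out.push_str(&rest[..pos]);
        out.push_str(if before_ok && after_ok { new } else { old });
        rest = &rest[pos + old.len()..];
        offset = abs + old.len();
    }
    out.push_str(rest);
    out
}
```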
/extract — Move symbols between files
The /extract command moves a top-level item (function, struct, enum, impl, trait, type alias, const, or static) from one file to another. It uses brace-depth tracking to find the full block, including doc comments and attributes above the declaration.
/extract my_func src/lib.rs src/utils.rs
/extract MyStruct src/main.rs src/types.rs
/extract MyTrait src/old.rs src/new.rs
/extract MyResult src/lib.rs src/errors.rs
/extract MAX_SIZE src/config.rs src/constants.rs
The command shows a preview of the block to be moved and asks for confirmation before making changes. If the target file doesn't exist, it's created. If the symbol is public, yoyo notes that you may need to add a use import in the source file.
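The brace-depth scan can be sketched as follows. This is a simplified, assumed version: the real implementation also has to skip braces inside string literals and comments, and to walk backwards to pick up doc comments and attributes.

```rust
// Hypothetical sketch of the brace-depth scan behind /extract: starting at
// the declaration, count '{' and '}' until the depth returns to zero, and
// report the byte offset just past the closing brace.
fn block_end(src: &str, start: usize) -> Option<usize> {
    let mut depth = 0usize;
    let mut seen_open = false;
    for (i, c) in src[start..].char_indices() {
        match c {
            '{' => {
                depth += 1;
                seen_open = true;
            }
            '}' => {
                depth = depth.checked_sub(1)?; // unbalanced '}' → give up
                if seen_open && depth == 0 {
                    return Some(start + i + 1);
                }
            }
            _ => {}
        }
    }
    None // never closed (e.g. a bare `fn f();` signature)
}
```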
/move — Relocate methods between impl blocks
The /move command moves a method from one impl block to another, within the same file or across files. It extracts the method (including doc comments and attributes), re-indents it to match the target block, and inserts it before the closing }. Shows a preview and asks for confirmation.
/move MyStruct::process TargetStruct # same file
/move Parser::parse_expr other.rs::Lexer # cross-file
/move Config::validate Settings # same file
If the method uses self. references, yoyo warns you to verify that the field/method references are valid on the target type. This is a common source of bugs when relocating methods between different types.
rename_symbol — Agent-invocable rename tool
In addition to the interactive /rename REPL command, yoyo exposes a rename_symbol tool that the AI agent can call directly. This means the agent can rename symbols across files in a single tool call instead of issuing multiple edit_file calls — faster and more reliable for large refactors.
The tool accepts:
- old_name (required) — the current symbol name
- new_name (required) — the replacement name
- path (optional) — limit scope to a specific file or directory
Like write_file and edit_file, rename_symbol asks for user confirmation before making changes (unless --yes is passed).
ask_user — Let the model ask you questions
The agent can ask you directed questions mid-task using the ask_user tool. Instead of guessing at your preferences or making assumptions, the model can pause and ask for clarification — a preference, a decision, or context that isn't available in the codebase.
This tool is only available in interactive mode (when stdin is a terminal). In piped mode, the tool is not registered — the model works with what it has.
The question appears with a ❓ prompt, and you type your response directly. If you press Enter with no text or hit EOF, the model receives a "(no response)" indicator and continues on its own.
Project Context
| Command | Description |
|---|---|
/add <path> | Add file contents into the conversation — the AI sees them immediately |
/explain <file> | Read code from a file and ask the agent to explain it |
/context [system] | Show which project context files are loaded, or use /context system to see system prompt sections with token estimates |
/find <pattern> | Fuzzy-search project files by name — respects .gitignore, ranked by relevance |
/grep <pattern> [path] | Search file contents directly — no AI, no tokens, instant results |
/index | Build a lightweight index of all project source files — shows path, line count, and first-line summary |
/init | Scan the project and generate a YOYO.md context file with detected build commands, key files, and project structure |
/tree [depth] | Show project directory tree (default depth: 3, respects .gitignore) |
/add — Inject file contents into conversation
The /add command reads files and injects their contents directly into the conversation as a user message. The AI sees the file immediately without needing to call read_file — similar to Claude Code's @file feature.
/add src/main.rs
✓ added src/main.rs (850 lines)
(1 file added to conversation)
/add src/main.rs:1-50
✓ added src/main.rs (lines 1-50) (50 lines)
(1 file added to conversation)
/add src/*.rs
✓ added src/cli.rs (400 lines)
✓ added src/commands.rs (3000 lines)
✓ added src/main.rs (850 lines)
(3 files added to conversation)
/add Cargo.toml README.md
✓ added Cargo.toml (28 lines)
✓ added README.md (50 lines)
(2 files added to conversation)
Features:
- Line ranges — /add path:start-end injects only the specified lines
- Glob patterns — /add src/*.rs expands to all matching files
- Multiple files — /add file1 file2 adds both in one message
- Syntax highlighting — content is wrapped in fenced code blocks with language detection
- No AI tokens used for reading — the file is read locally and injected directly
This is the fastest way to give the AI context about specific files without waiting for it to call tools.
/find — Fuzzy file search by name
The /find command does fuzzy substring matching across all tracked files in your project (via git ls-files, falling back to a directory walk if not in a git repo). Results are ranked by relevance — filename matches score higher than directory matches, and matches at the start of the filename rank highest.
/find main
3 files matching 'main':
src/main.rs
site/book/index.html
scripts/main_helper.sh
/find .toml
2 files matching '.toml':
Cargo.toml
docs/book.toml
/grep — Search file contents directly
The /grep command searches file contents without using the AI — no tokens, no API call, instant results. This is one of the fastest ways to find code in your project.
/grep TODO
src/main.rs:42: // TODO: handle edge case
src/cli.rs:15: // TODO: add validation
2 matches
/grep "fn main" src/
src/main.rs:10: fn main() {
1 match
/grep -s MyStruct src/lib.rs
src/lib.rs:5: pub struct MyStruct {
src/lib.rs:20: impl MyStruct {
2 matches
Features:
- Case-insensitive by default — use -s or --case for case-sensitive search
- Git-aware — uses git grep in git repos (faster, respects .gitignore), falls back to grep -rn
- Colored output — filenames in green, line numbers in cyan, matches highlighted in yellow
- Truncated results — shows up to 50 matches with a "narrow your search" hint
- Optional path — /grep pattern src/ restricts search to a specific file or directory
/tree — Show the project directory tree
The /tree command uses git ls-files to show tracked files in a visual tree structure, automatically respecting your .gitignore. You can specify a depth limit:
/tree # default: 3 levels deep
/tree 1 # just top-level directories and their files
/tree 5 # deeper view
Example output:
src/
cli.rs
format.rs
main.rs
prompt.rs
Cargo.toml
README.md
/index — Codebase indexing
The /index command builds a lightweight in-memory index of your project's source files. For each text file tracked by git (or found via directory walk), it shows:
- Path — the file path relative to the project root
- Lines — the total line count
- Summary — the first meaningful line (skipping blank lines), which is typically a doc comment, module declaration, or import statement
Binary files (images, fonts, archives, etc.) are automatically skipped.
/index
Building project index...
Path Lines Summary
────────────────── ───── ────────────────────────────────────────
Cargo.toml 18 [package]
src/cli.rs 400 //! CLI argument parsing and configuration.
src/commands.rs 4500 //! REPL command handlers for yoyo.
src/main.rs 850 //! yoyo — a coding agent that evolves itself.
README.md 50 # yoyo
5 files, 5818 total lines
This gives you a quick bird's-eye view of the entire codebase without needing to run find, list_files, or wc -l manually.
/map — Structural codebase map
The /map command generates a structural summary of your codebase, extracting function signatures, struct/class/trait/enum definitions, constants, and other symbols from source files. This is like a "table of contents" for your entire project.
/map
Building repo map...
src/main.rs (850 lines)
pub fn main
pub struct AgentConfig
impl AgentConfig
src/cli.rs (400 lines)
pub fn parse_args
pub struct Config
pub const SYSTEM_PROMPT
...
45 symbols across 8 files (using ast-grep)
Usage:
| Command | Description |
|---|---|
/map | Map entire project (public symbols only) |
/map src/ | Map only files under a specific directory |
/map --all | Include private/non-exported symbols |
/map --all src/ | All symbols under a specific directory |
/map --regex | Force regex backend (skip ast-grep) |
Supported languages: Rust, Python, JavaScript, TypeScript, Go, Java.
ast-grep integration: When ast-grep (sg) is installed, /map uses it for more accurate AST-based symbol extraction. When ast-grep is not available, it falls back to built-in regex extractors. The output footer shows which backend was used. Use --regex to force the regex backend for comparison or debugging.
Automatic system prompt integration: The repo map is automatically included in the system prompt at the start of every session, giving the AI structural awareness of your codebase without you needing to manually add files. This is similar to Aider's repo-map feature. The system prompt version is limited to public symbols and capped at ~16K characters to avoid bloating context.
Project Onboarding with /init
The /init command scans your project and generates a YOYO.md context file automatically. It:
- Detects the project type — Rust, Node.js, Python, Go, or Makefile-based projects
- Finds the project name — from Cargo.toml, package.json, the README.md title, or the directory name
- Lists important files — README, config files, CI configs, lock files, etc.
- Lists key directories — src/, tests/, docs/, scripts/, etc.
- Generates build commands — cargo build, npm test, go test ./..., etc. based on project type
/init
Scanning project...
Detected: Rust
✓ Created YOYO.md (32 lines) — edit it to add project context.
If YOYO.md or CLAUDE.md already exists, /init won't overwrite it. The generated file is a starting point — edit it to add your project's specific conventions and instructions.
Project Memory
| Command | Description |
|---|---|
/remember <note> | Save a project-specific note that persists across sessions |
/memories [query] | List all memories, or search by keyword |
/forget <number> | Remove a memory by its number |
Project memories let you teach yoyo things about your project that it should always know — build quirks, team conventions, infrastructure requirements. Memories are stored in .yoyo/memory.json in your project root and are automatically injected into the system prompt at the start of every session.
Example workflow
> /remember this project uses sqlx for database access
✓ Remembered: "this project uses sqlx for database access" (1 total memories)
> /remember tests require docker running
✓ Remembered: "tests require docker running" (2 total memories)
> /memories
Project memories (2):
[0] this project uses sqlx for database access (2026-03-15 08:32)
[1] tests require docker running (2026-03-15 08:33)
> /forget 0
✓ Forgot: "this project uses sqlx for database access" (1 memories remaining)
> /memories docker
Found 1 memory matching 'docker':
[1] tests require docker running (2026-03-15 08:33)
Use /memories <query> to filter by keyword when you have many memories. The search is case-insensitive.
Use /remember any time you find yourself repeating the same instruction to the agent. The memory will be there next time you start a session in this project directory.
Unknown commands
If you type a /command that yoyo doesn't recognize, it will tell you:
unknown command: /foo
type /help for available commands
Note: lines starting with / that contain spaces (like /model name) are treated as command arguments, not unknown commands.
Multi-Line Input
yoyo supports two ways to enter multi-line input.
Backslash continuation
End a line with \ to continue on the next line:
main > Please review this code and \
... check for any bugs or \
... performance issues.
The backslash and newline are removed, and the lines are joined. The ... prompt indicates yoyo is waiting for more input.
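The joining rule can be sketched as a small fold over the collected lines. This is an assumed sketch, not the actual implementation: a trailing backslash is stripped and the newline dropped; otherwise the line is kept as-is.

```rust
// Hypothetical sketch of backslash continuation: lines ending in '\'
// are joined with the next line (backslash and newline removed).
fn join_input(lines: &[&str]) -> String {
    let mut out = String::new();
    for (i, line) in lines.iter().enumerate() {
        match line.strip_suffix('\\') {
            Some(head) => out.push_str(head), // drop '\' and the newline
            None => {
                out.push_str(line);
                if i + 1 < lines.len() {
                    out.push('\n'); // ordinary lines keep their newline
                }
            }
        }
    }
    out
}
```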
Code fences
Start a line with triple backticks (```) to enter a fenced code block. Everything until the closing ``` is collected as a single input:
main > ```
... Here is a function I want you to review:
...
... fn parse(input: &str) -> Result<Config, Error> {
... let data = serde_json::from_str(input)?;
... Ok(Config::from(data))
... }
...
... Is this handling errors correctly?
... ```
This is useful for pasting code or structured text that spans multiple lines.
Models & Providers
yoyo supports 13 providers out of the box — from Anthropic and OpenAI to local models via Ollama.
Default model
The default model is claude-opus-4-6 (Anthropic). You can change it at startup or mid-session.
Changing the model
At startup:
yoyo --model claude-sonnet-4-20250514
yoyo --model gpt-4o --provider openai
yoyo --model llama3.2 --provider ollama
During a session:
/model claude-sonnet-4-20250514
Note: Switching models with /model preserves your conversation history — you can change models mid-task without losing context.
Providers
Use --provider <name> to select a provider. Each provider has a default model and an environment variable for its API key.
Tip: If you run yoyo without any API key configured, an interactive setup wizard will walk you through choosing a provider and entering your key. You can also save the config to .yoyo.toml directly from the wizard.
| Provider | Default Model | API Key Env Var |
|---|---|---|
anthropic (default) | claude-opus-4-6 | ANTHROPIC_API_KEY |
openai | gpt-4o | OPENAI_API_KEY |
google | gemini-2.0-flash | GOOGLE_API_KEY |
openrouter | anthropic/claude-sonnet-4-20250514 | OPENROUTER_API_KEY |
ollama | llama3.2 | (none — local) |
xai | grok-3 | XAI_API_KEY |
groq | llama-3.3-70b-versatile | GROQ_API_KEY |
deepseek | deepseek-chat | DEEPSEEK_API_KEY |
mistral | mistral-large-latest | MISTRAL_API_KEY |
cerebras | llama-3.3-70b | CEREBRAS_API_KEY |
zai | glm-4-plus | ZAI_API_KEY |
minimax | MiniMax-M2.7 | MINIMAX_API_KEY |
custom | claude-opus-4-6 | (none — bring your own) |
Examples
# OpenAI
OPENAI_API_KEY=sk-... yoyo --provider openai
# Google Gemini
GOOGLE_API_KEY=... yoyo --provider google --model gemini-2.5-pro
# Local with Ollama (no API key needed)
yoyo --provider ollama --model llama3.2
# Custom endpoint (OpenAI-compatible API)
yoyo --provider custom --base-url http://localhost:8080/v1 --model my-model
You can also set these in .yoyo.toml:
provider = "openai"
model = "gpt-4o"
base_url = "https://api.openai.com/v1"
Cost estimation
Cost estimation is built in for many providers:
| Model Family | Input (per MTok) | Output (per MTok) |
|---|---|---|
| Opus 4.5/4.6 | $5.00 | $25.00 |
| Opus 4/4.1 | $15.00 | $75.00 |
| Sonnet | $3.00 | $15.00 |
| Haiku 4.5 | $1.00 | $5.00 |
| Haiku 3.5 | $0.80 | $4.00 |
Cost estimates are also available for OpenAI, Google, DeepSeek, Mistral, xAI, Groq, ZAI and more.
Context window
yoyo assumes a 200,000-token context window (the standard for Claude models). When usage exceeds 80% of this, auto-compaction kicks in. See Context Management.
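The trigger condition above is just arithmetic: 80% of 200,000 is 160,000 tokens, and compaction kicks in once usage goes past that. A minimal sketch (names are illustrative, not yoyo's actual API):

```rust
// Fixed context window assumed by yoyo, per the docs.
const CONTEXT_WINDOW: u64 = 200_000;

// Compaction triggers once usage exceeds 80% of the window,
// i.e. strictly more than 160,000 tokens.
fn should_compact(used_tokens: u64) -> bool {
    used_tokens * 100 > CONTEXT_WINDOW * 80
}
```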
System Prompts
yoyo has a built-in system prompt that instructs the model to act as a coding assistant. You can override it entirely via CLI flags or config file.
Default behavior
The default system prompt tells the model to:
- Work as a coding assistant in the user's terminal
- Be direct and concise
- Use tools proactively (read files, run commands, verify work)
- Do things rather than just explain how
Custom system prompt
Inline (CLI flag):
yoyo --system "You are a Rust expert. Focus on performance and safety."
From a file (CLI flag):
yoyo --system-file my-prompt.txt
In config file (.yoyo.toml):
# Inline text
system_prompt = "You are a Go expert. Follow Go idioms strictly."
# Or read from a file
system_file = "prompts/system.txt"
If both system_prompt and system_file are set in the config, system_file takes precedence (same as CLI behavior).
Precedence
When multiple sources provide a system prompt, the highest-priority one wins:
1. --system-file (CLI flag) — highest priority
2. --system (CLI flag)
3. system_file (config file key)
4. system_prompt (config file key)
5. Built-in default — lowest priority
This means CLI flags always override config file values, and file-based prompts override inline text at each level.
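The precedence chain is a simple first-match resolution. A sketch under assumed names (the struct and its fields are illustrative; file-based sources are shown as already-read text):

```rust
// Hypothetical sources for the system prompt, highest priority first.
struct PromptSources<'a> {
    cli_system_file: Option<&'a str>,   // --system-file (contents after reading)
    cli_system: Option<&'a str>,        // --system
    cfg_system_file: Option<&'a str>,   // system_file key (contents after reading)
    cfg_system_prompt: Option<&'a str>, // system_prompt key
}

// First present source wins; otherwise fall back to the built-in default.
fn resolve_prompt<'a>(s: &PromptSources<'a>, built_in: &'a str) -> &'a str {
    s.cli_system_file
        .or(s.cli_system)
        .or(s.cfg_system_file)
        .or(s.cfg_system_prompt)
        .unwrap_or(built_in)
}
```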
Use cases
Custom system prompts are useful for:
- Specializing the agent — focus on security review, documentation, or a specific language
- Project context — tell the agent about your project's conventions
- Team defaults — commit .yoyo.toml with system_prompt or system_file so every developer gets the same agent persona
- Persona tuning — make the agent more or less verbose, formal, etc.
Viewing the assembled prompt
To see the full system prompt (including project context, repo map, skills, and any overrides), use:
yoyo --print-system-prompt
This prints the complete prompt to stdout and exits — useful for debugging or understanding exactly what context the model receives. It works with other flags:
# See what the prompt looks like with a custom system prompt
yoyo --system "You are a Rust expert" --print-system-prompt
# See the prompt without project context
yoyo --no-project-context --print-system-prompt
Inspecting during a session
Once inside the REPL, use /context system to see the system prompt broken into sections with approximate token counts for each:
/context system
This shows each markdown section (headers like # ... and ## ...), their line counts, estimated token usage, and a brief preview — without leaving the session.
Automatic project context
In addition to the system prompt, yoyo automatically injects project context when available:
- Project instructions — from YOYO.md (primary), CLAUDE.md (compatibility alias), or .yoyo/instructions.md
- Project file listing — from git ls-files (up to 200 files)
- Recently changed files — from git log (up to 20 files)
- Git status — current branch, count of uncommitted and staged changes
- Project memories — from memory/ files if present
Use /context to see which project context files are loaded.
Example prompt file
You are a senior Rust developer reviewing code for a production system.
Focus on:
- Error handling correctness
- Memory safety
- Performance implications
- API design
Be concise. Point out issues with line numbers.
Save as review-prompt.txt and use:
# Via CLI flag
yoyo --system-file review-prompt.txt -p "review src/main.rs"
Or set it in your project's .yoyo.toml:
system_file = "review-prompt.txt"
Extended Thinking
Extended thinking gives the model more "reasoning time" before responding. This can improve quality for complex tasks like debugging, architecture decisions, or multi-step refactoring.
Usage
yoyo --thinking high
yoyo --thinking medium
yoyo --thinking low
yoyo --thinking minimal
yoyo --thinking off
Levels
| Level | Aliases | Description |
|---|---|---|
off | none | No extended thinking (default) |
minimal | min | Very brief reasoning |
low | — | Short reasoning |
medium | med | Moderate reasoning |
high | max | Deep reasoning — best for complex tasks |
Levels are case-insensitive: HIGH, High, and high all work.
If you provide an unrecognized level, yoyo defaults to medium with a warning.
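The parsing rules just described (case-insensitive, aliases accepted, unknown values fall back to medium) can be sketched as one match. This is an illustrative sketch, not yoyo's actual code:

```rust
#[derive(Debug, PartialEq)]
enum Thinking {
    Off,
    Minimal,
    Low,
    Medium,
    High,
}

// Case-insensitive level parsing with the documented aliases;
// unrecognized input defaults to Medium (the real CLI also warns).
fn parse_thinking(s: &str) -> Thinking {
    match s.to_ascii_lowercase().as_str() {
        "off" | "none" => Thinking::Off,
        "minimal" | "min" => Thinking::Minimal,
        "low" => Thinking::Low,
        "medium" | "med" => Thinking::Medium,
        "high" | "max" => Thinking::High,
        _ => Thinking::Medium,
    }
}
```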
When to use it
- Complex debugging — use high when the bug is subtle
- Architecture decisions — use medium or high for design questions
- Simple tasks — use off (the default) for quick file reads, simple edits, etc.
Output
When thinking is enabled, the model's reasoning is shown dimmed in the output so you can follow along without it cluttering the main response.
Trade-offs
Higher thinking levels use more tokens (and thus cost more) but often produce better results for hard problems. For routine tasks, the overhead isn't worth it.
Skills
Skills are markdown files that provide additional context and instructions to yoyo. They're loaded at startup and added to the agent's context.
Usage
yoyo --skills ./skills
You can pass multiple skill directories:
yoyo --skills ./skills --skills ./my-custom-skills
What is a skill?
A skill file is a markdown file with YAML frontmatter. It contains instructions, rules, or context that the agent should follow. For example:
---
name: rust-expert
description: Rust-specific coding guidelines
tools: [bash, read_file, edit_file]
---
# Rust Guidelines
- Always use `clippy` before committing
- Prefer `?` over `.unwrap()` in production code
- Write tests for every public function
Built-in skills
yoyo's own evolution is guided by skills in the skills/ directory of the repository:
- evolve — rules for safely modifying its own source code
- communicate — writing journal entries and issue responses
- self-assess — analyzing its own capabilities
- research — searching the web and reading docs
- release — evaluating readiness for publishing
MCP servers
yoyo can connect to Model Context Protocol (MCP) servers, giving the agent access to external tools provided by any MCP-compatible server. Use the --mcp flag with a shell command that starts the server via stdio:
yoyo --mcp "npx -y @modelcontextprotocol/server-fetch"
The flag is repeatable — connect to multiple MCP servers in a single session:
yoyo \
--mcp "npx -y @modelcontextprotocol/server-fetch" \
--mcp "npx -y @modelcontextprotocol/server-github" \
--mcp "python my_custom_server.py"
MCP in config files
You can also configure MCP servers in .yoyo.toml, ~/.yoyo.toml, or ~/.config/yoyo/config.toml, so they connect automatically without needing CLI flags:
mcp = ["npx -y @modelcontextprotocol/server-fetch", "npx open-websearch@latest"]
MCP servers from the config file are merged with any --mcp CLI flags — both sources contribute. CLI flags are additive, not overriding.
Each --mcp command is launched as a child process. yoyo communicates with it over stdio using the MCP protocol, discovers the tools it offers, and makes them available to the agent alongside the built-in tools.
Tool-name collisions
yoyo's built-in tools (bash, read_file, write_file, edit_file, list_files, search, rename_symbol, ask_user, todo, sub_agent) take precedence over MCP tools. If an MCP server exposes a tool with one of those names, yoyo skips the entire server at connect time and prints a warning on stderr — otherwise the colliding tool would cause the provider API to reject the first turn with "Tool names must be unique" and kill the session.
Note: @modelcontextprotocol/server-filesystem exposes read_file and write_file and will therefore be skipped. Prefer servers with distinct tool names such as @modelcontextprotocol/server-fetch, @modelcontextprotocol/server-memory, or @modelcontextprotocol/server-sequential-thinking — or a filesystem server that prefixes its tools (e.g. fs_read_file).
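The connect-time check amounts to a set-membership test. A sketch under assumed names (this is illustrative, not yoyo's actual code): a server is accepted only if none of its offered tools collide with a built-in name.

```rust
use std::collections::HashSet;

// Accept an MCP server only if it offers no tool whose name collides
// with one of yoyo's built-in tools; otherwise the whole server is skipped.
fn server_ok(builtin: &HashSet<&str>, offered: &[&str]) -> bool {
    offered.iter().all(|t| !builtin.contains(t))
}
```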
OpenAPI specs
You can give yoyo access to any HTTP API by pointing it at an OpenAPI specification file. yoyo parses the spec and registers each endpoint as a callable tool:
yoyo --openapi ./petstore.yaml
Like --mcp, this flag is repeatable:
yoyo --openapi ./api-v1.yaml --openapi ./internal-api.json
Both YAML and JSON spec formats are supported.
Additional configuration flags
Beyond skills, MCP, and OpenAPI, a few other flags fine-tune agent behavior:
--temperature <float>
Set the sampling temperature (0.0–1.0). Lower values make output more deterministic; higher values make it more creative. Defaults to the model's own default.
yoyo --temperature 0.2 # More focused/deterministic
yoyo --temperature 0.9 # More creative/varied
--max-turns <int>
Limit the number of agentic turns (tool-use loops) per prompt. Defaults to 50. Useful for keeping costs predictable or preventing runaway tool loops:
yoyo --max-turns 10
Both flags can also be set in .yoyo.toml:
temperature = 0.5
max_turns = 20
--no-bell
Disable the terminal bell notification that rings after long-running prompts (≥3 seconds). By default, yoyo sends a bell character (\x07) when a prompt completes, which causes most terminals to flash the tab or play a sound — useful when you switch away while waiting. Disable it with the flag or environment variable:
yoyo --no-bell
YOYO_NO_BELL=1 yoyo
--no-update-check
Skip the startup update check. On startup (interactive REPL mode only), yoyo checks GitHub for a newer release and shows a notification if one exists. The check uses a 3-second timeout and fails silently on network errors. Disable it with the flag or environment variable:
yoyo --no-update-check
YOYO_NO_UPDATE_CHECK=1 yoyo
The update check is automatically skipped in non-interactive modes (piped input, --prompt flag).
YOYO_SESSION_BUDGET_SECS
Soft wall-clock budget for an entire yoyo session, in seconds. Unset by default — interactive sessions are unbounded. When set, yoyo exposes a session_budget_remaining() helper that long-running loops (like the self-evolution pipeline) can poll to voluntarily wind down before an external timeout cancels them.
YOYO_SESSION_BUDGET_SECS=2700 yoyo # 45-minute soft budget
The timer starts on the first call to the helper, not at process startup, so CI cold-start time doesn't burn the budget. If the env var is set but unparseable, yoyo falls back to the 45-minute default rather than silently disabling the guard. This was added to mitigate hourly cron overlap in the evolution workflow (#262).
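The parsing rules can be sketched as follows. This is an assumed sketch of the documented behavior, not the actual helper: an unset variable means no budget at all, while an unparseable value falls back to the 45-minute default instead of disabling the guard.

```rust
use std::time::Duration;

// Unset → None (unbounded session);
// unparseable value → 45-minute default rather than no guard.
fn parse_budget(raw: Option<&str>) -> Option<Duration> {
    raw.map(|s| Duration::from_secs(s.parse::<u64>().unwrap_or(45 * 60)))
}
```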
Error handling
If the skills directory doesn't exist or can't be loaded, yoyo prints a warning and continues without skills:
warning: Failed to load skills: ...
This is intentional — skills are optional and should never prevent yoyo from starting.
Permissions & Safety
yoyo asks for confirmation before running tools that modify your system. This page covers how to control that behavior — from interactive prompts to fine-grained allow/deny rules.
Interactive Permission Prompts
By default, yoyo prompts you before executing any potentially dangerous tool:
- bash — every shell command asks for [y/N] confirmation
- write_file — creating or overwriting files asks for approval
- edit_file — modifying existing files asks for approval
- rename_symbol — cross-file symbol renaming asks for approval
Read-only tools (read_file, list_files, search) and the ask_user tool run without prompting.
When a tool needs approval, you'll see something like:
⚡ bash: git status
Allow? [y/N]
Type y to approve, or n (or just press Enter) to deny.
Auto-Approve Everything: --yes / -y
If you trust the agent fully (e.g., in a sandboxed environment or CI pipeline), skip all prompts:
yoyo -y -p "refactor the auth module"
This auto-approves every tool call — bash commands, file writes, everything.
⚠️ Use with caution. This gives yoyo unrestricted access to your shell and filesystem.
Command Filtering: --allow and --deny
For finer control over which bash commands run automatically, use glob patterns:
yoyo --allow "git *" --allow "cargo *" --deny "rm -rf *"
How it works
- Deny is checked first. If a command matches any --deny pattern, it's rejected immediately — the agent sees an error message and must try something else.
- Allow is checked second. If a command matches any --allow pattern, it runs without prompting.
- No match = prompt. Commands that don't match either list get the normal [y/N] prompt.
Patterns use simple glob matching where * matches any sequence of characters (including empty):
| Pattern | Matches | Doesn't match |
|---|---|---|
git * | git status, git commit -m "hello" | echo git, gitignore |
*.rs | main.rs, src/main.rs | main.py |
cargo * --release | cargo build --release | cargo build --debug |
rm -rf * | rm -rf /, rm -rf /tmp | rm file.txt |
* | everything | — |
Both --allow and --deny are repeatable — pass them multiple times to build up your pattern lists.
Deny overrides allow
If both an allow and deny pattern match the same command, deny wins:
# This allows all commands EXCEPT rm -rf
yoyo --allow "*" --deny "rm -rf *"
The command rm -rf /tmp matches * (allow) and rm -rf * (deny) — deny takes priority, so it's blocked.
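The matching and decision order above can be sketched in a few lines. This is a simplified, assumed version consistent with the documented semantics ('*' matches any run of characters, including empty; deny is checked before allow), not yoyo's actual implementation:

```rust
#[derive(Debug, PartialEq)]
enum Decision {
    Deny,
    AutoApprove,
    Prompt,
}

// '*' matches any (possibly empty) sequence of characters;
// everything else matches literally.
fn glob_match(pattern: &str, text: &str) -> bool {
    match pattern.split_once('*') {
        None => pattern == text,
        Some((prefix, rest)) => {
            if !text.starts_with(prefix) {
                return false;
            }
            let tail = &text[prefix.len()..];
            // Try every possible span for this '*'.
            (0..=tail.len()).any(|i| tail.is_char_boundary(i) && glob_match(rest, &tail[i..]))
        }
    }
}

// Deny first, then allow, otherwise fall back to the [y/N] prompt.
fn decide(cmd: &str, allow: &[&str], deny: &[&str]) -> Decision {
    if deny.iter().any(|p| glob_match(p, cmd)) {
        Decision::Deny
    } else if allow.iter().any(|p| glob_match(p, cmd)) {
        Decision::AutoApprove
    } else {
        Decision::Prompt
    }
}
```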
Directory Restrictions: --allow-dir and --deny-dir
Restrict which directories yoyo's file tools can access:
yoyo --allow-dir ./src --allow-dir ./tests --deny-dir ~/.ssh
This affects read_file, write_file, edit_file, list_files, and search.
Rules
- If --allow-dir is set, only paths under allowed directories are accessible. Everything else is blocked.
- If --deny-dir is set, paths under denied directories are blocked.
- Paths are resolved to absolute paths before checking, so
../traversal escapes are caught. - Symlinks are resolved via
canonicalizewhen the path exists.
Example: lock yoyo to your project
yoyo --allow-dir . --deny-dir ./.git --deny-dir ~/.ssh
This lets yoyo read and write anywhere in the current project, but blocks access to .git internals and your SSH keys.
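The rules above (deny overrides allow, canonicalization when the path exists) can be sketched as one check. This is an assumed sketch, not the actual implementation; in particular, it assumes the input path is already absolute when it does not exist yet:

```rust
use std::path::{Path, PathBuf};

// Deny wins over allow; an empty allow list means "everything not denied".
fn path_allowed(path: &Path, allow: &[PathBuf], deny: &[PathBuf]) -> bool {
    // Canonicalize when possible (resolves symlinks and ../); fall back
    // to the path as given when it does not exist yet.
    let abs = path.canonicalize().unwrap_or_else(|_| path.to_path_buf());
    if deny.iter().any(|d| abs.starts_with(d)) {
        return false; // deny overrides allow
    }
    if allow.is_empty() {
        return true; // no --allow-dir restriction configured
    }
    allow.iter().any(|a| abs.starts_with(a))
}
```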
Config File
Instead of passing flags every time, put your permission rules in .yoyo.toml (project-level), ~/.yoyo.toml (home directory), or ~/.config/yoyo/config.toml (XDG):
[permissions]
allow = ["git *", "cargo *", "echo *"]
deny = ["rm -rf *", "sudo *"]
[directories]
allow = ["./src", "./tests"]
deny = ["~/.ssh", "/etc"]
Precedence
CLI flags override config file values:
- If you pass any --allow or --deny flag, the entire [permissions] section from the config file is ignored.
- If you pass any --allow-dir or --deny-dir flag, the entire [directories] section from the config file is ignored.
- --yes / -y overrides everything — all tools are auto-approved regardless of permission patterns.
Config file search order (first found wins):
1. .yoyo.toml in the current directory
2. ~/.yoyo.toml in your home directory
3. ~/.config/yoyo/config.toml
Practical Examples
Rust development — approve common tools
yoyo --allow "git *" --allow "cargo *" --allow "cat *" --allow "ls *"
Or in .yoyo.toml:
[permissions]
allow = ["git *", "cargo *", "cat *", "ls *", "echo *"]
deny = ["rm -rf *", "sudo *"]
Sandboxed CI — trust everything
yoyo -y -p "run the test suite and fix any failures"
Paranoid mode — restrict to source files only
yoyo --allow-dir ./src --allow-dir ./tests --deny "rm *" --deny "sudo *"
Read-only exploration
yoyo --deny "*" --allow "cat *" --allow "ls *" --allow "grep *" --allow-dir .
This denies all bash commands except read-only ones, and restricts file access to the current directory.
Built-in Command Safety Analysis
Beyond pattern matching, yoyo has a built-in safety analyzer that detects categories of dangerous commands and provides specific warnings. This runs automatically — you don't need to configure it.
Detected patterns include:
| Category | Examples |
|---|---|
| Filesystem destruction | rm -rf /, rm -rf ~ |
| Force git operations | git push --force, git reset --hard |
| Permission changes | chmod -R 777, chown -R on system dirs |
| File overwrites | > /etc/passwd, > ~/.bashrc |
| System commands | shutdown, reboot, halt |
| Database destruction | DROP TABLE, DROP DATABASE, TRUNCATE TABLE |
| Pipe from internet | curl ... | bash, wget ... | sh |
| Process killing | kill -9 1, killall |
| Disk operations | dd if=, fdisk, parted, mkfs |
When a dangerous pattern is detected, yoyo shows a warning explaining why the command is flagged before asking for confirmation. A handful of truly catastrophic patterns (like rm -rf / or fork bombs) are hard-blocked and can never execute, even with --yes.
Safe commands like ls, cargo test, git status, and grep pass through without triggering any warnings.
Summary
| Mechanism | Scope | Effect |
|---|---|---|
| Default prompts | All modifying tools | Ask [y/N] before each call |
| `--yes` / `-y` | Everything | Auto-approve all tools |
| `--allow <pattern>` | Bash commands | Auto-approve matching commands |
| `--deny <pattern>` | Bash commands | Auto-reject matching commands |
| `--allow-dir <dir>` | File tools | Only allow paths under these dirs |
| `--deny-dir <dir>` | File tools | Block paths under these dirs |
| `[permissions]` in config | Bash commands | Same as `--allow`/`--deny` |
| `[directories]` in config | File tools | Same as `--allow-dir`/`--deny-dir` |
Tip: Use `/permissions` during a session to see the full security posture — auto-approve status, command patterns, and directory restrictions all in one view.
Session Persistence
yoyo can save and load conversations, letting you resume where you left off.
Auto-save on exit
yoyo automatically saves your conversation to .yoyo/last-session.json every time you exit the REPL — whether via /quit, /exit, Ctrl-D, or even unexpected termination. No flags needed.
If a previous session is detected on startup, yoyo prints a hint:
💡 Previous session found. Use --continue or /load .yoyo/last-session.json to resume.
Resuming with --continue
The --continue (or -c) flag restores the last auto-saved session:
yoyo --continue
yoyo -c
When --continue is used:
- On startup, yoyo loads from `.yoyo/last-session.json` (preferred) or `yoyo-session.json` (legacy fallback)
- On exit, the conversation is auto-saved as usual
$ yoyo -c
resumed session: 8 messages from .yoyo/last-session.json
main > what were we working on?
Manual save/load
Save the current conversation:
/save
This writes to yoyo-session.json in the current directory.
Save to a custom path:
/save my-session.json
Load a conversation:
/load
/load my-session.json
/load .yoyo/last-session.json
Session format
Sessions are stored as JSON files containing the conversation message history. The format is determined by the yoagent library.
Error handling
- If no previous session exists when using `--continue`, yoyo prints a message and starts fresh
- If a session file is corrupt or can't be parsed, yoyo warns you and starts fresh
- Empty conversations (no messages exchanged) are not auto-saved
- Save errors are reported but don't crash yoyo
Context Management
Claude models have a finite context window (200,000 tokens). As your conversation grows, it fills up. yoyo helps you manage this.
Checking context usage
Use /tokens to see how full your context window is:
/tokens
Output:
Active context:
messages: 24
current: 85.2k / 200.0k tokens
████████░░░░░░░░░░░░ 43%
Session totals (all API calls):
input: 120.5k tokens
output: 45.2k tokens
cache read: 30.0k tokens
cache write: 15.0k tokens
est. cost: $0.892
When the context window exceeds 75%, you'll see a warning:
⚠ Context is getting full. Consider /clear or /compact.
Manual compaction
Use /compact to compress the conversation:
/compact
This summarizes older messages while preserving recent context. You'll see:
compacted: 24 → 8 messages, ~85.2k → ~32.1k tokens
Auto-compaction
When the context window exceeds 80% capacity, yoyo automatically compacts the conversation. You'll see:
⚡ auto-compacted: 30 → 10 messages, ~165.0k → ~62.0k tokens
This happens transparently after each prompt response. You don't need to do anything — yoyo handles it.
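The two thresholds reduce to simple percentage arithmetic. A sketch of the idea (the 75%/80% figures mirror the text above; the helper function is my own illustration, not yoyo's code):

```shell
# Illustrative threshold check: warn at 75%, auto-compact at 80%.
context_pct() {  # context_pct USED_TOKENS WINDOW_TOKENS
  echo $(( $1 * 100 / $2 ))
}

pct=$(context_pct 165000 200000)   # ~165k used of a 200k window
if [ "$pct" -ge 80 ]; then
  echo "auto-compacting at ${pct}%"
elif [ "$pct" -ge 75 ]; then
  echo "warning at ${pct}%"
fi
```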
Clearing the conversation
If you want to start completely fresh:
/clear
This removes all messages and resets the conversation. Unlike /compact, nothing is preserved.
Tips
- For long sessions, use `/tokens` periodically to monitor usage
- If you notice the agent losing track of earlier context, try `/compact`
- Starting a new task? Use `/clear` to avoid confusing the agent with unrelated history
Checkpoint-restart strategy
For automated pipelines (like CI scripts), compaction can be lossy. The --context-strategy checkpoint flag provides an alternative: when context usage exceeds 70%, yoyo stops the agent loop and exits with code 2.
yoyo --context-strategy checkpoint -p "do some long task"
# Exit code 2 means "context was getting full — restart me"
The calling script can then restart yoyo with fresh context. This is useful for multi-phase pipelines where a structured restart produces better results than lossy compaction.
The default strategy is compaction, which uses auto-compaction as described above.
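A CI wrapper for this pattern might look like the following. The `run_until_done` helper is my own name for the loop; the yoyo invocation in the comment is the one shown above:

```shell
# Re-run a command until it exits with something other than 2 ("checkpoint").
run_until_done() {
  while true; do
    sh -c "$1"
    status=$?
    [ "$status" -ne 2 ] && return "$status"
    echo "checkpoint reached — restarting with fresh context"
  done
}

# In a real pipeline:
# run_until_done "yoyo --context-strategy checkpoint -p 'do some long task'"
```

Each restart begins with an empty context, so the prompt passed on each iteration should contain enough state (or point at files that do) for the agent to pick up where it left off.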
Git Integration
yoyo is git-aware. It shows your current branch and provides commands for common git operations.
Branch display
When you're in a git repository, the REPL prompt shows the current branch:
main > _
feature/new-parser > _
On startup, the branch is also shown in the status information:
git: main
Git commands
/diff
Show a summary of uncommitted changes (equivalent to git diff --stat):
/diff
Output:
src/main.rs | 15 +++++++++------
README.md | 3 +++
2 files changed, 12 insertions(+), 6 deletions(-)
If there are no uncommitted changes:
(no uncommitted changes)
/git diff
Show the actual diff content (line-by-line changes), not just a summary:
/git diff
Shows unstaged changes. To see staged changes instead:
/git diff --cached
/git branch
List all branches, with the current branch highlighted in green:
/git branch
Create and switch to a new branch:
/git branch feature/my-new-feature
/blame
Show who last modified each line of a file, with colorized output:
/blame src/main.rs
Limit to a specific line range:
/blame src/main.rs:10-20
Output is colorized: commit hashes (dim), author names (cyan), dates (dim), line numbers (yellow).
/undo
Revert all uncommitted changes. This is equivalent to git checkout -- .:
/undo
Before reverting, /undo shows you what will be undone:
src/main.rs | 15 +++++++++------
1 file changed, 9 insertions(+), 6 deletions(-)
✓ reverted all uncommitted changes
If there's nothing to undo:
(nothing to undo — no uncommitted changes)
Using git through the agent
yoyo's bash tool can run any git command. You can ask the agent directly:
> commit these changes with message "fix: handle empty input"
> show me the last 5 commits
> create a new branch called feature/parser
The agent has full access to git through its shell tool.
Cost Tracking
yoyo estimates the cost of each interaction so you can monitor spending.
Per-turn costs
After each response, you'll see a compact token summary:
↳ 3.2s · 1523→842 tokens · $0.0234
With --verbose (or -v), you get the full breakdown:
tokens: 1523 in / 842 out [cache: 1000 read, 500 write] (session: 4200 in / 2100 out) cost: $0.0234 total: $0.0567 ⏱ 3.2s
- cost — estimated cost for this turn
- total — estimated cumulative cost for the session
Quick cost check
Use /cost for a quick overview with a breakdown by cost category:
Session cost: $0.0567
4.2k in / 2.1k out
cache: 1.0k read / 500 write
Breakdown:
input: $0.0126
output: $0.0315
cache write: $0.0031
cache read: $0.0005
Detailed breakdown
Use /tokens to see a full breakdown including cache usage:
Session totals:
input: 120.5k tokens
output: 45.2k tokens
cache read: 30.0k tokens
cache write: 15.0k tokens
est. cost: $0.892
Supported models
Costs are estimated based on published pricing for all major providers:
Anthropic
| Model | Input | Cache Write | Cache Read | Output |
|---|---|---|---|---|
| Opus 4.5/4.6 | $5/MTok | $6.25/MTok | $0.50/MTok | $25/MTok |
| Opus 4/4.1 | $15/MTok | $18.75/MTok | $1.50/MTok | $75/MTok |
| Sonnet | $3/MTok | $3.75/MTok | $0.30/MTok | $15/MTok |
| Haiku 4.5 | $1/MTok | $1.25/MTok | $0.10/MTok | $5/MTok |
| Haiku 3.5 | $0.80/MTok | $1/MTok | $0.08/MTok | $4/MTok |
OpenAI
| Model | Input | Output |
|---|---|---|
| GPT-4.1 | $2/MTok | $8/MTok |
| GPT-4.1 Mini | $0.40/MTok | $1.60/MTok |
| GPT-4.1 Nano | $0.10/MTok | $0.40/MTok |
| GPT-4o | $2.50/MTok | $10/MTok |
| GPT-4o Mini | $0.15/MTok | $0.60/MTok |
| o3 | $2/MTok | $8/MTok |
| o3-mini | $1.10/MTok | $4.40/MTok |
| o4-mini | $1.10/MTok | $4.40/MTok |
Google
| Model | Input | Output |
|---|---|---|
| Gemini 2.5 Pro | $1.25/MTok | $10/MTok |
| Gemini 2.5 Flash | $0.15/MTok | $0.60/MTok |
| Gemini 2.0 Flash | $0.10/MTok | $0.40/MTok |
DeepSeek
| Model | Input | Output |
|---|---|---|
| DeepSeek Chat/V3 | $0.27/MTok | $1.10/MTok |
| DeepSeek Reasoner/R1 | $0.55/MTok | $2.19/MTok |
Mistral
| Model | Input | Output |
|---|---|---|
| Mistral Large | $2/MTok | $6/MTok |
| Mistral Small | $0.10/MTok | $0.30/MTok |
| Codestral | $0.30/MTok | $0.90/MTok |
xAI (Grok)
| Model | Input | Output |
|---|---|---|
| Grok 3 | $3/MTok | $15/MTok |
| Grok 3 Mini | $0.30/MTok | $0.50/MTok |
| Grok 2 | $2/MTok | $10/MTok |
Groq (hosted models)
| Model | Input | Output |
|---|---|---|
| Llama 3.3 70B | $0.59/MTok | $0.79/MTok |
| Llama 3.1 8B | $0.05/MTok | $0.08/MTok |
| Mixtral 8x7B | $0.24/MTok | $0.24/MTok |
| Gemma2 9B | $0.20/MTok | $0.20/MTok |
MTok = million tokens.
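As a sanity check on how these rates turn into per-turn figures, here is the arithmetic in awk. The `turn_cost` helper is my own; the example plugs in the Sonnet row ($3/MTok in, $15/MTok out):

```shell
# turn_cost IN_TOKENS OUT_TOKENS IN_RATE OUT_RATE   (rates in $/MTok)
turn_cost() {
  awk -v i="$1" -v o="$2" -v ri="$3" -v ro="$4" \
      'BEGIN { printf "%.4f\n", (i * ri + o * ro) / 1000000 }'
}

turn_cost 4200 2100 3 15   # 4.2k in + 2.1k out at Sonnet rates
```

Cache read/write costs are added the same way where the provider bills them, each with its own per-MTok rate.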
OpenRouter
Models accessed through OpenRouter (e.g., anthropic/claude-sonnet-4-20250514) are automatically recognized — the provider prefix is stripped before matching.
Limitations
- Cost estimates are approximate — actual billing may differ slightly
- For unrecognized models, no cost estimate is shown
- Cache read/write costs only apply to Anthropic models; other providers show zero cache costs
- Pricing may change — check your provider's pricing page for the latest rates
Keeping costs down
- Use smaller models (Haiku, Sonnet, GPT-4.1 Mini, Gemini Flash) for simple tasks
- Use `/compact` to reduce context size (fewer input tokens per turn)
- Use single-prompt mode (`-p`) for quick questions to avoid accumulating context
- Turn off extended thinking for routine tasks
Architecture
This page explains the reasoning behind yoyo's internal design — why the codebase is shaped the way it is, what trade-offs were made, and what invariants contributors should understand before changing things. For a machine-generated dependency graph, see DeepWiki.
Why 13 modules instead of 3?
yoyo started as a single 200-line file. By Day 10 it was a single 3,400-line main.rs. That file was split over Days 10–15 into the current structure, not because someone sat down and designed thirteen modules, but because the code kept telling us where the seams were.
The split follows a simple heuristic: if two chunks of code change for different reasons, they belong in different files. Adding a new /git subcommand shouldn't force you to scroll past the markdown renderer. Fixing a cost-calculation bug shouldn't put you in the same file as the CLI argument parser.
The current modules, from smallest to largest:
| Module | Lines | Role |
|---|---|---|
| `memory.rs` | ~375 | Project-specific `.yoyo/memory.json` persistence |
| `docs.rs` | ~550 | Fetching and parsing docs.rs HTML |
| `help.rs` | ~840 | Per-command help text and `/help` handler |
| `git.rs` | ~1,080 | Low-level git operations (branch, commit, diff) |
| `commands_git.rs` | ~1,130 | `/commit`, `/diff`, `/undo`, `/pr`, `/review` handlers |
| `repl.rs` | ~1,270 | Readline loop, tab completion, multi-line input |
| `commands_session.rs` | ~1,340 | `/save`, `/load`, `/export`, `/spawn`, `/mark`, `/jump` |
| `main.rs` | ~1,560 | Entry point, agent construction, tool wiring |
| `prompt.rs` | ~1,870 | Agent execution, streaming event loop, retry logic |
| `cli.rs` | ~2,520 | Argument parsing, config files, provider selection |
| `commands.rs` | ~2,910 | Core command dispatch, re-exports sub-modules |
| `commands_project.rs` | ~3,660 | `/add`, `/fix`, `/test`, `/lint`, `/tree`, `/find`, `/web`, `/plan` |
| `format.rs` | ~4,700 | Colors, markdown rendering, cost calc, spinner, diffs |
Thirteen modules is a lot for ~24k lines. The alternative — three or four large files — would be easier to navigate in a directory listing but harder to work in. When a module is under 1,500 lines, you can hold its entire API in your head. When it's 4,700 lines (like format.rs), you start wanting to split it further — and that's a fair instinct, discussed below.
The layered design and why it matters
The modules form five rough layers, and the key invariant is: dependencies only point downward.
┌─────────────────────────────────────────────────┐
│ Entry main.rs │
├─────────────────────────────────────────────────┤
│ REPL repl.rs │
├─────────────────────────────────────────────────┤
│ Commands commands.rs │
│ commands_git.rs │
│ commands_project.rs │
│ commands_session.rs │
│ help.rs │
├─────────────────────────────────────────────────┤
│ Engine prompt.rs format.rs │
├─────────────────────────────────────────────────┤
│ Utilities git.rs memory.rs docs.rs │
└─────────────────────────────────────────────────┘
Entry layer. main.rs parses CLI args (via cli.rs), builds the agent, wires up tools with permission checks, and hands control to either repl.rs (interactive) or prompt.rs (single-prompt / piped mode). It owns the AgentConfig struct and the build_agent() / configure_agent() functions. It also defines StreamingBashTool, a custom replacement for yoagent's default BashTool that reads subprocess stdout/stderr line-by-line via tokio::io::AsyncBufReadExt and emits periodic ToolExecutionUpdate events through the on_update callback. This means when a user runs cargo build or npm install, partial output appears in real-time instead of after the command finishes. The reasoning: agent construction is complex (provider selection, tool wiring, MCP/OpenAPI setup, permission configuration) and shouldn't be tangled with either the REPL loop or command handlers.
REPL layer. repl.rs owns the readline loop, tab completion, multi-line input detection, and the big match block that dispatches / commands. It depends on nearly everything below it because it's the traffic cop — but nothing depends on it. This is intentional: piped mode and single-prompt mode bypass the REPL entirely and go straight to prompt.rs.
Command layer. commands.rs is the hub — it re-exports handlers from three sub-modules (commands_git.rs, commands_project.rs, commands_session.rs) and help.rs. The sub-module split follows domain, not size: git-workflow commands in one file, project-workflow commands in another, session-management commands in a third. This means adding a new /git stash pop subcommand only touches commands_git.rs, even though commands_project.rs is three times larger. The split is by reason-to-change, not by line count.
Engine layer. prompt.rs and format.rs are the two largest modules by complexity. prompt.rs runs the agent, processes the streaming event channel, handles retries on transient errors, and manages context overflow (auto-compaction). format.rs handles everything the user sees: ANSI colors, the incremental MarkdownRenderer, cost calculations for seven providers, the terminal spinner, diff formatting, and dozens of small display utilities. These two modules sit at the same layer because they collaborate tightly — prompt.rs feeds events to format.rs's renderer — but neither depends on commands or the REPL.
Utility layer. git.rs, memory.rs, and docs.rs are leaf modules with no upward dependencies. They wrap external systems (git CLI, filesystem JSON, docs.rs HTTP) behind clean Rust APIs. Any module above can call into them, but they never call up. This makes them easy to test in isolation — and they are: git.rs has 41 tests, memory.rs has 14, docs.rs has 23.
The layering isn't enforced by the compiler — Rust's module system doesn't prevent circular use crate:: imports at the module level. It's enforced by convention and by the fact that violations immediately feel wrong: if git.rs needed to call a command handler, that would be a sign the abstraction is leaking.
Why format.rs is the largest file
At ~4,700 lines with 256 tests, format.rs is twice the size of any other module. This isn't accidental — it's the consequence of a design choice: all terminal presentation logic lives in one place.
The module contains:
- Color system — the `Color` wrapper that respects `NO_COLOR`, all ANSI color constants
- MarkdownRenderer — incremental streaming renderer that turns text deltas into ANSI-colored output with syntax highlighting, handling code blocks, headers, bold/italic, lists, and inline code as tokens arrive
- Cost calculations — pricing tables for seven providers, input/output/cache cost breakdowns
- Spinner — background activity indicator for API roundtrips
- Display utilities — `pluralize`, `truncate`, `context_bar`, `format_duration`, `format_token_count`, `format_edit_diff`, `format_tool_summary`, and more
The alternative would be splitting into color.rs, renderer.rs, cost.rs, etc. That's probably the right move eventually. But today, having all presentation in one file has a benefit: when you change how something looks, you only need to look in one place. The MarkdownRenderer uses the color system, cost formatting uses the color system, the spinner uses the color system — they're coupled by the shared presentation layer, and co-location makes that coupling visible rather than hiding it across five small files.
The 256 tests are the reason this works at ~4,700 lines. Every public function has test coverage. The MarkdownRenderer alone has tests for every markdown construct it handles. If those tests didn't exist, the file would be unmaintainable at this size.
Why cli.rs is so large
cli.rs (~2,520 lines) handles three jobs that sound simple but aren't:
- Argument parsing — yoyo doesn't use `clap` or `structopt`. Arguments are parsed by hand from `std::env::args`. This was a deliberate choice: the CLI has unusual needs (multi-value `--mcp` flags, `--provider` with fallback chains, config file merging) that are easier to handle with custom parsing than with a framework's escape hatches. The trade-off is more code in `cli.rs`, but zero macro magic and full control over error messages.
- Config file merging — `.yoyo.toml` and `YOYO.md` settings merge with CLI flags and environment variables, with a clear precedence chain. This merging logic accounts for hundreds of lines.
- Provider configuration — selecting the right API key, endpoint, and default model for each of eight providers, including fallback behavior when keys aren't set.
The 92 tests in cli.rs verify the parsing of every flag and every merge scenario. Adding a new CLI flag means adding it in one place and adding a test.
The command dispatch pattern
Every /command follows the same pattern:
- User types `/foo bar baz` in the REPL
- `repl.rs` matches on `"/foo"` and calls `commands::handle_foo(args, agent, ...)`
- The handler does its work, possibly calling into utility modules
- If it needs the LLM, it calls `prompt::run_prompt()` with a constructed input
This pattern is enforced by convention, not by a trait. Early versions tried a Command trait with execute(), but it added ceremony without value — every command has different arguments, different return types, and different needs (some need the agent, some don't, some are async, some aren't). A simple function per command turned out to be the right abstraction level.
The commands.rs hub re-exports all handlers so the REPL only needs use crate::commands::*. The sub-modules (commands_git, commands_project, commands_session) group by domain. When you run /commit, the REPL calls handle_commit(), which is defined in commands_git.rs and re-exported through commands.rs.
Why prompt.rs handles retries internally
prompt.rs encapsulates the entire agent interaction lifecycle: sending the prompt, receiving streaming events, rendering output, and handling errors. Retry logic lives here — not in the REPL or in main.rs — because retries need access to the event stream state.
Three kinds of retries happen:
- Tool failures — if a tool execution fails, the error is sent back to the LLM as context and it retries (up to 2 times). This happens inside the agent's own loop.
- Transient API errors (429, 5xx) — retried with exponential backoff. The REPL doesn't need to know this happened.
- Context overflow — when the conversation exceeds the context window, `prompt.rs` triggers auto-compaction (asking the LLM to summarize the conversation so far) and retries with the compressed context.
Keeping this in prompt.rs means the REPL's contract is simple: call run_prompt(), get back a PromptOutcome with the response text, token usage, and any unrecoverable errors. The REPL never has to think about retries, backoff, or context management.
The streaming renderer design
yoyo streams LLM output token-by-token. The MarkdownRenderer in format.rs is an incremental state machine that receives text deltas (often partial words or half a markdown construct) and emits ANSI-colored output. This is architecturally significant because:
- It can't buffer entire lines. If it did, the output would appear in chunks instead of flowing. An early version had this bug — it was technically correct but felt broken. (Day 17 fix.)
- It must track state across deltas. When one delta contains `` ``` `` and the next delta contains `rs`, the renderer must know it's inside a code block header. The state machine tracks: are we in a code block? What language? Are we in bold? Italic? A header? A list item?
- It must handle malformed markdown gracefully. LLMs sometimes emit unclosed code blocks, nested formatting that doesn't resolve, or markdown-like syntax that isn't actually markdown. The renderer must produce reasonable output regardless.
The alternative — buffering the entire response and rendering it at the end — would be simpler but would make the tool feel unresponsive. Streaming is a UX requirement that imposes real architectural cost.
Invariants contributors should know
No upward dependencies from utilities. git.rs, memory.rs, and docs.rs must never use crate::commands or use crate::repl. If you find yourself wanting to, the abstraction boundary is wrong.
format.rs is the only module that writes ANSI escape codes. Other modules call format::Color, format::DIM, etc. — they don't hardcode escape sequences. This is enforced by convention and makes NO_COLOR support work globally.
Every command handler is a standalone function. No command state persists between invocations (except through the Agent's conversation history and SessionChanges). This makes commands testable in isolation.
Tests live next to the code they test. Each module has a #[cfg(test)] mod tests block at the bottom. The project has ~1,000 tests total. Integration tests live in tests/integration.rs and test the CLI binary as a black box.
The agent is the only LLM dependency. yoyo delegates all LLM interaction to the yoagent crate. prompt.rs receives AgentEvents through a channel — it never constructs HTTP requests or parses API responses directly. This means swapping the LLM backend (or the entire agent framework) would only require changes to main.rs (construction) and prompt.rs (event handling).
Trade-offs and known debt
format.rs should probably be split. The MarkdownRenderer, cost tables, and color utilities are three distinct concerns sharing a file. The blocker isn't technical — it's that all three are coupled through the color system, and splitting would require deciding where Color lives.
Hand-rolled CLI parsing is a maintenance burden. Every new flag requires manual parsing code, help text updates, and config file support. A framework like clap would reduce this at the cost of a dependency and less control over error messages. The current approach works because flags don't change often.
commands.rs as a hub creates a wide dependency surface. Because it re-exports everything, changing any command sub-module can trigger recompilation of anything that imports commands::*. In a larger project this would matter for build times. At ~24k lines, it doesn't yet.
No trait abstraction for commands. This is fine at the current scale but means there's no compile-time guarantee that all commands follow the same pattern. A new contributor might put command logic directly in repl.rs instead of in a handler function. Code review catches this, not the type system.
Grow Your Own Agent
Fork yoyo-evolve, edit two files, and run your own self-evolving coding agent on GitHub Actions.
What You Get
A coding agent that:
- Runs on GitHub Actions every ~8 hours
- Reads its own source code, picks improvements, implements them
- Writes a journal of its evolution
- Responds to community issues in its own voice
- Gets smarter over time through a persistent memory system
Quick Start
1. Fork the repo
Fork yologdev/yoyo-evolve on GitHub.
2. Edit your agent's identity
IDENTITY.md — your agent's constitution: name, mission, goals, and rules.
PERSONALITY.md — your agent's voice: how it writes, speaks, and expresses itself.
These are the only files you need to edit. Everything else auto-detects.
3. Choose your provider
yoyo supports 13+ providers out of the box. Pick the one that fits your budget and preferences:
| Provider | Env Var | Default Model | Notes |
|---|---|---|---|
| `anthropic` | `ANTHROPIC_API_KEY` | claude-opus-4-6 | Default. Best overall quality. |
| `openai` | `OPENAI_API_KEY` | gpt-4o | GPT-4o and o-series models |
| `google` | `GOOGLE_API_KEY` | gemini-2.0-flash | Gemini models |
| `openrouter` | `OPENROUTER_API_KEY` | anthropic/claude-sonnet-4-20250514 | Multi-provider gateway |
| `deepseek` | `DEEPSEEK_API_KEY` | deepseek-chat | Very cost-effective |
| `groq` | `GROQ_API_KEY` | llama-3.3-70b-versatile | Fast inference |
| `mistral` | `MISTRAL_API_KEY` | mistral-large-latest | Mistral and Codestral models |
| `xai` | `XAI_API_KEY` | grok-3 | Grok models |
| `ollama` | (none — local) | llama3.2 | Free, runs on your hardware |
For the full list of providers and models, see Models & Providers.
Tip: Anthropic is the default and what yoyo itself uses to evolve. If you're unsure, start there. If cost is a concern, DeepSeek and Groq offer strong results at a fraction of the price. Ollama is free but requires local hardware.
4. Create a GitHub App
Your agent needs a GitHub App to commit code and interact with issues.
- Go to Settings > Developer settings > GitHub Apps > New GitHub App
- Give it your agent's name
- Set permissions:
- Repository > Contents: Read and write
- Repository > Issues: Read and write
- Repository > Discussions: Read and write (optional, for social features)
- Install it on your forked repo
- Note the App ID, Private Key (generate one), and Installation ID
  - Installation ID: visit https://github.com/settings/installations and click your app — the ID is in the URL
5. Set repo secrets
In your fork, go to Settings > Secrets and variables > Actions and add:
| Secret | Description |
|---|---|
| Provider API key | API key for your chosen provider (see table in step 3) |
| `APP_ID` | GitHub App ID |
| `APP_PRIVATE_KEY` | GitHub App private key (PEM) |
| `APP_INSTALLATION_ID` | GitHub App installation ID |
Set the API key secret matching your chosen provider. For example, if using Anthropic, add ANTHROPIC_API_KEY. If using OpenAI, add OPENAI_API_KEY. If using DeepSeek, add DEEPSEEK_API_KEY, and so on.
6. Enable the Evolution workflow
Go to Actions in your fork and enable the Evolution workflow. Your agent will start evolving on its next scheduled run, or trigger it manually with Run workflow.
What Each File Does
| File | Purpose |
|---|---|
| `IDENTITY.md` | Agent's constitution — name, mission, goals, rules |
| `PERSONALITY.md` | Agent's voice — writing style, personality traits |
| `ECONOMICS.md` | What money/sponsorship means to the agent |
| `journals/JOURNAL.md` | Chronological log of evolution sessions (auto-maintained) |
| `DAY_COUNT` | Tracks the agent's current evolution day |
| `memory/` | Persistent learning system (auto-maintained) |
| `SPONSORS.md` | Sponsor recognition (auto-maintained) |
Costs
Costs vary by provider and model:
- Anthropic Claude Opus — $3-8 per session ($10-25/day at 3 sessions/day)
- Anthropic Claude Sonnet — ~$1-3 per session, good balance of quality and cost
- DeepSeek — significantly cheaper, strong coding performance
- Groq — fast and affordable for smaller models
- Ollama — free (runs locally), but requires capable hardware
The default schedule runs ~3 sessions per day (8-hour gap between runs). To reduce costs, switch to a cheaper provider/model or reduce session frequency.
Customization
Change the provider and model
Set PROVIDER and MODEL environment variables in .github/workflows/evolve.yml:
env:
PROVIDER: openai
MODEL: gpt-4o
Or set just MODEL to use a different model within the default provider (Anthropic):
env:
MODEL: claude-sonnet-4-6
You can also edit the default directly in scripts/evolve.sh.
Change session frequency
Edit the cron schedule in `.github/workflows/evolve.yml`. The default `0 * * * *` (every hour) is gated by an 8-hour gap in the script, so the agent runs ~3 times/day.
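The hourly-cron-plus-gap gating amounts to an elapsed-time check. A sketch of the idea (the function name and numbers are my own illustration; the real gate lives in scripts/evolve.sh):

```shell
# should_run NOW_EPOCH LAST_EPOCH GAP_HOURS — succeed if the gap has elapsed.
should_run() {
  [ $(( $1 - $2 )) -ge $(( $3 * 3600 )) ]
}

now=$(date +%s)
should_run "$now" 0 8                  && echo "no previous run — go"
should_run "$now" $(( now - 3600 )) 8  || echo "only 1h since last run — skip"
```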
Add custom skills
Create markdown files with YAML frontmatter in the skills/ directory. The agent loads them automatically via --skills ./skills.
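A skill file might look something like this. The exact frontmatter fields yoyo expects aren't documented here, so treat the keys below as placeholders:

```markdown
---
name: commit-style
description: House rules for commit messages
---

Write commits as `type: summary` (fix, feat, docs, refactor),
imperative mood, subject line under 72 characters.
```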
Sponsor system
The sponsor system auto-detects your GitHub Sponsors. No configuration needed — just set up GitHub Sponsors on your account.
The /update Command
The yoyo binary's /update command checks for releases from yologdev/yoyo-evolve, not your fork. This is expected behavior. As a fork maintainer, rebuild from source after pulling changes:
cargo build --release
In the future, an evolve portal will provide guided setup including custom update targets.
Optional: Dashboard Notifications
If you have a dashboard repo that accepts repository dispatch events, set a repo variable:
gh variable set DASHBOARD_REPO --body "your-user/your-dashboard" --repo your-user/your-fork
And add the DASHBOARD_TOKEN secret with a token that can dispatch to that repo.
Mutation Testing
yoyo uses cargo-mutants to assess test quality. Mutation testing works by making small changes (mutants) to the source code — flipping conditions, replacing return values, removing function bodies — and checking whether any test catches each change.
If a mutant survives (no test fails), it means that line of code isn't actually tested.
Baseline
As of Day 9, yoyo has 1004 total mutants across its source files. This number grows as features are added. The mutation testing setup uses a 20% maximum survival rate threshold — if more than 20% of tested mutants survive, the check fails.
| Metric | Value |
|---|---|
| Total mutants | 1004 |
| Threshold | 20% max survival rate |
| Established | Day 9 (2026-03-09) |
Install cargo-mutants
cargo install cargo-mutants
Quick start with the threshold script
The easiest way to run mutation testing is with the threshold script:
# Run with default 20% threshold
./scripts/run_mutants.sh
# Run with a stricter threshold
./scripts/run_mutants.sh --threshold 10
# Just count mutants without running them
./scripts/run_mutants.sh --list
# Test mutants in a specific file only
./scripts/run_mutants.sh --file src/format.rs
The script:
- Runs `cargo mutants` on the project
- Counts caught vs survived mutants
- Calculates the survival rate
- Exits with code 1 if the rate exceeds the threshold
- Prints surviving mutants on failure so you know what to fix
This makes it easy for maintainers to run locally and could be added to CI by the project owner.
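The pass/fail arithmetic the script performs boils down to the following miniature re-implementation (numbers are made up; this is not the script itself):

```shell
# check_threshold SURVIVED TESTED MAX_PCT — exit 0 within threshold, 1 otherwise.
check_threshold() {
  rate=$(( $1 * 100 / $2 ))
  if [ "$rate" -gt "$3" ]; then
    echo "FAIL: ${rate}% survival exceeds ${3}%"
    return 1
  fi
  echo "PASS: ${rate}% survival within ${3}%"
}

check_threshold 150 1000 20   # 15% survival against the 20% threshold
```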
Run mutation testing directly
From the project root:
# Run all mutants (this takes a while — several minutes)
cargo mutants
# Show only the surviving mutants (uncaught mutations)
cargo mutants -- --survived
# Run mutants for a specific file
cargo mutants -f src/format.rs
# Run mutants for a specific function
cargo mutants -F "format_cost"
Read the results
After a run, cargo-mutants creates a mutants.out/ directory with detailed results:
# Summary
cat mutants.out/caught.txt # mutants killed by tests ✓
cat mutants.out/survived.txt # mutants NOT caught — test gaps!
cat mutants.out/timeout.txt # mutants that caused infinite loops
cat mutants.out/unviable.txt # mutants that didn't compile
Focus on survived.txt — each line is a mutation that no test catches. These are the weak spots.
Configuration
The mutants.toml file in the project root excludes known-acceptable mutants:
- Cosmetic functions — ANSI color codes, banner printing, help text
- Interactive I/O — functions that read stdin or require a terminal
- Async API calls — prompt execution that needs a live Anthropic API
These exclusions keep mutation testing focused on logic that should be tested. If you add a new feature with testable logic, make sure it's not excluded.
Writing targeted tests
When you find a surviving mutant:
- Read what the mutation does (e.g., "replace `<` with `<=` in format_cost")
- Write a test that specifically catches that boundary condition
- Re-run `cargo mutants -F "function_name"` to verify the mutant is now caught
Example workflow:
# Find surviving mutants
cargo mutants 2>&1 | grep "SURVIVED"
# Write a test to kill the mutant, then verify
cargo mutants -F "format_cost"
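To make the boundary-condition idea concrete, here is a hedged sketch: a hypothetical `format_cost` (not yoyo's actual implementation) with a `<` comparison, and assertions that pin down the boundary value. A `<` → `<=` mutant changes behavior only at exactly 100, so a test that checks 100 is the one that kills it.

```rust
// Hypothetical sketch of a function with a `<` boundary.
fn format_cost(cents: u64) -> String {
    // A `<` → `<=` mutant here only changes the output for cents == 100.
    if cents < 100 {
        format!("{cents}¢")
    } else {
        format!("${}.{:02}", cents / 100, cents % 100)
    }
}

fn main() {
    // 100 is the only input that distinguishes `<` from `<=` —
    // a test suite without it lets the mutant survive.
    assert_eq!(format_cost(99), "99¢");
    assert_eq!(format_cost(100), "$1.00");
    assert_eq!(format_cost(250), "$2.50");
    println!("boundary tests passed");
}
```

The general pattern: find the exact input where the mutated operator diverges from the original, and assert on that input.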
Threshold script for CI
The scripts/run_mutants.sh script is designed to be CI-friendly:
# In a CI pipeline or pre-merge check:
./scripts/run_mutants.sh --threshold 20
# Exit codes:
# 0 = survival rate within threshold (PASS)
# 1 = survival rate exceeds threshold (FAIL)
The project owner can add this to CI workflows when ready. For now, contributors should run it locally before submitting PRs that add new logic.
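The pass/fail logic the script implements can be sketched in a few lines. This is an assumed reconstruction of the described behavior (survival rate over the threshold → exit code 1), not the actual `run_mutants.sh` source.

```rust
// Assumed logic of the threshold check; the real script may differ.
fn survival_rate_percent(caught: u32, survived: u32) -> f64 {
    let total = caught + survived;
    if total == 0 {
        return 0.0; // no mutants found — trivially passing
    }
    survived as f64 * 100.0 / total as f64
}

/// Exit code: 0 = rate within threshold (PASS), 1 = rate exceeds it (FAIL).
fn threshold_exit_code(caught: u32, survived: u32, threshold: f64) -> i32 {
    if survival_rate_percent(caught, survived) > threshold { 1 } else { 0 }
}

fn main() {
    // 85 caught, 15 survived → 15% survival rate, within a 20% threshold.
    assert_eq!(threshold_exit_code(85, 15, 20.0), 0);
    // 70 caught, 30 survived → 30% survival rate, exceeds 20%.
    assert_eq!(threshold_exit_code(70, 30, 20.0), 1);
    println!("threshold logic ok");
}
```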
When to run
Mutation testing is slow — it builds and tests your code once per mutant. Run it:
- After adding a new feature, to verify test coverage
- Before a release, as a quality check
- When you suspect the test suite has gaps
- On specific files with `--file` to keep it fast during development
Notes for CI integration
The scripts/run_mutants.sh script and mutants.toml config are ready for a human maintainer to wire into CI. A few things to know:
- Git-dependent tests: Some tests (e.g. `test_git_branch_returns_something_in_repo`, `test_build_project_tree_runs`, `test_get_staged_diff_runs`) gracefully handle running outside a git repo. cargo-mutants copies source to a temp directory without `.git/`, so these tests skip git-specific assertions when not in a repo.
- Exclusions are reasonable: The `mutants.toml` excludes cosmetic/display functions (ANSI colors, banners), interactive I/O (stdin, terminal), and async API calls (needs a live Anthropic key). These can't be meaningfully unit-tested.
- The script cannot be added to `.github/workflows/` by the agent (safety rules), but it exits with code 0/1 and is designed for CI use.
Common Issues
"No API key found"
error: No API key found.
Set ANTHROPIC_API_KEY or API_KEY environment variable.
Fix: Set your Anthropic API key:
export ANTHROPIC_API_KEY=sk-ant-api03-...
yoyo checks ANTHROPIC_API_KEY first, then API_KEY. At least one must be set and non-empty.
"No input on stdin"
No input on stdin.
This happens when you pipe empty input to yoyo:
echo "" | yoyo
Fix: Make sure your piped input contains actual content.
Model errors
error: [API error message]
This appears when the Anthropic API returns an error. Common causes:
- Invalid API key — check your key is correct and active
- Rate limiting — you're sending too many requests; wait and retry
- Model unavailable — the model you specified doesn't exist or you don't have access
Automatic retry: yoyo automatically retries transient errors (rate limits, server errors, network issues) with exponential backoff — up to 3 retries with 1s, 2s, 4s delays. You'll see a dim message like ⚡ retrying (attempt 2/4, waiting 2s)... when this happens. Auth errors (401, 403) and invalid requests (400) are shown immediately without retrying.
Tool error auto-recovery: When a tool execution fails during a natural-language prompt, yoyo automatically retries the prompt with error context appended (up to 2 times). This lets the agent self-correct — for example, retrying a failed file read with a corrected path. You'll see ⚡ auto-retrying after tool error... when this kicks in.
Use /retry to manually re-send the last prompt after a non-transient error is resolved.
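The retry policy described above (transient errors retried with 1s, 2s, 4s backoff; auth and invalid-request errors failed immediately) can be sketched as follows. All names here are illustrative — this is not yoyo's actual retry code.

```rust
use std::time::Duration;

// Illustrative error taxonomy, not yoyo's real types.
#[derive(Debug, PartialEq)]
enum ApiError { RateLimited, Server, Auth, InvalidRequest }

// Rate limits and server errors are worth retrying; 401/403/400 are not.
fn is_transient(e: &ApiError) -> bool {
    matches!(e, ApiError::RateLimited | ApiError::Server)
}

/// Delay before retry attempt `n` (1-based): 1s, 2s, 4s.
fn backoff_delay(attempt: u32) -> Duration {
    Duration::from_secs(1u64 << (attempt - 1))
}

fn retry<F>(mut call: F) -> Result<String, ApiError>
where
    F: FnMut() -> Result<String, ApiError>,
{
    let max_retries = 3; // 1 initial attempt + up to 3 retries
    let mut attempt = 0;
    loop {
        match call() {
            Ok(v) => return Ok(v),
            Err(e) if is_transient(&e) && attempt < max_retries => {
                attempt += 1;
                // In real code you would sleep here:
                // std::thread::sleep(backoff_delay(attempt));
                let _ = backoff_delay(attempt);
            }
            Err(e) => return Err(e), // auth/400 or retries exhausted
        }
    }
}

fn main() {
    // Succeeds on the third attempt after two transient failures.
    let mut n = 0;
    let result = retry(|| {
        n += 1;
        if n < 3 { Err(ApiError::RateLimited) } else { Ok("ok".into()) }
    });
    assert_eq!(result, Ok("ok".to_string()));
    assert_eq!(backoff_delay(3), Duration::from_secs(4));
}
```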
Context window full
⚠ Context is getting full. Consider /clear or /compact.
Your conversation is approaching the 200,000-token context limit.
Fix: Use /compact to compress the conversation, or /clear to start fresh.
yoyo auto-compacts at 80% capacity, but you can compact earlier if you prefer.
Auto-recovery from overflow: If the API returns a context overflow error (e.g., "prompt is too long"), yoyo automatically compacts the conversation and retries the prompt once. You'll see:
⚡ context overflow detected — auto-compacting and retrying...
This handles the case where the context grows past the limit mid-conversation without you noticing. If the retry also fails, yoyo suggests using /compact manually.
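The auto-compaction trigger is a simple threshold check: compact once usage crosses 80% of the 200,000-token window. A minimal sketch of that logic, with illustrative names:

```rust
// Sketch of the auto-compaction trigger; names are illustrative.
const CONTEXT_LIMIT: u64 = 200_000;
const COMPACT_THRESHOLD: f64 = 0.80;

fn should_auto_compact(tokens_used: u64) -> bool {
    tokens_used as f64 >= CONTEXT_LIMIT as f64 * COMPACT_THRESHOLD
}

fn main() {
    assert!(!should_auto_compact(150_000)); // 75% — still fine
    assert!(should_auto_compact(160_000));  // 80% — compact now
    assert!(should_auto_compact(200_000));  // at the limit
    println!("compaction trigger ok");
}
```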
"warning: Failed to load skills"
warning: Failed to load skills: [error]
The --skills directory couldn't be read. yoyo continues without skills.
Fix: Check that the path exists and contains valid skill files.
"unknown command: /foo"
unknown command: /foo
type /help for available commands
You typed a command yoyo doesn't recognize. If it's a typo, yoyo will suggest the closest match:
unknown command: /hlep
did you mean /help?
type /help for available commands
Fix: Check the suggestion, or type /help to see all available commands.
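"Did you mean" suggestions like this are typically built on edit distance. A hedged sketch using classic Levenshtein distance with a small cutoff — yoyo's actual matching may differ:

```rust
// Classic Levenshtein edit distance (two-row dynamic programming).
fn levenshtein(a: &str, b: &str) -> usize {
    let (a, b): (Vec<char>, Vec<char>) = (a.chars().collect(), b.chars().collect());
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, &ca) in a.iter().enumerate() {
        let mut cur = vec![i + 1];
        for (j, &cb) in b.iter().enumerate() {
            let cost = if ca == cb { 0 } else { 1 };
            // min of substitution, deletion, insertion
            cur.push((prev[j] + cost).min(prev[j + 1] + 1).min(cur[j] + 1));
        }
        prev = cur;
    }
    prev[b.len()]
}

/// Suggest the closest command, but only for near misses (distance <= 2).
fn suggest<'a>(input: &str, commands: &[&'a str]) -> Option<&'a str> {
    commands
        .iter()
        .map(|&c| (levenshtein(input, c), c))
        .filter(|&(d, _)| d <= 2)
        .min_by_key(|&(d, _)| d)
        .map(|(_, c)| c)
}

fn main() {
    let commands = ["/help", "/clear", "/compact", "/retry", "/diff"];
    // "/hlep" is two substitutions away from "/help".
    assert_eq!(suggest("/hlep", &commands), Some("/help"));
    // Nothing close enough: no suggestion at all.
    assert_eq!(suggest("/xyzzy99", &commands), None);
    println!("suggestion ok");
}
```

The cutoff matters: without it, every typo would get a suggestion, however absurd.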
"not in a git repository"
error: not in a git repository
You used /diff or /undo outside a git repo.
Fix: Navigate to a directory that's inside a git repository before starting yoyo.
Ctrl+C behavior
- First Ctrl+C — cancels the current response; you can type a new prompt
- Second Ctrl+C (or Ctrl+D) — exits yoyo
If a tool execution is hanging, Ctrl+C will abort it.
Session file errors
error saving: [error]
error reading yoyo-session.json: [error]
error parsing: [error]
Session save/load failed. Common causes:
- Disk full — free space and try again
- Permission denied — check file permissions
- Corrupt file — delete the session file and start fresh
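Mapping those failure causes to user-facing messages is straightforward with `std::io::ErrorKind`. An assumed sketch, not yoyo's actual session code:

```rust
use std::fs;
use std::io::ErrorKind;

// Assumed logic: classify a session-file read failure into the
// causes listed above (missing file, permissions, other I/O).
fn describe_session_error(path: &str) -> String {
    match fs::read_to_string(path) {
        Ok(_) => "session loaded".to_string(),
        Err(e) => match e.kind() {
            ErrorKind::NotFound => {
                format!("no session file at {path}, starting fresh")
            }
            ErrorKind::PermissionDenied => {
                format!("permission denied reading {path}, check file permissions")
            }
            _ => format!("error reading {path}: {e}"),
        },
    }
}

fn main() {
    // A path that should not exist hits the "starting fresh" branch.
    let msg = describe_session_error("definitely-missing-session.json");
    assert!(msg.contains("starting fresh"));
    println!("{msg}");
}
```

A corrupt-but-readable file would surface later, at JSON parse time, which is why the "error parsing" message is distinct from the read errors.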
Safety & Anti-Crash Guarantees
How does a coding agent that edits its own source code avoid breaking itself?
Good question. yoyo has six layers of defense — from the innermost loop (every single code change) to the outermost (protected files that can never be touched). Here's how each one works.
Layer 1: Build-and-test gate on every commit
No code change is ever committed unless it passes:
cargo build && cargo test
This happens inside the evolution session itself. The agent runs the build and test suite after every edit. If either fails, the change doesn't get committed — the agent reads the error and tries to fix it.
Layer 2: CI on every push
Even after the agent commits locally, GitHub Actions runs the full
check suite on every push to main:
cargo build
cargo test
cargo clippy --all-targets -- -D warnings
cargo fmt -- --check
Clippy warnings are treated as errors (-D warnings), so even subtle
issues like unused variables or redundant clones get caught. If CI
fails, the next evolution session sees the failure and prioritizes
fixing it before doing anything else.
Layer 3: Automatic revert on build failure
The evolution script (evolve.sh) has a post-session verification step.
After all tasks run, it re-checks the build. If it fails:
- It gives the agent up to 3 attempts to fix the errors automatically
- If all fix attempts fail, it reverts to the pre-session state:
git checkout "$SESSION_START_SHA" -- src/
This means a broken session can never leave src/ in a worse state
than it started. The revert is surgical — it only touches source files,
preserving journal entries and other non-code changes.
Layer 4: Tests before features
yoyo's evolve skill requires writing a test before adding a feature. This isn't just a guideline — the planning phase explicitly instructs each implementation task to "write a test first if possible."
Why this matters: if you write the test first, you know the test covers the new behavior. If you write the feature first, you might write a test that only confirms what you already built, missing edge cases.
Layer 5: No deleting existing tests
The evolve skill has a hard rule: never delete existing tests. Tests are the agent's immune system. Removing them would let regressions slip through silently. As of this writing, yoyo has 91+ tests, and that number only goes up.
Layer 6: Protected files
Some files are simply off-limits. The agent cannot modify:
| File | Why it's protected |
|---|---|
IDENTITY.md | yoyo's constitution — defines who it is and its core rules |
PERSONALITY.md | yoyo's voice and values |
scripts/evolve.sh | The evolution loop itself — if this broke, recovery would be manual |
scripts/format_issues.py | Input sanitization for GitHub issues |
scripts/build_site.py | Website builder |
.github/workflows/* | CI configuration — the safety net that catches everything else |
These files can only be changed by human maintainers. This prevents a subtle failure mode: the agent "improving" its own safety checks in a way that weakens them.
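Conceptually, the check is a denylist consulted before any edit. The sketch below is illustrative only — the real enforcement in yoyo may work quite differently:

```rust
// Illustrative protected-file check, mirroring the table above.
const PROTECTED: &[&str] = &[
    "IDENTITY.md",
    "PERSONALITY.md",
    "scripts/evolve.sh",
    "scripts/format_issues.py",
    "scripts/build_site.py",
];

fn is_protected(path: &str) -> bool {
    // Exact matches plus the whole CI workflows directory.
    PROTECTED.contains(&path) || path.starts_with(".github/workflows/")
}

fn main() {
    assert!(is_protected("IDENTITY.md"));
    assert!(is_protected(".github/workflows/ci.yml"));
    assert!(!is_protected("src/main.rs"));
    println!("protected-file check ok");
}
```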
What happens in practice
A typical evolution session:
- `evolve.sh` verifies the build passes before starting
- The planning agent reads source code, journal, and issues
- Implementation agents execute tasks, each running build+test after changes
- Post-session verification re-checks everything
- If anything broke, automatic fix attempts kick in
- If fixes fail, revert to pre-session state
- CI runs on push as a final backstop
- Next session checks CI status — failures get top priority
The result: yoyo has been evolving autonomously since Day 0, growing
from ~200 lines to ~3,100+ lines, without ever shipping a broken build
to main.
Can it still break?
Theoretically, yes. Safety is defense-in-depth, not a proof of correctness. Some scenarios the current system doesn't catch:
- Logic bugs that pass tests — if the test suite doesn't cover a behavior, the agent could change it without noticing
- Performance regressions — we rely on official leaderboards (SWE-bench, etc.) rather than custom benchmarks
- Subtle UX regressions — the agent tests functionality, not user experience
These are areas for future improvement. But for the core guarantee — "the agent won't commit code that doesn't compile or pass tests" — the six layers above make that extremely unlikely.