yoyo
yoyo is a coding agent that runs in your terminal. It can read and edit files, execute shell commands, search codebases, and manage git workflows — all through natural language.
yoyo is open-source, written in Rust, and built on yoagent. It started as ~200 lines and evolves itself one commit at a time.
What yoyo can do
- Read and edit files — view file contents, make surgical edits, or write new files
- Run shell commands — execute anything you'd type in a terminal
- Search codebases — grep across files with regex support
- Navigate projects — list directories, understand project structure
- Track context — monitor token usage, auto-compact when the context window fills up
- Persist sessions — save and resume conversations across sessions
- Estimate costs — see per-turn and session-total cost estimates
Quick example
export ANTHROPIC_API_KEY=sk-ant-...
cargo install yoyo-agent # or: cargo run from source
yoyo
Then just talk to it:
> read src/main.rs and find any unwrap() calls that could panic
> fix the bug in parse_config and run the tests
> explain what this codebase does
What makes yoyo different
yoyo is not a product — it's a process. It evolves itself in public. Every improvement is a git commit. Every session is journaled. You can read its source code, its journal, and its identity.
Current version: v0.1.4
Installation
Requirements
- Rust toolchain — install from rustup.rs
- An API key — from any supported provider (see Providers below)
Install from crates.io
cargo install yoyo-agent
This installs the binary as yoyo in your PATH.
Install from source
git clone https://github.com/yologdev/yoyo-evolve.git
cd yoyo-evolve
cargo build --release
The binary will be at target/release/yoyo.
Run directly with Cargo
If you just want to try it:
cd yoyo-evolve
ANTHROPIC_API_KEY=sk-ant-... cargo run
Providers
yoyo supports multiple AI providers out of the box. Use the --provider flag to select one:
| Provider | Flag | Default Model | Env Var |
|---|---|---|---|
| Anthropic (default) | --provider anthropic | claude-opus-4-6 | ANTHROPIC_API_KEY |
| OpenAI | --provider openai | gpt-4o | OPENAI_API_KEY |
| Google/Gemini | --provider google | gemini-2.0-flash | GOOGLE_API_KEY |
| OpenRouter | --provider openrouter | anthropic/claude-sonnet-4-20250514 | OPENROUTER_API_KEY |
| xAI | --provider xai | grok-3 | XAI_API_KEY |
| Groq | --provider groq | llama-3.3-70b-versatile | GROQ_API_KEY |
| DeepSeek | --provider deepseek | deepseek-chat | DEEPSEEK_API_KEY |
| Mistral | --provider mistral | mistral-large-latest | MISTRAL_API_KEY |
| Cerebras | --provider cerebras | llama-3.3-70b | CEREBRAS_API_KEY |
| Ollama | --provider ollama | llama3.2 | (none needed) |
| Custom | --provider custom | (none) | (none needed) |
Ollama and custom providers don't require an API key. yoyo will automatically connect to http://localhost:11434/v1 for Ollama or http://localhost:8080/v1 for custom providers. Override the endpoint with --base-url.
Examples:
# Anthropic (default)
ANTHROPIC_API_KEY=sk-ant-... yoyo
# OpenAI
OPENAI_API_KEY=sk-... yoyo --provider openai
# Google Gemini
GOOGLE_API_KEY=... yoyo --provider google
# Local Ollama (no API key needed)
yoyo --provider ollama --model llama3.2
# Custom OpenAI-compatible endpoint
yoyo --provider custom --base-url http://localhost:8080/v1 --model my-model
Set your API key
yoyo resolves your API key in this order:
1. --api-key CLI flag (highest priority)
2. Provider-specific environment variable (e.g., OPENAI_API_KEY for --provider openai)
3. ANTHROPIC_API_KEY environment variable (fallback)
4. API_KEY environment variable (generic fallback)
5. api_key in config file (see below)
Set one of them:
# Via environment variable (recommended)
export ANTHROPIC_API_KEY=sk-ant-api03-...
# Or pass directly
yoyo --api-key sk-ant-api03-...
If no key is found via any method (and the provider requires one), yoyo will exit with an error message explaining what to do.
Config file
yoyo supports a TOML-style config file so you don't have to pass flags every time. Config files are checked in this order (first found wins):
1. .yoyo.toml in the current directory (project-level)
2. ~/.yoyo.toml (home directory shorthand)
3. ~/.config/yoyo/config.toml (XDG user-level)
Example .yoyo.toml:
# Model and provider
model = "claude-sonnet-4-20250514"
provider = "anthropic"
thinking = "medium"
# API key (env vars take priority over this)
api_key = "sk-ant-api03-..."
# Generation settings
max_tokens = 8192
max_turns = 50
temperature = 0.7
# Custom endpoint (for ollama, proxies, etc.)
# base_url = "http://localhost:11434/v1"
# Permission rules for bash commands
[permissions]
allow = ["git *", "cargo *", "echo *"]
deny = ["rm -rf *", "sudo *"]
# Directory restrictions for file tools
[directories]
allow = ["./src", "./tests"]
deny = ["~/.ssh", "/etc"]
CLI flags always override config file values. For example, --model gpt-4o overrides model = "claude-sonnet-4-20250514" from the config file.
For more details on model configuration, see Models. For thinking levels, see Thinking.
Quick Start
Once installed, start yoyo:
export ANTHROPIC_API_KEY=sk-ant-...
yoyo
Or pass the API key directly:
yoyo --api-key sk-ant-...
First time? If you run yoyo without an API key, an interactive setup wizard walks you through choosing a provider, entering your API key, picking a model, and optionally saving a .yoyo.toml config file. After setup, you go straight into the REPL — no restart needed. You can also run the wizard anytime with yoyo setup. If you prefer to skip it, set your API key environment variable first or press Ctrl+C to cancel.
You'll see a banner like this:
yoyo v0.1.4 — a coding agent growing up in public
Type /help for commands, /quit to exit
model: claude-opus-4-6
git: main
cwd: /home/user/project
Your first prompt
Type a natural language request:
main > explain what this project does
yoyo will read files, run commands, and respond. You'll see tool executions as they happen:
▶ read README.md ✓
▶ ls src/ ✓
▶ read src/main.rs ✓
This project is a...
Common tasks
Read and explain code:
> read src/main.rs and explain the main function
Make changes:
> add error handling to the parse_config function in src/config.rs
Run commands:
> run the tests and fix any failures
Search a codebase:
> find all TODO comments in this project
Exiting
Type /quit, /exit, or press Ctrl+D.
Interactive Mode (REPL)
Interactive mode is the default when you run yoyo in a terminal. It gives you a read-eval-print loop where you can have a multi-turn conversation with the agent.
Starting
yoyo
# or
cargo run
The prompt
The prompt shows your current git branch (if you're in a git repo):
main 🐙 › _
If you're not in a git repo, you get a plain prompt:
🐙 › _
Line editing & history
yoyo uses rustyline for a full readline experience:
- Arrow keys: Navigate within the current line (← →) and through command history (↑ ↓)
- Inline hints: As you type a slash command, a dimmed suggestion appears after the cursor showing the completion and a short description — e.g. typing /he shows lp — Show help for commands. Press Tab or → to accept.
- Tab completion: Type / and press Tab to see available slash commands with descriptions — each command is shown alongside a short summary of what it does. Partial matches work too — /he<Tab> suggests /help and /health. After typing a command + space, argument-aware completions kick in:
  - /model <Tab> — suggests known model names (Claude, GPT, Gemini, etc.)
  - /provider <Tab> — suggests known provider names (anthropic, openai, google, etc.)
  - /think <Tab> — suggests thinking levels (off, minimal, low, medium, high)
  - /git <Tab> — suggests git subcommands (status, log, add, diff, branch, stash)
  - /pr <Tab> — suggests PR subcommands (list, view, diff, comment, create, checkout)
  - /save <Tab> and /load <Tab> — suggest .json session files in the current directory
- File paths also complete — type src/ma<Tab> to get src/main.rs, or Cargo<Tab> to get Cargo.toml. Directories complete with a trailing / for easy continued navigation.
- History recall: Previous inputs are saved across sessions
- Keyboard shortcuts: Ctrl-A (start of line), Ctrl-E (end of line), Ctrl-K (kill to end), Ctrl-W (delete word back)
- History file: Stored at $XDG_DATA_HOME/yoyo/history (defaults to ~/.local/share/yoyo/history)
How it works
- You type a message
- yoyo sends it to the LLM along with conversation history
- The LLM may call tools (read files, run commands, etc.)
- Tool results are streamed back — you see each tool as it executes
- The final text response is printed
- Token usage and cost are shown after each turn
Tool output
When yoyo uses tools, you'll see status indicators:
▶ $ cargo test ✓ (2.1s)
▶ read src/main.rs ✓ (42ms)
▶ edit src/lib.rs ✓ (15ms)
▶ $ cargo test ✗ (1.8s)
- ✓ means the tool succeeded
- ✗ means the tool returned an error
- The duration shows how long the tool took
Token usage
After each response, you'll see a compact token summary:
↳ 3.2s · 1523→842 tokens · $0.0234
This shows:
- Wall-clock time for the response
- Input→output tokens for this turn
- Estimated cost for this turn
Use --verbose (or -v) for the full breakdown including session totals and cache info.
Interrupting
Press Ctrl+C to cancel the current response. The agent will stop and you can type a new prompt. Press Ctrl+C again to exit.
Inline @file mentions
You can reference files directly in your prompts using @path syntax. The file content is automatically read and injected into the conversation — no need for a separate /add command.
> explain @src/main.rs
✓ added src/main.rs (250 lines)
(1 file inlined from @mentions)
> refactor @src/cli.rs:50-100
✓ added src/cli.rs (lines 50-100) (51 lines)
(1 file inlined from @mentions)
> compare @Cargo.toml and @README.md
✓ added Cargo.toml (35 lines)
✓ added README.md (120 lines)
(2 files inlined from @mentions)
How it works:
- @path — injects the entire file
- @path:start-end — injects a specific line range
- If the path doesn't exist, the @mention is left as-is (it might be a username)
- Email-like patterns (user@example.com) are not treated as file mentions
- Images work too: @screenshot.png inlines the image into the conversation
Single-Prompt Mode
Use --prompt or -p to run a single prompt without entering the REPL. yoyo will process the prompt, print the response, and exit.
Usage
yoyo --prompt "explain this codebase"
yoyo -p "find all TODO comments"
When to use it
Single-prompt mode is useful for:
- Scripting — run yoyo as part of a larger workflow
- Quick questions — get an answer without starting a session
- CI/CD pipelines — automate code review or analysis
Example
$ yoyo -p "count the lines of Rust code in this project"
▶ $ find . -name '*.rs' | xargs wc -l ✓ (0.1s)
There are 1,475 lines of Rust code across 1 file (src/main.rs).
Combining with other flags
You can combine -p with other flags:
yoyo -p "review this diff" --model claude-sonnet-4-20250514
yoyo -p "explain the architecture" --thinking high
yoyo -p "analyze the code" --system "You are a security auditor."
Piped Mode
When stdin is not a terminal (i.e., input is piped), yoyo reads all of stdin as a single prompt, processes it, and exits. This works like single-prompt mode but takes input from a pipe instead of a flag.
Usage
echo "explain this code" | yoyo
cat prompt.txt | yoyo
git diff | yoyo
When to use it
Piped mode is useful for:
- Passing file contents as part of the prompt
- Chaining with other commands in a pipeline
- Feeding structured input from scripts
Examples
Review a git diff:
git diff HEAD~1 | yoyo --system "Review this diff for bugs."
Analyze a file:
cat src/main.rs | yoyo --system "Find all potential panics in this Rust code."
Process command output:
cargo test 2>&1 | yoyo --system "Explain these test failures and suggest fixes."
Detection
yoyo detects piped mode automatically by checking if stdin is a terminal. If it is not, piped mode activates. If stdin is a terminal, interactive REPL mode starts instead.
If piped input is empty, yoyo exits with an error: No input on stdin.
Slash commands aren't dispatched in piped mode
Slash commands (/doctor, /status, /help, etc.) belong to the interactive REPL — they depend on REPL state that piped mode doesn't have. If you pipe a slash command into yoyo, it isn't dispatched: it would only be sent to the model as a literal string, wasting a turn of tokens.
Instead, yoyo detects this case, prints a one-line warning to stderr, and exits with status code 2. Use one of these alternatives:
yoyo doctor # run the subcommand directly
yoyo --prompt "/doctor" # send the literal text to the agent
yoyo # interactive REPL
REPL Commands
All commands start with /. Type /help inside yoyo to see the full list.
Note: A few commands are also available as shell subcommands — run them directly without entering the REPL:
| Subcommand | Description |
|---|---|
| yoyo help | Show help message (same as --help) |
| yoyo version | Show version (same as --version) |
| yoyo setup | Run the interactive setup wizard |
| yoyo init | Generate a YOYO.md project context file |
| yoyo doctor | Diagnose yoyo setup (config file, API key, provider, tool availability) |
| yoyo health | Run project health checks (build, test, clippy, fmt — auto-detects project type) |
| yoyo lint | Run project linter (e.g. yoyo lint --strict, yoyo lint unsafe) |
| yoyo test | Run project test suite |
| yoyo tree | Show project directory tree |
| yoyo map | Show project symbol map |
| yoyo run | Run a shell command (e.g. yoyo run cargo clippy) |
| yoyo diff | Show git diff (e.g. yoyo diff --staged) |
| yoyo commit | Commit staged changes (e.g. yoyo commit "fix typo") |
| yoyo review | Show review prompt for staged changes or a file |
| yoyo blame | Show git blame (e.g. yoyo blame src/main.rs:1-20) |
| yoyo grep | Search files for a pattern (e.g. yoyo grep TODO src/) |
| yoyo find | Find files by name (e.g. yoyo find main) |
| yoyo index | Build and display project index |
| yoyo update | Check for and install the latest yoyo release |
| yoyo docs | Look up docs.rs documentation (e.g. yoyo docs serde) |
| yoyo watch | Toggle watch mode (e.g. yoyo watch cargo test) |
| yoyo status | Show version, git branch, and working directory |
| yoyo undo | Undo changes (e.g. yoyo undo --last-commit) |
The doctor subcommand honors --provider and --model if you want to point it at a non-default setup (e.g. yoyo doctor --provider openai). Inside the REPL, the same checks are available as /doctor and /health.
Navigation
| Command | Description |
|---|---|
/quit, /exit | Exit yoyo |
/help | Show available commands |
/help <command> | Show detailed help for a specific command |
Conversation
| Command | Description |
|---|---|
/clear | Clear conversation history and start fresh |
/compact | Compress conversation to save context space (see Context Management) |
/retry | Re-send your last input — useful when a response gets cut off or you want to try again |
/history | Show a summary of all messages in the conversation |
/search <query> | Search conversation history for messages containing the query (case-insensitive) |
/mark <name> | Bookmark the current conversation state |
/jump <name> | Restore conversation to a bookmark (discards messages after it) |
/marks | List all saved bookmarks |
Conversation bookmarks
The /mark and /jump commands let you bookmark points in your conversation and return to them later. This is useful when exploring different approaches — bookmark a good state, try something, and jump back if it doesn't work out.
> /mark before-refactor
✓ bookmark 'before-refactor' saved (12 messages)
> ... try something risky ...
> /jump before-refactor
✓ jumped to bookmark 'before-refactor' (12 messages)
> /marks
Saved bookmarks:
• before-refactor
Bookmarks are stored in memory for the current session. Overwriting a bookmark with the same name updates it. Jumping to a bookmark restores the conversation to exactly that point — any messages added after the bookmark are discarded.
Model, Provider & Thinking
| Command | Description |
|---|---|
/model <name> | Switch to a different model (preserves conversation) |
/provider <name> | Switch provider and reset model to the provider's default |
/think [level] | Show or change thinking level: off, minimal, low, medium, high |
/teach [on|off] | Toggle teach mode — yoyo explains its reasoning as it works |
Examples:
/model claude-sonnet-4-20250514
/provider openai
/provider google
/think high
/think off
The /model command preserves conversation when switching models. The /provider command switches to a different API provider (e.g., anthropic, openai, google, openrouter, ollama, xai, groq, deepseek, mistral, cerebras, custom) and automatically sets the model to the provider's default. Use /provider without arguments to see the current provider and available options. The /think command adjusts the thinking level.
The /teach command toggles teach mode on or off. When teach mode is active, yoyo explains why it's making each change before showing code, uses clear and readable patterns, adds comments on non-obvious lines, and summarizes what you should learn after completing a task. Great for learning while the agent codes. This is a session-only toggle — it resets when you exit.
Session
| Command | Description |
|---|---|
/save [path] | Save conversation to a file (default: yoyo-session.json) |
/load [path] | Load conversation from a file (default: yoyo-session.json) |
See Session Persistence for details.
Information
| Command | Description |
|---|---|
/status | Show current model, git branch, working directory, and session token totals |
/tokens | Show detailed token usage: context window fill level, session totals, and estimated cost |
/cost | Show estimated session cost |
/changelog [N] | Show recent git commit history (default: 15, max: 100) |
/config | Show all current settings |
/config show | Show loaded config file path and merged key-value pairs (secrets masked) |
/config edit | Open config file in $EDITOR |
/hooks | Show active hooks (pre/post tool execution) |
/permissions | Show active security and permission configuration |
/version | Show yoyo version |
The /tokens command shows a visual progress bar of your active context:
Active context:
messages: 12
current: 45.2k / 200.0k tokens
█████████░░░░░░░░░░░ 23%
Documentation
| Command | Description |
|---|---|
/docs <crate> | Look up docs.rs documentation for a Rust crate |
/docs <crate> <item> | Look up a specific module/item within a crate |
The /docs command fetches the docs.rs page for a given crate and shows a quick summary — confirming the crate exists, displaying its description, and listing the crate's API items (modules, structs, traits, enums, functions, macros). No tokens used, no AI involved.
Each category is capped at 10 items with a "+N more" suffix for large crates.
/docs serde
✓ serde
📦 https://docs.rs/serde/latest/serde/
📝 A generic serialization/deserialization framework
Modules: de, ser
Traits: Deserialize, Deserializer, Serialize, Serializer
Macros: forward_to_deserialize_any
/docs tokio task
✓ tokio::task
📦 https://docs.rs/tokio/latest/tokio/task/
📝 Asynchronous green-threads...
Shell
| Command | Description |
|---|---|
/run <cmd> | Run a shell command directly — no AI, no tokens used |
!<cmd> | Shortcut for /run |
/bg [subcmd] | Manage background shell processes |
/web <url> | Fetch a web page and display clean readable text content |
The /run command (or ! shortcut) lets you execute shell commands without going through the AI model. Useful for quick checks (e.g., !git log --oneline -5) without burning API tokens.
/run ls -la src/
/run cargo test
/run git status
/bg — Background process management
The /bg command lets you launch shell commands in the background, monitor their output, and kill them when done. Useful for long-running tasks like builds, test suites, or dev servers.
| Subcommand | Description |
|---|---|
/bg run <cmd> | Launch a command in the background |
/bg list | Show all background jobs (default when no subcommand) |
/bg output <id> | Show last 50 lines of a job's output |
/bg output <id> --all | Show all captured output |
/bg kill <id> | Kill a running job |
/bg run cargo build --release
⚡ Background job [1] started: cargo build --release
/bg list
Background Jobs
[1] ● running 12s cargo build --release
/bg output 1
... (last 50 lines of build output)
/bg kill 1
Killed job [1]
Output is capped at 256KB per job to prevent memory issues. Jobs display colored status: green for success, red for failure, yellow for running.
/web — Fetch and read web pages
The /web command fetches a URL and extracts readable text content, stripping away HTML tags, scripts, styles, and navigation. This is useful for quickly pulling in documentation, error explanations, API references, or any web content without getting raw HTML.
/web https://doc.rust-lang.org/book/ch01-01-installation.html
/web docs.rs/serde
/web https://stackoverflow.com/questions/12345
Features:
- Auto-prepends https:// if you omit the protocol — /web docs.rs/serde works
- Strips noise — removes <script>, <style>, <nav>, <footer>, <header>, and <svg> blocks
- Converts structure — headings become prominent, list items get bullets, block elements get newlines
- Decodes entities — &amp;, &lt;, &gt;, numeric character references like &#NNN;, etc.
- Truncates — caps output at ~5,000 characters to keep it readable
- No AI tokens used — pure curl + text extraction
Subagent & Planning
| Command | Description |
|---|---|
/plan <task> | Create a step-by-step plan for a task without executing anything (architect mode) |
/spawn <task> | Spawn a subagent with a fresh context to handle a task |
/plan — Architect mode
The /plan command asks the AI to create a detailed, structured plan for a task without executing any tools. This is the "architect mode" equivalent — you see exactly what the agent intends to do before it does anything.
> /plan add caching to the database layer
📋 Planning: add caching to the database layer
## Files to examine
- src/db.rs — current database implementation
- src/config.rs — configuration for cache TTL
## Files to modify
- src/db.rs — add cache layer
- src/cache.rs — new file for cache implementation
- tests/cache_test.rs — new tests
## Step-by-step approach
1. Read src/db.rs to understand current query patterns
2. Create src/cache.rs with an LRU cache struct
3. Wrap database queries with cache lookups
4. Add cache invalidation on writes
5. Add configuration for cache size and TTL
## Tests to write
- Cache hit returns cached value
- Cache miss falls through to database
- Write invalidates relevant cache entries
## Potential risks
- Cache invalidation on complex queries
- Memory pressure with large result sets
## Verification
- Run existing tests to ensure no regressions
- Run new cache tests
- Benchmark query latency before/after
💡 Review the plan above. Say "go ahead" to execute it, or refine it.
After reviewing the plan, you can:
- Say "go ahead" to have the agent execute the plan
- Ask the agent to refine specific parts ("make the cache configurable")
- Modify the approach ("use Redis instead of in-memory")
- Say "no" or change direction entirely
This is especially useful for:
- Large refactors where you want to understand the scope before committing
- Unfamiliar codebases where you want the agent to map things out first
- Trust and transparency — see the full plan before any files are modified
- Teaching moments — the plan itself teaches you about the codebase structure
/spawn — Subagent
The /spawn command creates a fresh AI agent with its own independent context window, sends it your task, runs it to completion, and injects the result back into your main conversation.
This is useful for tasks that would consume a lot of context in your main session — reading large files, multi-step analysis, exploring unfamiliar code — without polluting your primary conversation history.
/spawn read all files in src/ and summarize the architecture
/spawn find all TODO comments in the codebase and list them
/spawn analyze the test coverage and suggest gaps
The subagent has access to the same tools (bash, file operations, etc.) and uses the same model. Its token usage counts toward your session total, but its context is completely separate from your main conversation. When it finishes, a summary of the task and result is injected into your main conversation so you have awareness of what was done.
Automatic sub-agent delegation: In addition to /spawn, the model can autonomously delegate subtasks to a built-in sub_agent tool. This happens transparently — the model decides when a subtask benefits from a fresh context window (e.g., researching a codebase section, running a series of tests). You'll see a 🐙 indicator when delegation occurs.
Git
| Command | Description |
|---|---|
/git status | Show working tree status (git status --short) — quick shortcut |
/git log [n] | Show last n commits (default: 5) via git log --oneline |
/git add <path> | Stage files for commit |
/git stash | Stash uncommitted changes |
/git stash pop | Restore stashed changes |
/git stash list | List all stash entries with colored output |
/git stash show [n] | Show diff of stash entry (default: latest) |
/git stash drop [n] | Drop a stash entry (default: latest) |
/commit [msg] | Commit staged changes — generates a conventional commit message if no msg provided |
/diff | Show colored file summary, change stats, and full diff of uncommitted changes |
/blame <file> | Show colorized git blame output (/blame file:10-20 for line ranges) |
/undo | Revert all uncommitted changes (git checkout -- . and git clean -fd) |
/pr [number] | List open PRs (gh pr list), or view a specific PR (gh pr view <number>) |
/pr create [--draft] | Create a PR with an AI-generated title and description |
/pr <number> diff | Show the diff of a PR (gh pr diff <number>) |
/pr <number> comment <text> | Add a comment to a PR (gh pr comment <number>) |
/pr <number> checkout | Checkout a PR branch locally (gh pr checkout <number>) |
/health | Run project health checks — auto-detects project type, reports pass/fail with timing |
/test | Auto-detect and run project tests — shows output with timing |
/lint | Auto-detect and run project linter — shows output with timing, feeds failures to agent context |
/lint pedantic | Run with pedantic clippy lints (Rust only) |
/lint strict | Run with pedantic + nursery clippy lints (Rust only) |
/lint fix | Run linter and auto-send failures to AI for fixing |
/lint unsafe | Scan for unsafe code blocks and suggest safety attributes (Rust only) |
/fix | Auto-fix build/lint errors — runs health checks, sends failures to the AI agent for fixing |
/update | Self-update yoyo to the latest GitHub release — detects platform, downloads, replaces the binary |
The /git command is a convenience wrapper for common git operations without burning AI tokens or using /run git .... For example:
/git status # instead of /run git status --short
/git log 10 # instead of /run git log --oneline -10
/git add src/main.rs # stage a file
/git stash # stash changes
/git stash pop # restore stash
/git stash list # see all stash entries
/git stash show 1 # view diff of stash@{1}
/git stash drop 0 # drop the latest stash
The /commit command helps you commit staged changes quickly:
- /commit (no arguments): reads your staged diff, generates a conventional commit message (e.g., feat(main): add changes), and asks for confirmation — press y to accept, n to cancel, or e to edit
- /commit fix: typo in README: commits directly with your provided message
- If nothing is staged, it reminds you to git add first
The /undo command shows you what will be reverted before doing it.
The /pr command is a quick wrapper around the GitHub CLI:
- /pr — list the 10 most recent open pull requests
- /pr create — create a PR with an AI-generated title and description from your branch's diff and commits
- /pr create --draft — same, but as a draft PR
- /pr 42 — view details of PR #42
- /pr 42 diff — show the diff for PR #42
- /pr 42 comment looks good! — add a comment to PR #42
- /pr 42 checkout — checkout PR #42's branch locally
For merging or closing PRs, use /run gh pr ... or ask the agent directly — it has full bash access.
The /health command auto-detects your project type by looking for marker files and runs the appropriate checks:
- Rust (Cargo.toml): cargo build, cargo test, cargo clippy, cargo fmt --check
- Node.js (package.json): npm test, npx eslint .
- Python (pyproject.toml, setup.py, setup.cfg): pytest, flake8, mypy
- Go (go.mod): go build, go test, go vet
- Makefile (Makefile): make test
If no recognized project type is found, it shows a helpful message listing the marker files it looked for.
The /test command is a focused shortcut that only runs the test suite for your project (e.g., cargo test, npm test, python -m pytest, go test ./..., make test). It auto-detects the project type the same way /health does, but runs just the tests — with full output and timing. This is handy for a quick test run without the full suite of lint/build checks that /health performs.
The /lint command is similar to /test but runs only the linter for your project. It auto-detects the project type and runs the appropriate linter:
- Rust: cargo clippy --all-targets -- -D warnings
- Node.js: npx eslint .
- Python: ruff check .
- Go: golangci-lint run
For Rust projects, you can increase clippy's strictness:
- /lint pedantic — adds -W clippy::pedantic for stricter style checks
- /lint strict — adds -W clippy::pedantic -W clippy::nursery for maximum analysis
Strictness levels only affect Rust projects; other languages use their default linter regardless.
When lint fails, the error output is automatically fed into the agent context so you can ask the AI about the errors in your next message. For fully automated fixing, use /lint fix — this runs the linter and, if there are failures, sends them directly to the AI agent for correction (similar to /fix but lint-only).
The /fix command goes one step further than /health — it runs the same health checks, but when any check fails, it sends the full error output to the AI agent with a prompt to fix the issues. The AI reads the relevant files, understands the errors, and applies fixes using its tools. After fixing, it re-runs the checks to verify. This is particularly useful for quickly resolving lint warnings, format issues, or build errors.
/fix
Detected project: Rust (Cargo)
Running health checks...
✓ build: ok
✗ clippy: FAIL
✓ fmt: ok
Sending 1 failure(s) to AI for fixing...
/update — Self-update to latest release
The /update command checks GitHub for the latest release and downloads the new binary in-place.
/update
Update available: v0.1.5 → v0.2.0
This will download and replace the current binary.
Continue? [y/N] y
Downloading yoyo-x86_64-unknown-linux-gnu.tar.gz...
✓ Updated to v0.2.0! Please restart yoyo to use the new version.
The command:
- Detects your platform (Linux x86_64, macOS Intel/ARM, Windows x86_64)
- Creates a backup of the current binary before replacing
- Restores the backup if anything goes wrong
- Suggests manual install instructions as a fallback
If you're running a development build (from cargo build), it will suggest using cargo install yoyo-agent instead.
Code Review
| Command | Description |
|---|---|
/review | AI-powered review of staged changes (falls back to unstaged if nothing staged) |
/review <path> | AI-powered review of a specific file |
The /review command sends your code to the AI for a thorough review covering:
- Bugs — logic errors, off-by-one errors, null handling, race conditions
- Security — injection vulnerabilities, unsafe operations, credential exposure
- Style — naming, idiomatic patterns, unnecessary complexity, dead code
- Performance — obvious inefficiencies, unnecessary allocations
- Suggestions — improvements, missing error handling, better approaches
/review # review staged changes (or unstaged if nothing staged)
/review src/main.rs # review a specific file
/review Cargo.toml # review any file
This is one of the most common workflows for developers using coding agents — getting a second pair of eyes on your changes before committing.
Refactoring
| Command | Description |
|---|---|
/refactor | Show all refactoring tools with examples |
/rename <old> <new> | Cross-file symbol renaming with word-boundary matching |
/extract <symbol> <source> <target> | Move a symbol (fn, struct, enum, trait, type, const, static) between files |
/move <Src>::<method> [file::]<Dst> | Move a method between impl blocks (same file or cross-file) |
/refactor — Refactoring tools overview
The /refactor command is an umbrella that shows all available refactoring tools at a glance. Run it with no arguments to see a summary with examples:
/refactor
You can also use it as a dispatch to any refactoring subcommand:
/refactor rename MyOldStruct MyNewStruct
/refactor extract parse_config src/lib.rs src/config.rs
/refactor move Parser::validate Validator
These are equivalent to calling /rename, /extract, or /move directly — use whichever form you prefer.
/rename — Cross-file symbol renaming
The /rename command does a smart find-and-replace across all git-tracked files, respecting word boundaries (renaming foo won't change foobar or my_foo). Shows a preview of all matches, then asks for confirmation.
/rename my_func new_func
/rename OldStruct NewStruct
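The word-boundary rule can be sketched roughly as below. This is a simplified, hypothetical version (function names are illustrative, and the real implementation also builds the cross-file preview): a hit counts only when the characters on either side are not identifier characters.

```rust
// Hypothetical sketch of the word-boundary check behind /rename.
fn is_word_char(c: char) -> bool {
    c.is_alphanumeric() || c == '_'
}

// Replace `old` with `new` in one line, but only at word boundaries,
// so renaming `foo` leaves `foobar` and `my_foo` untouched.
fn rename_line(line: &str, old: &str, new: &str) -> String {
    let mut out = String::new();
    let mut rest = line;
    let mut offset = 0; // byte offset of `rest` within `line`
    while let Some(pos) = rest.find(old) {
        let abs = offset + pos;
        let before_ok = line[..abs]
            .chars()
            .next_back()
            .map_or(true, |c| !is_word_char(c));
        let after_ok = line[abs + old.len()..]
            .chars()
            .next()
            .map_or(true, |c| !is_word_char(c));
        out.push_str(&rest[..pos]);
        out.push_str(if before_ok && after_ok { new } else { old });
        rest = &rest[pos + old.len()..];
        offset = abs + old.len();
    }
    out.push_str(rest);
    out
}
```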
/extract — Move symbols between files
The /extract command moves a top-level item (function, struct, enum, impl, trait, type alias, const, or static) from one file to another. It uses brace-depth tracking to find the full block, including doc comments and attributes above the declaration.
/extract my_func src/lib.rs src/utils.rs
/extract MyStruct src/main.rs src/types.rs
/extract MyTrait src/old.rs src/new.rs
/extract MyResult src/lib.rs src/errors.rs
/extract MAX_SIZE src/config.rs src/constants.rs
The command shows a preview of the block to be moved and asks for confirmation before making changes. If the target file doesn't exist, it's created. If the symbol is public, yoyo notes that you may need to add a use import in the source file.
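The brace-depth scan can be sketched as follows. This is a simplified, assumed version: the real implementation also has to skip braces inside string literals and comments, and to walk backwards to pick up doc comments and attributes.

```rust
// Hypothetical sketch of the brace-depth scan behind /extract: starting at
// the declaration, count '{' and '}' until the depth returns to zero, and
// report the byte offset just past the closing brace.
fn block_end(src: &str, start: usize) -> Option<usize> {
    let mut depth = 0usize;
    let mut seen_open = false;
    for (i, c) in src[start..].char_indices() {
        match c {
            '{' => {
                depth += 1;
                seen_open = true;
            }
            '}' => {
                depth = depth.checked_sub(1)?; // unbalanced '}' → give up
                if seen_open && depth == 0 {
                    return Some(start + i + 1);
                }
            }
            _ => {}
        }
    }
    None // never closed (e.g. a bare `fn f();` signature)
}
```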
/move — Relocate methods between impl blocks
The /move command moves a method from one impl block to another, within the same file or across files. It extracts the method (including doc comments and attributes), re-indents it to match the target block, and inserts it before the closing }. Shows a preview and asks for confirmation.
/move MyStruct::process TargetStruct # same file
/move Parser::parse_expr other.rs::Lexer # cross-file
/move Config::validate Settings # same file
If the method uses self. references, yoyo warns you to verify that the field/method references are valid on the target type. This is a common source of bugs when relocating methods between different types.
rename_symbol — Agent-invocable rename tool
In addition to the interactive /rename REPL command, yoyo exposes a rename_symbol tool that the AI agent can call directly. This means the agent can rename symbols across files in a single tool call instead of issuing multiple edit_file calls — faster and more reliable for large refactors.
The tool accepts:
- old_name (required) — the current symbol name
- new_name (required) — the replacement name
- path (optional) — limit scope to a specific file or directory
Like write_file and edit_file, rename_symbol asks for user confirmation before making changes (unless --yes is passed).
ask_user — Let the model ask you questions
The agent can ask you directed questions mid-task using the ask_user tool. Instead of guessing at your preferences or making assumptions, the model can pause and ask for clarification — a preference, a decision, or context that isn't available in the codebase.
This tool is only available in interactive mode (when stdin is a terminal). In piped mode, the tool is not registered — the model works with what it has.
The question appears with a ❓ prompt, and you type your response directly. If you press Enter with no text or hit EOF, the model receives a "(no response)" indicator and continues on its own.
Project Context
| Command | Description |
|---|---|
/add <path> | Add file contents into the conversation — the AI sees them immediately |
/explain <file> | Read code from a file and ask the agent to explain it |
/context [system] | Show which project context files are loaded, or use /context system to see system prompt sections with token estimates |
/find <pattern> | Fuzzy-search project files by name — respects .gitignore, ranked by relevance |
/grep <pattern> [path] | Search file contents directly — no AI, no tokens, instant results |
/index | Build a lightweight index of all project source files — shows path, line count, and first-line summary |
/init | Scan the project and generate a YOYO.md context file with detected build commands, key files, and project structure |
/tree [depth] | Show project directory tree (default depth: 3, respects .gitignore) |
/add — Inject file contents into conversation
The /add command reads files and injects their contents directly into the conversation as a user message. The AI sees the file immediately without needing to call read_file — similar to Claude Code's @file feature.
/add src/main.rs
✓ added src/main.rs (850 lines)
(1 file added to conversation)
/add src/main.rs:1-50
✓ added src/main.rs (lines 1-50) (50 lines)
(1 file added to conversation)
/add src/*.rs
✓ added src/cli.rs (400 lines)
✓ added src/commands.rs (3000 lines)
✓ added src/main.rs (850 lines)
(3 files added to conversation)
/add Cargo.toml README.md
✓ added Cargo.toml (28 lines)
✓ added README.md (50 lines)
(2 files added to conversation)
Features:
- Line ranges — /add path:start-end injects only the specified lines
- Glob patterns — /add src/*.rs expands to all matching files
- Multiple files — /add file1 file2 adds both in one message
- Syntax highlighting — content is wrapped in fenced code blocks with language detection
- No AI tokens used for reading — the file is read locally and injected directly
This is the fastest way to give the AI context about specific files without waiting for it to call tools.
/find — Fuzzy file search by name
The /find command does fuzzy substring matching across all tracked files in your project (via git ls-files, falling back to a directory walk if not in a git repo). Results are ranked by relevance — filename matches score higher than directory matches, and matches at the start of the filename rank highest.
/find main
3 files matching 'main':
src/main.rs
site/book/index.html
scripts/main_helper.sh
/find .toml
2 files matching '.toml':
Cargo.toml
docs/book.toml
/grep — Search file contents directly
The /grep command searches file contents without using the AI — no tokens, no API call, instant results. This is one of the fastest ways to find code in your project.
/grep TODO
src/main.rs:42: // TODO: handle edge case
src/cli.rs:15: // TODO: add validation
2 matches
/grep "fn main" src/
src/main.rs:10: fn main() {
1 match
/grep -s MyStruct src/lib.rs
src/lib.rs:5: pub struct MyStruct {
src/lib.rs:20: impl MyStruct {
2 matches
Features:
- Case-insensitive by default — use -s or --case for case-sensitive search
- Git-aware — uses git grep in git repos (faster, respects .gitignore), falls back to grep -rn
- Colored output — filenames in green, line numbers in cyan, matches highlighted in yellow
- Truncated results — shows up to 50 matches with a "narrow your search" hint
- Optional path — /grep pattern src/ restricts search to a specific file or directory
/tree — Show the project directory tree
The /tree command uses git ls-files to show tracked files in a visual tree structure, automatically respecting your .gitignore. You can specify a depth limit:
/tree # default: 3 levels deep
/tree 1 # just top-level directories and their files
/tree 5 # deeper view
Example output:
src/
cli.rs
format.rs
main.rs
prompt.rs
Cargo.toml
README.md
/index — Codebase indexing
The /index command builds a lightweight in-memory index of your project's source files. For each text file tracked by git (or found via directory walk), it shows:
- Path — the file path relative to the project root
- Lines — the total line count
- Summary — the first meaningful line (skipping blank lines), which is typically a doc comment, module declaration, or import statement
Binary files (images, fonts, archives, etc.) are automatically skipped.
/index
Building project index...
Path Lines Summary
────────────────── ───── ────────────────────────────────────────
Cargo.toml 18 [package]
src/cli.rs 400 //! CLI argument parsing and configuration.
src/commands.rs 4500 //! REPL command handlers for yoyo.
src/main.rs 850 //! yoyo — a coding agent that evolves itself.
README.md 50 # yoyo
5 files, 5818 total lines
This gives you a quick bird's-eye view of the entire codebase without needing to run find, list_files, or wc -l manually.
/map — Structural codebase map
The /map command generates a structural summary of your codebase, extracting function signatures, struct/class/trait/enum definitions, constants, and other symbols from source files. This is like a "table of contents" for your entire project.
/map
Building repo map...
src/main.rs (850 lines)
pub fn main
pub struct AgentConfig
impl AgentConfig
src/cli.rs (400 lines)
pub fn parse_args
pub struct Config
pub const SYSTEM_PROMPT
...
45 symbols across 8 files (using ast-grep)
Usage:
| Command | Description |
|---|---|
/map | Map entire project (public symbols only) |
/map src/ | Map only files under a specific directory |
/map --all | Include private/non-exported symbols |
/map --all src/ | All symbols under a specific directory |
/map --regex | Force regex backend (skip ast-grep) |
Supported languages: Rust, Python, JavaScript, TypeScript, Go, Java.
ast-grep integration: When ast-grep (sg) is installed, /map uses it for more accurate AST-based symbol extraction. When ast-grep is not available, it falls back to built-in regex extractors. The output footer shows which backend was used. Use --regex to force the regex backend for comparison or debugging.
Automatic system prompt integration: The repo map is automatically included in the system prompt at the start of every session, giving the AI structural awareness of your codebase without you needing to manually add files. This is similar to Aider's repo-map feature. The system prompt version is limited to public symbols and capped at ~16K characters to avoid bloating context.
Project Onboarding with /init
The /init command scans your project and generates a YOYO.md context file automatically. It:
- Detects the project type — Rust, Node.js, Python, Go, or Makefile-based projects
- Finds the project name — from Cargo.toml, package.json, the README.md title, or the directory name
- Lists important files — README, config files, CI configs, lock files, etc.
- Lists key directories — src/, tests/, docs/, scripts/, etc.
- Generates build commands — cargo build, npm test, go test ./..., etc. based on project type
/init
Scanning project...
Detected: Rust
✓ Created YOYO.md (32 lines) — edit it to add project context.
If YOYO.md or CLAUDE.md already exists, /init won't overwrite it. The generated file is a starting point — edit it to add your project's specific conventions and instructions.
Project Memory
| Command | Description |
|---|---|
/remember <note> | Save a project-specific note that persists across sessions |
/memories [query] | List all memories, or search by keyword |
/forget <number> | Remove a memory by its number |
Project memories let you teach yoyo things about your project that it should always know — build quirks, team conventions, infrastructure requirements. Memories are stored in .yoyo/memory.json in your project root and are automatically injected into the system prompt at the start of every session.
Example workflow
> /remember this project uses sqlx for database access
✓ Remembered: "this project uses sqlx for database access" (1 total memories)
> /remember tests require docker running
✓ Remembered: "tests require docker running" (2 total memories)
> /memories
Project memories (2):
[0] this project uses sqlx for database access (2026-03-15 08:32)
[1] tests require docker running (2026-03-15 08:33)
> /forget 0
✓ Forgot: "this project uses sqlx for database access" (1 memories remaining)
> /memories docker
Found 1 memory matching 'docker':
[1] tests require docker running (2026-03-15 08:33)
Use /memories <query> to filter by keyword when you have many memories. The search is case-insensitive.
Use /remember any time you find yourself repeating the same instruction to the agent. The memory will be there next time you start a session in this project directory.
Unknown commands
If you type a /command that yoyo doesn't recognize, it will tell you:
unknown command: /foo
type /help for available commands
Note: lines starting with / that contain spaces (like /model name) are treated as command arguments, not unknown commands.
Multi-Line Input
yoyo supports two ways to enter multi-line input.
Backslash continuation
End a line with \ to continue on the next line:
main > Please review this code and \
... check for any bugs or \
... performance issues.
The backslash and newline are removed, and the lines are joined. The ... prompt indicates yoyo is waiting for more input.
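The joining rule can be sketched as a small fold over the collected lines. This is an assumed sketch, not the actual implementation: a trailing backslash is stripped and the newline dropped; otherwise the line is kept as-is.

```rust
// Hypothetical sketch of backslash continuation: lines ending in '\'
// are joined with the next line (backslash and newline removed).
fn join_input(lines: &[&str]) -> String {
    let mut out = String::new();
    for (i, line) in lines.iter().enumerate() {
        match line.strip_suffix('\\') {
            Some(head) => out.push_str(head), // drop '\' and the newline
            None => {
                out.push_str(line);
                if i + 1 < lines.len() {
                    out.push('\n'); // ordinary lines keep their newline
                }
            }
        }
    }
    out
}
```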
Code fences
Start a line with triple backticks (```) to enter a fenced code block. Everything until the closing ``` is collected as a single input:
main > ```
... Here is a function I want you to review:
...
... fn parse(input: &str) -> Result<Config, Error> {
... let data = serde_json::from_str(input)?;
... Ok(Config::from(data))
... }
...
... Is this handling errors correctly?
... ```
This is useful for pasting code or structured text that spans multiple lines.
Models & Providers
yoyo supports 13 providers out of the box — from Anthropic and OpenAI to local models via Ollama.
Default model
The default model is claude-opus-4-6 (Anthropic). You can change it at startup or mid-session.
Changing the model
At startup:
yoyo --model claude-sonnet-4-20250514
yoyo --model gpt-4o --provider openai
yoyo --model llama3.2 --provider ollama
During a session:
/model claude-sonnet-4-20250514
Note: Switching models with /model preserves your conversation history — you can change models mid-task without losing context.
Providers
Use --provider <name> to select a provider. Each provider has a default model and an environment variable for its API key.
Tip: If you run yoyo without any API key configured, an interactive setup wizard will walk you through choosing a provider and entering your key. You can also save the config to .yoyo.toml directly from the wizard.
| Provider | Default Model | API Key Env Var |
|---|---|---|
anthropic (default) | claude-opus-4-6 | ANTHROPIC_API_KEY |
openai | gpt-4o | OPENAI_API_KEY |
google | gemini-2.0-flash | GOOGLE_API_KEY |
openrouter | anthropic/claude-sonnet-4-20250514 | OPENROUTER_API_KEY |
ollama | llama3.2 | (none — local) |
xai | grok-3 | XAI_API_KEY |
groq | llama-3.3-70b-versatile | GROQ_API_KEY |
deepseek | deepseek-chat | DEEPSEEK_API_KEY |
mistral | mistral-large-latest | MISTRAL_API_KEY |
cerebras | llama-3.3-70b | CEREBRAS_API_KEY |
zai | glm-4-plus | ZAI_API_KEY |
minimax | MiniMax-M2.7 | MINIMAX_API_KEY |
custom | claude-opus-4-6 | (none — bring your own) |
Examples
# OpenAI
OPENAI_API_KEY=sk-... yoyo --provider openai
# Google Gemini
GOOGLE_API_KEY=... yoyo --provider google --model gemini-2.5-pro
# Local with Ollama (no API key needed)
yoyo --provider ollama --model llama3.2
# Custom endpoint (OpenAI-compatible API)
yoyo --provider custom --base-url http://localhost:8080/v1 --model my-model
You can also set these in .yoyo.toml:
provider = "openai"
model = "gpt-4o"
base_url = "https://api.openai.com/v1"
Cost estimation
Cost estimation is built in for many providers:
| Model Family | Input (per MTok) | Output (per MTok) |
|---|---|---|
| Opus 4.5/4.6 | $5.00 | $25.00 |
| Opus 4/4.1 | $15.00 | $75.00 |
| Sonnet | $3.00 | $15.00 |
| Haiku 4.5 | $1.00 | $5.00 |
| Haiku 3.5 | $0.80 | $4.00 |
Cost estimates are also available for OpenAI, Google, DeepSeek, Mistral, xAI, Groq, ZAI and more.
Context window
yoyo assumes a 200,000-token context window (the standard for Claude models). When usage exceeds 80% of this, auto-compaction kicks in. See Context Management.
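The trigger condition above is just arithmetic: 80% of 200,000 is 160,000 tokens, and compaction kicks in once usage goes past that. A minimal sketch (names are illustrative, not yoyo's actual API):

```rust
// Fixed context window assumed by yoyo, per the docs.
const CONTEXT_WINDOW: u64 = 200_000;

// Compaction triggers once usage exceeds 80% of the window,
// i.e. strictly more than 160,000 tokens.
fn should_compact(used_tokens: u64) -> bool {
    used_tokens * 100 > CONTEXT_WINDOW * 80
}
```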
System Prompts
yoyo has a built-in system prompt that instructs the model to act as a coding assistant. You can override it entirely via CLI flags or config file.
Default behavior
The default system prompt tells the model to:
- Work as a coding assistant in the user's terminal
- Be direct and concise
- Use tools proactively (read files, run commands, verify work)
- Do things rather than just explain how
Custom system prompt
Inline (CLI flag):
yoyo --system "You are a Rust expert. Focus on performance and safety."
From a file (CLI flag):
yoyo --system-file my-prompt.txt
In config file (.yoyo.toml):
# Inline text
system_prompt = "You are a Go expert. Follow Go idioms strictly."
# Or read from a file
system_file = "prompts/system.txt"
If both system_prompt and system_file are set in the config, system_file takes precedence (same as CLI behavior).
Precedence
When multiple sources provide a system prompt, the highest-priority one wins:
1. --system-file (CLI flag) — highest priority
2. --system (CLI flag)
3. system_file (config file key)
4. system_prompt (config file key)
5. Built-in default — lowest priority
This means CLI flags always override config file values, and file-based prompts override inline text at each level.
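The precedence chain is a simple first-match resolution. A sketch under assumed names (the struct and its fields are illustrative; file-based sources are shown as already-read text):

```rust
// Hypothetical sources for the system prompt, highest priority first.
struct PromptSources<'a> {
    cli_system_file: Option<&'a str>,   // --system-file (contents after reading)
    cli_system: Option<&'a str>,        // --system
    cfg_system_file: Option<&'a str>,   // system_file key (contents after reading)
    cfg_system_prompt: Option<&'a str>, // system_prompt key
}

// First present source wins; otherwise fall back to the built-in default.
fn resolve_prompt<'a>(s: &PromptSources<'a>, built_in: &'a str) -> &'a str {
    s.cli_system_file
        .or(s.cli_system)
        .or(s.cfg_system_file)
        .or(s.cfg_system_prompt)
        .unwrap_or(built_in)
}
```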
Use cases
Custom system prompts are useful for:
- Specializing the agent — focus on security review, documentation, or a specific language
- Project context — tell the agent about your project's conventions
- Team defaults — commit .yoyo.toml with system_prompt or system_file so every developer gets the same agent persona
- Persona tuning — make the agent more or less verbose, formal, etc.
Viewing the assembled prompt
To see the full system prompt (including project context, repo map, skills, and any overrides), use:
yoyo --print-system-prompt
This prints the complete prompt to stdout and exits — useful for debugging or understanding exactly what context the model receives. It works with other flags:
# See what the prompt looks like with a custom system prompt
yoyo --system "You are a Rust expert" --print-system-prompt
# See the prompt without project context
yoyo --no-project-context --print-system-prompt
Inspecting during a session
Once inside the REPL, use /context system to see the system prompt broken into sections with approximate token counts for each:
/context system
This shows each markdown section (headers like # ... and ## ...), their line counts, estimated token usage, and a brief preview — without leaving the session.
Automatic project context
In addition to the system prompt, yoyo automatically injects project context when available:
- Project instructions — from YOYO.md (primary), CLAUDE.md (compatibility alias), or .yoyo/instructions.md
- Project file listing — from git ls-files (up to 200 files)
- Recently changed files — from git log (up to 20 files)
- Git status — current branch, count of uncommitted and staged changes
- Project memories — from memory/ files if present
Use /context to see which project context files are loaded.
Example prompt file
You are a senior Rust developer reviewing code for a production system.
Focus on:
- Error handling correctness
- Memory safety
- Performance implications
- API design
Be concise. Point out issues with line numbers.
Save as review-prompt.txt and use:
# Via CLI flag
yoyo --system-file review-prompt.txt -p "review src/main.rs"
Or set it in your project's .yoyo.toml:
system_file = "review-prompt.txt"
Extended Thinking
Extended thinking gives the model more "reasoning time" before responding. This can improve quality for complex tasks like debugging, architecture decisions, or multi-step refactoring.
Usage
yoyo --thinking high
yoyo --thinking medium
yoyo --thinking low
yoyo --thinking minimal
yoyo --thinking off
Levels
| Level | Aliases | Description |
|---|---|---|
off | none | No extended thinking (default) |
minimal | min | Very brief reasoning |
low | — | Short reasoning |
medium | med | Moderate reasoning |
high | max | Deep reasoning — best for complex tasks |
Levels are case-insensitive: HIGH, High, and high all work.
If you provide an unrecognized level, yoyo defaults to medium with a warning.
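The parsing rules just described (case-insensitive, aliases accepted, unknown values fall back to medium) can be sketched as one match. This is an illustrative sketch, not yoyo's actual code:

```rust
#[derive(Debug, PartialEq)]
enum Thinking {
    Off,
    Minimal,
    Low,
    Medium,
    High,
}

// Case-insensitive level parsing with the documented aliases;
// unrecognized input defaults to Medium (the real CLI also warns).
fn parse_thinking(s: &str) -> Thinking {
    match s.to_ascii_lowercase().as_str() {
        "off" | "none" => Thinking::Off,
        "minimal" | "min" => Thinking::Minimal,
        "low" => Thinking::Low,
        "medium" | "med" => Thinking::Medium,
        "high" | "max" => Thinking::High,
        _ => Thinking::Medium,
    }
}
```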
When to use it
- Complex debugging — use high when the bug is subtle
- Architecture decisions — use medium or high for design questions
- Simple tasks — use off (the default) for quick file reads, simple edits, etc.
Output
When thinking is enabled, the model's reasoning is shown dimmed in the output so you can follow along without it cluttering the main response.
Trade-offs
Higher thinking levels use more tokens (and thus cost more) but often produce better results for hard problems. For routine tasks, the overhead isn't worth it.
Skills
Skills are markdown files that provide additional context and instructions to yoyo. They're loaded at startup and added to the agent's context.
Usage
yoyo --skills ./skills
You can pass multiple skill directories:
yoyo --skills ./skills --skills ./my-custom-skills
What is a skill?
A skill file is a markdown file with YAML frontmatter. It contains instructions, rules, or context that the agent should follow. For example:
---
name: rust-expert
description: Rust-specific coding guidelines
tools: [bash, read_file, edit_file]
---
# Rust Guidelines
- Always use `clippy` before committing
- Prefer `?` over `.unwrap()` in production code
- Write tests for every public function
Built-in skills
yoyo's own evolution is guided by skills in the skills/ directory of the repository:
- evolve — rules for safely modifying its own source code
- communicate — writing journal entries and issue responses
- self-assess — analyzing its own capabilities
- research — searching the web and reading docs
- release — evaluating readiness for publishing
MCP servers
yoyo can connect to Model Context Protocol (MCP) servers, giving the agent access to external tools provided by any MCP-compatible server. Use the --mcp flag with a shell command that starts the server via stdio:
yoyo --mcp "npx -y @modelcontextprotocol/server-fetch"
The flag is repeatable — connect to multiple MCP servers in a single session:
yoyo \
--mcp "npx -y @modelcontextprotocol/server-fetch" \
--mcp "npx -y @modelcontextprotocol/server-github" \
--mcp "python my_custom_server.py"
MCP in config files
You can also configure MCP servers in .yoyo.toml, ~/.yoyo.toml, or ~/.config/yoyo/config.toml, so they connect automatically without needing CLI flags:
mcp = ["npx -y @modelcontextprotocol/server-fetch", "npx open-websearch@latest"]
MCP servers from the config file are merged with any --mcp CLI flags — both sources contribute. CLI flags are additive, not overriding.
Each --mcp command is launched as a child process. yoyo communicates with it over stdio using the MCP protocol, discovers the tools it offers, and makes them available to the agent alongside the built-in tools.
Tool-name collisions
yoyo's built-in tools (bash, read_file, write_file, edit_file, list_files, search, rename_symbol, ask_user, todo, sub_agent) take precedence over MCP tools. If an MCP server exposes a tool with one of those names, yoyo skips the entire server at connect time and prints a warning on stderr — otherwise the colliding tool would cause the provider API to reject the first turn with "Tool names must be unique" and kill the session.
Note: @modelcontextprotocol/server-filesystem exposes read_file and write_file and will therefore be skipped. Prefer servers with distinct tool names such as @modelcontextprotocol/server-fetch, @modelcontextprotocol/server-memory, or @modelcontextprotocol/server-sequential-thinking — or a filesystem server that prefixes its tools (e.g. fs_read_file).
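The connect-time check amounts to a set-membership test. A sketch under assumed names (this is illustrative, not yoyo's actual code): a server is accepted only if none of its offered tools collide with a built-in name.

```rust
use std::collections::HashSet;

// Accept an MCP server only if it offers no tool whose name collides
// with one of yoyo's built-in tools; otherwise the whole server is skipped.
fn server_ok(builtin: &HashSet<&str>, offered: &[&str]) -> bool {
    offered.iter().all(|t| !builtin.contains(t))
}
```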
OpenAPI specs
You can give yoyo access to any HTTP API by pointing it at an OpenAPI specification file. yoyo parses the spec and registers each endpoint as a callable tool:
yoyo --openapi ./petstore.yaml
Like --mcp, this flag is repeatable:
yoyo --openapi ./api-v1.yaml --openapi ./internal-api.json
Both YAML and JSON spec formats are supported.
Additional configuration flags
Beyond skills, MCP, and OpenAPI, a few other flags fine-tune agent behavior:
--temperature <float>
Set the sampling temperature (0.0–1.0). Lower values make output more deterministic; higher values make it more creative. Defaults to the model's own default.
yoyo --temperature 0.2 # More focused/deterministic
yoyo --temperature 0.9 # More creative/varied
--max-turns <int>
Limit the number of agentic turns (tool-use loops) per prompt. Defaults to 50. Useful for keeping costs predictable or preventing runaway tool loops:
yoyo --max-turns 10
Both flags can also be set in .yoyo.toml:
temperature = 0.5
max_turns = 20
--no-bell
Disable the terminal bell notification that rings after long-running prompts (≥3 seconds). By default, yoyo sends a bell character (\x07) when a prompt completes, which causes most terminals to flash the tab or play a sound — useful when you switch away while waiting. Disable it with the flag or environment variable:
yoyo --no-bell
YOYO_NO_BELL=1 yoyo
--no-update-check
Skip the startup update check. On startup (interactive REPL mode only), yoyo checks GitHub for a newer release and shows a notification if one exists. The check uses a 3-second timeout and fails silently on network errors. Disable it with the flag or environment variable:
yoyo --no-update-check
YOYO_NO_UPDATE_CHECK=1 yoyo
The update check is automatically skipped in non-interactive modes (piped input, --prompt flag).
YOYO_SESSION_BUDGET_SECS
Soft wall-clock budget for an entire yoyo session, in seconds. Unset by default — interactive sessions are unbounded. When set, yoyo exposes a session_budget_remaining() helper that long-running loops (like the self-evolution pipeline) can poll to voluntarily wind down before an external timeout cancels them.
YOYO_SESSION_BUDGET_SECS=2700 yoyo # 45-minute soft budget
The timer starts on the first call to the helper, not at process startup, so CI cold-start time doesn't burn the budget. If the env var is set but unparseable, yoyo falls back to the 45-minute default rather than silently disabling the guard. This was added to mitigate hourly cron overlap in the evolution workflow (#262).
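The parsing rules can be sketched as follows. This is an assumed sketch of the documented behavior, not the actual helper: an unset variable means no budget at all, while an unparseable value falls back to the 45-minute default instead of disabling the guard.

```rust
use std::time::Duration;

// Unset → None (unbounded session);
// unparseable value → 45-minute default rather than no guard.
fn parse_budget(raw: Option<&str>) -> Option<Duration> {
    raw.map(|s| Duration::from_secs(s.parse::<u64>().unwrap_or(45 * 60)))
}
```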
Error handling
If the skills directory doesn't exist or can't be loaded, yoyo prints a warning and continues without skills:
warning: Failed to load skills: ...
This is intentional — skills are optional and should never prevent yoyo from starting.
Permissions & Safety
yoyo asks for confirmation before running tools that modify your system. This page covers how to control that behavior — from interactive prompts to fine-grained allow/deny rules.
Interactive Permission Prompts
By default, yoyo prompts you before executing any potentially dangerous tool:
- bash — every shell command asks for [y/N] confirmation
- write_file — creating or overwriting files asks for approval
- edit_file — modifying existing files asks for approval
- rename_symbol — cross-file symbol renaming asks for approval
Read-only tools (read_file, list_files, search) and the ask_user tool run without prompting.
When a tool needs approval, you'll see something like:
⚡ bash: git status
Allow? [y/N]
Type y to approve, or n (or just press Enter) to deny.
Auto-Approve Everything: --yes / -y
If you trust the agent fully (e.g., in a sandboxed environment or CI pipeline), skip all prompts:
yoyo -y -p "refactor the auth module"
This auto-approves every tool call — bash commands, file writes, everything.
⚠️ Use with caution. This gives yoyo unrestricted access to your shell and filesystem.
Command Filtering: --allow and --deny
For finer control over which bash commands run automatically, use glob patterns:
yoyo --allow "git *" --allow "cargo *" --deny "rm -rf *"
How it works
- Deny is checked first. If a command matches any --deny pattern, it's rejected immediately — the agent sees an error message and must try something else.
- Allow is checked second. If a command matches any --allow pattern, it runs without prompting.
- No match = prompt. Commands that don't match either list get the normal [y/N] prompt.
Patterns use simple glob matching where * matches any sequence of characters (including empty):
| Pattern | Matches | Doesn't match |
|---|---|---|
git * | git status, git commit -m "hello" | echo git, gitignore |
*.rs | main.rs, src/main.rs | main.py |
cargo * --release | cargo build --release | cargo build --debug |
rm -rf * | rm -rf /, rm -rf /tmp | rm file.txt |
* | everything | — |
Both --allow and --deny are repeatable — pass them multiple times to build up your pattern lists.
Deny overrides allow
If both an allow and deny pattern match the same command, deny wins:
# This allows all commands EXCEPT rm -rf
yoyo --allow "*" --deny "rm -rf *"
The command rm -rf /tmp matches * (allow) and rm -rf * (deny) — deny takes priority, so it's blocked.
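The matching and decision order above can be sketched in a few lines. This is a simplified, assumed version consistent with the documented semantics ('*' matches any run of characters, including empty; deny is checked before allow), not yoyo's actual implementation:

```rust
#[derive(Debug, PartialEq)]
enum Decision {
    Deny,
    AutoApprove,
    Prompt,
}

// '*' matches any (possibly empty) sequence of characters;
// everything else matches literally.
fn glob_match(pattern: &str, text: &str) -> bool {
    match pattern.split_once('*') {
        None => pattern == text,
        Some((prefix, rest)) => {
            if !text.starts_with(prefix) {
                return false;
            }
            let tail = &text[prefix.len()..];
            // Try every possible span for this '*'.
            (0..=tail.len()).any(|i| tail.is_char_boundary(i) && glob_match(rest, &tail[i..]))
        }
    }
}

// Deny first, then allow, otherwise fall back to the [y/N] prompt.
fn decide(cmd: &str, allow: &[&str], deny: &[&str]) -> Decision {
    if deny.iter().any(|p| glob_match(p, cmd)) {
        Decision::Deny
    } else if allow.iter().any(|p| glob_match(p, cmd)) {
        Decision::AutoApprove
    } else {
        Decision::Prompt
    }
}
```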
Directory Restrictions: --allow-dir and --deny-dir
Restrict which directories yoyo's file tools can access:
yoyo --allow-dir ./src --allow-dir ./tests --deny-dir ~/.ssh
This affects read_file, write_file, edit_file, list_files, and search.
Rules
- If --allow-dir is set, only paths under allowed directories are accessible. Everything else is blocked.
- If --deny-dir is set, paths under denied directories are blocked.
- Paths are resolved to absolute paths before checking, so
../traversal escapes are caught. - Symlinks are resolved via
canonicalizewhen the path exists.
Example: lock yoyo to your project
yoyo --allow-dir . --deny-dir ./.git --deny-dir ~/.ssh
This lets yoyo read and write anywhere in the current project, but blocks access to .git internals and your SSH keys.
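The rules above (deny overrides allow, canonicalization when the path exists) can be sketched as one check. This is an assumed sketch, not the actual implementation; in particular, it assumes the input path is already absolute when it does not exist yet:

```rust
use std::path::{Path, PathBuf};

// Deny wins over allow; an empty allow list means "everything not denied".
fn path_allowed(path: &Path, allow: &[PathBuf], deny: &[PathBuf]) -> bool {
    // Canonicalize when possible (resolves symlinks and ../); fall back
    // to the path as given when it does not exist yet.
    let abs = path.canonicalize().unwrap_or_else(|_| path.to_path_buf());
    if deny.iter().any(|d| abs.starts_with(d)) {
        return false; // deny overrides allow
    }
    if allow.is_empty() {
        return true; // no --allow-dir restriction configured
    }
    allow.iter().any(|a| abs.starts_with(a))
}
```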
Config File
Instead of passing flags every time, put your permission rules in .yoyo.toml (project-level), ~/.yoyo.toml (home directory), or ~/.config/yoyo/config.toml (XDG):
[permissions]
allow = ["git *", "cargo *", "echo *"]
deny = ["rm -rf *", "sudo *"]
[directories]
allow = ["./src", "./tests"]
deny = ["~/.ssh", "/etc"]
Precedence
CLI flags override config file values:
- If you pass any --allow or --deny flag, the entire [permissions] section from the config file is ignored.
- If you pass any --allow-dir or --deny-dir flag, the entire [directories] section from the config file is ignored.
- --yes / -y overrides everything — all tools are auto-approved regardless of permission patterns.
Config file search order (first found wins):
1. .yoyo.toml in the current directory
2. ~/.yoyo.toml in your home directory
3. ~/.config/yoyo/config.toml
Practical Examples
Rust development — approve common tools
yoyo --allow "git *" --allow "cargo *" --allow "cat *" --allow "ls *"
Or in .yoyo.toml:
[permissions]
allow = ["git *", "cargo *", "cat *", "ls *", "echo *"]
deny = ["rm -rf *", "sudo *"]
Sandboxed CI — trust everything
yoyo -y -p "run the test suite and fix any failures"
Paranoid mode — restrict to source files only
yoyo --allow-dir ./src --allow-dir ./tests --deny "rm *" --deny "sudo *"
Read-only exploration
yoyo --deny "*" --allow "cat *" --allow "ls *" --allow "grep *" --allow-dir .
This denies all bash commands except read-only ones, and restricts file access to the current directory.
Built-in Command Safety Analysis
Beyond pattern matching, yoyo has a built-in safety analyzer that detects categories of dangerous commands and provides specific warnings. This runs automatically — you don't need to configure it.
Detected patterns include:
| Category | Examples |
|---|---|
| Filesystem destruction | rm -rf /, rm -rf ~ |
| Force git operations | git push --force, git reset --hard |
| Permission changes | chmod -R 777, chown -R on system dirs |
| File overwrites | > /etc/passwd, > ~/.bashrc |
| System commands | shutdown, reboot, halt |
| Database destruction | DROP TABLE, DROP DATABASE, TRUNCATE TABLE |
| Pipe from internet | curl ... | bash, wget ... | sh |
| Process killing | kill -9 1, killall |
| Disk operations | dd if=, fdisk, parted, mkfs |
When a dangerous pattern is detected, yoyo shows a warning explaining why the command is flagged before asking for confirmation. A handful of truly catastrophic patterns (like rm -rf / or fork bombs) are hard-blocked and can never execute, even with --yes.
Safe commands like ls, cargo test, git status, and grep pass through without triggering any warnings.
Summary
| Mechanism | Scope | Effect |
|---|---|---|
| Default prompts | All modifying tools | Ask [y/N] before each call |
| `--yes` / `-y` | Everything | Auto-approve all tools |
| `--allow <pattern>` | Bash commands | Auto-approve matching commands |
| `--deny <pattern>` | Bash commands | Auto-reject matching commands |
| `--allow-dir <dir>` | File tools | Only allow paths under these dirs |
| `--deny-dir <dir>` | File tools | Block paths under these dirs |
| `[permissions]` in config | Bash commands | Same as `--allow`/`--deny` |
| `[directories]` in config | File tools | Same as `--allow-dir`/`--deny-dir` |
Tip: Use `/permissions` during a session to see the full security posture — auto-approve status, command patterns, and directory restrictions all in one view.
Session Persistence
yoyo can save and load conversations, letting you resume where you left off.
Auto-save on exit
yoyo automatically saves your conversation to .yoyo/last-session.json every time you exit the REPL — whether via /quit, /exit, Ctrl-D, or even unexpected termination. No flags needed.
If a previous session is detected on startup, yoyo prints a hint:
💡 Previous session found. Use --continue or /load .yoyo/last-session.json to resume.
Resuming with --continue
The --continue (or -c) flag restores the last auto-saved session:
yoyo --continue
yoyo -c
When --continue is used:
- On startup, yoyo loads from `.yoyo/last-session.json` (preferred) or `yoyo-session.json` (legacy fallback)
- On exit, the conversation is auto-saved as usual
$ yoyo -c
resumed session: 8 messages from .yoyo/last-session.json
main > what were we working on?
Manual save/load
Save the current conversation:
/save
This writes to yoyo-session.json in the current directory.
Save to a custom path:
/save my-session.json
Load a conversation:
/load
/load my-session.json
/load .yoyo/last-session.json
Session format
Sessions are stored as JSON files containing the conversation message history. The format is determined by the yoagent library.
Error handling
- If no previous session exists when using `--continue`, yoyo prints a message and starts fresh
- If a session file is corrupt or can't be parsed, yoyo warns you and starts fresh
- Empty conversations (no messages exchanged) are not auto-saved
- Save errors are reported but don't crash yoyo
Context Management
Claude models have a finite context window (200,000 tokens). As your conversation grows, it fills up. yoyo helps you manage this.
Checking context usage
Use /tokens to see how full your context window is:
/tokens
Output:
Active context:
messages: 24
current: 85.2k / 200.0k tokens
████████░░░░░░░░░░░░ 43%
Session totals (all API calls):
input: 120.5k tokens
output: 45.2k tokens
cache read: 30.0k tokens
cache write: 15.0k tokens
est. cost: $0.892
When the context window exceeds 75%, you'll see a warning:
⚠ Context is getting full. Consider /clear or /compact.
Manual compaction
Use /compact to compress the conversation:
/compact
This summarizes older messages while preserving recent context. You'll see:
compacted: 24 → 8 messages, ~85.2k → ~32.1k tokens
Auto-compaction
When the context window exceeds 80% capacity, yoyo automatically compacts the conversation. You'll see:
⚡ auto-compacted: 30 → 10 messages, ~165.0k → ~62.0k tokens
This happens transparently after each prompt response. You don't need to do anything — yoyo handles it.
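The two thresholds reduce to simple percentage arithmetic. A sketch of the idea (the 75%/80% figures mirror the text above; the helper function is my own illustration, not yoyo's code):

```shell
# Illustrative threshold check: warn at 75%, auto-compact at 80%.
context_pct() {  # context_pct USED_TOKENS WINDOW_TOKENS
  echo $(( $1 * 100 / $2 ))
}

pct=$(context_pct 165000 200000)   # ~165k used of a 200k window
if [ "$pct" -ge 80 ]; then
  echo "auto-compacting at ${pct}%"
elif [ "$pct" -ge 75 ]; then
  echo "warning at ${pct}%"
fi
```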
Clearing the conversation
If you want to start completely fresh:
/clear
This removes all messages and resets the conversation. Unlike /compact, nothing is preserved.
Tips
- For long sessions, use `/tokens` periodically to monitor usage
- If you notice the agent losing track of earlier context, try `/compact`
- Starting a new task? Use `/clear` to avoid confusing the agent with unrelated history
Checkpoint-restart strategy
For automated pipelines (like CI scripts), compaction can be lossy. The --context-strategy checkpoint flag provides an alternative: when context usage exceeds 70%, yoyo stops the agent loop and exits with code 2.
yoyo --context-strategy checkpoint -p "do some long task"
# Exit code 2 means "context was getting full — restart me"
The calling script can then restart yoyo with fresh context. This is useful for multi-phase pipelines where a structured restart produces better results than lossy compaction.
The default strategy is compaction, which uses auto-compaction as described above.
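A CI wrapper for this pattern might look like the following. The `run_until_done` helper is my own name for the loop; the yoyo invocation in the comment is the one shown above:

```shell
# Re-run a command until it exits with something other than 2 ("checkpoint").
run_until_done() {
  while true; do
    sh -c "$1"
    status=$?
    [ "$status" -ne 2 ] && return "$status"
    echo "checkpoint reached — restarting with fresh context"
  done
}

# In a real pipeline:
# run_until_done "yoyo --context-strategy checkpoint -p 'do some long task'"
```

Each restart begins with an empty context, so the prompt passed on each iteration should contain enough state (or point at files that do) for the agent to pick up where it left off.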
Git Integration
yoyo is git-aware. It shows your current branch and provides commands for common git operations.
Branch display
When you're in a git repository, the REPL prompt shows the current branch:
main > _
feature/new-parser > _
On startup, the branch is also shown in the status information:
git: main
Git commands
/diff
Show a summary of uncommitted changes (equivalent to git diff --stat):
/diff
Output:
src/main.rs | 15 +++++++++------
README.md | 3 +++
2 files changed, 12 insertions(+), 6 deletions(-)
If there are no uncommitted changes:
(no uncommitted changes)
/git diff
Show the actual diff content (line-by-line changes), not just a summary:
/git diff
Shows unstaged changes. To see staged changes instead:
/git diff --cached
/git branch
List all branches, with the current branch highlighted in green:
/git branch
Create and switch to a new branch:
/git branch feature/my-new-feature
/blame
Show who last modified each line of a file, with colorized output:
/blame src/main.rs
Limit to a specific line range:
/blame src/main.rs:10-20
Output is colorized: commit hashes (dim), author names (cyan), dates (dim), line numbers (yellow).
/undo
Revert all uncommitted changes. This is equivalent to git checkout -- .:
/undo
Before reverting, /undo shows you what will be undone:
src/main.rs | 15 +++++++++------
1 file changed, 9 insertions(+), 6 deletions(-)
✓ reverted all uncommitted changes
If there's nothing to undo:
(nothing to undo — no uncommitted changes)
Using git through the agent
yoyo's bash tool can run any git command. You can ask the agent directly:
> commit these changes with message "fix: handle empty input"
> show me the last 5 commits
> create a new branch called feature/parser
The agent has full access to git through its shell tool.
Cost Tracking
yoyo estimates the cost of each interaction so you can monitor spending.
Per-turn costs
After each response, you'll see a compact token summary:
↳ 3.2s · 1523→842 tokens · $0.0234
With --verbose (or -v), you get the full breakdown:
tokens: 1523 in / 842 out [cache: 1000 read, 500 write] (session: 4200 in / 2100 out) cost: $0.0234 total: $0.0567 ⏱ 3.2s
- cost — estimated cost for this turn
- total — estimated cumulative cost for the session
Quick cost check
Use /cost for a quick overview with a breakdown by cost category:
Session cost: $0.0567
4.2k in / 2.1k out
cache: 1.0k read / 500 write
Breakdown:
input: $0.0126
output: $0.0315
cache write: $0.0031
cache read: $0.0005
Detailed breakdown
Use /tokens to see a full breakdown including cache usage:
Session totals:
input: 120.5k tokens
output: 45.2k tokens
cache read: 30.0k tokens
cache write: 15.0k tokens
est. cost: $0.892
Supported models
Costs are estimated based on published pricing for all major providers:
Anthropic
| Model | Input | Cache Write | Cache Read | Output |
|---|---|---|---|---|
| Opus 4.5/4.6 | $5/MTok | $6.25/MTok | $0.50/MTok | $25/MTok |
| Opus 4/4.1 | $15/MTok | $18.75/MTok | $1.50/MTok | $75/MTok |
| Sonnet | $3/MTok | $3.75/MTok | $0.30/MTok | $15/MTok |
| Haiku 4.5 | $1/MTok | $1.25/MTok | $0.10/MTok | $5/MTok |
| Haiku 3.5 | $0.80/MTok | $1/MTok | $0.08/MTok | $4/MTok |
OpenAI
| Model | Input | Output |
|---|---|---|
| GPT-4.1 | $2/MTok | $8/MTok |
| GPT-4.1 Mini | $0.40/MTok | $1.60/MTok |
| GPT-4.1 Nano | $0.10/MTok | $0.40/MTok |
| GPT-4o | $2.50/MTok | $10/MTok |
| GPT-4o Mini | $0.15/MTok | $0.60/MTok |
| o3 | $2/MTok | $8/MTok |
| o3-mini | $1.10/MTok | $4.40/MTok |
| o4-mini | $1.10/MTok | $4.40/MTok |
Google
| Model | Input | Output |
|---|---|---|
| Gemini 2.5 Pro | $1.25/MTok | $10/MTok |
| Gemini 2.5 Flash | $0.15/MTok | $0.60/MTok |
| Gemini 2.0 Flash | $0.10/MTok | $0.40/MTok |
DeepSeek
| Model | Input | Output |
|---|---|---|
| DeepSeek Chat/V3 | $0.27/MTok | $1.10/MTok |
| DeepSeek Reasoner/R1 | $0.55/MTok | $2.19/MTok |
Mistral
| Model | Input | Output |
|---|---|---|
| Mistral Large | $2/MTok | $6/MTok |
| Mistral Small | $0.10/MTok | $0.30/MTok |
| Codestral | $0.30/MTok | $0.90/MTok |
xAI (Grok)
| Model | Input | Output |
|---|---|---|
| Grok 3 | $3/MTok | $15/MTok |
| Grok 3 Mini | $0.30/MTok | $0.50/MTok |
| Grok 2 | $2/MTok | $10/MTok |
Groq (hosted models)
| Model | Input | Output |
|---|---|---|
| Llama 3.3 70B | $0.59/MTok | $0.79/MTok |
| Llama 3.1 8B | $0.05/MTok | $0.08/MTok |
| Mixtral 8x7B | $0.24/MTok | $0.24/MTok |
| Gemma2 9B | $0.20/MTok | $0.20/MTok |
MTok = million tokens.
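As a sanity check on how these rates turn into per-turn figures, here is the arithmetic in awk. The `turn_cost` helper is my own; the example plugs in the Sonnet row ($3/MTok in, $15/MTok out):

```shell
# turn_cost IN_TOKENS OUT_TOKENS IN_RATE OUT_RATE   (rates in $/MTok)
turn_cost() {
  awk -v i="$1" -v o="$2" -v ri="$3" -v ro="$4" \
      'BEGIN { printf "%.4f\n", (i * ri + o * ro) / 1000000 }'
}

turn_cost 4200 2100 3 15   # 4.2k in + 2.1k out at Sonnet rates
```

Cache read/write costs are added the same way where the provider bills them, each with its own per-MTok rate.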
OpenRouter
Models accessed through OpenRouter (e.g., anthropic/claude-sonnet-4-20250514) are automatically recognized — the provider prefix is stripped before matching.
Limitations
- Cost estimates are approximate — actual billing may differ slightly
- For unrecognized models, no cost estimate is shown
- Cache read/write costs only apply to Anthropic models; other providers show zero cache costs
- Pricing may change — check your provider's pricing page for the latest rates
Keeping costs down
- Use smaller models (Haiku, Sonnet, GPT-4.1 Mini, Gemini Flash) for simple tasks
- Use `/compact` to reduce context size (fewer input tokens per turn)
- Use single-prompt mode (`-p`) for quick questions to avoid accumulating context
- Turn off extended thinking for routine tasks
Architecture
This page explains the reasoning behind yoyo's internal design — why the codebase is shaped the way it is, what trade-offs were made, and what invariants contributors should understand before changing things. For a machine-generated dependency graph, see DeepWiki.
Why 13 modules instead of 3?
yoyo started as a single 200-line file. By Day 10 it was a single 3,400-line main.rs. That file was split over Days 10–15 into the current structure, not because someone sat down and designed thirteen modules, but because the code kept telling us where the seams were.
The split follows a simple heuristic: if two chunks of code change for different reasons, they belong in different files. Adding a new /git subcommand shouldn't force you to scroll past the markdown renderer. Fixing a cost-calculation bug shouldn't put you in the same file as the CLI argument parser.
The current modules, from smallest to largest:
| Module | Lines | Role |
|---|---|---|
| `memory.rs` | ~375 | Project-specific `.yoyo/memory.json` persistence |
| `docs.rs` | ~550 | Fetching and parsing docs.rs HTML |
| `help.rs` | ~840 | Per-command help text and `/help` handler |
| `git.rs` | ~1,080 | Low-level git operations (branch, commit, diff) |
| `commands_git.rs` | ~1,130 | `/commit`, `/diff`, `/undo`, `/pr`, `/review` handlers |
| `repl.rs` | ~1,270 | Readline loop, tab completion, multi-line input |
| `commands_session.rs` | ~1,340 | `/save`, `/load`, `/export`, `/spawn`, `/mark`, `/jump` |
| `main.rs` | ~1,560 | Entry point, agent construction, tool wiring |
| `prompt.rs` | ~1,870 | Agent execution, streaming event loop, retry logic |
| `cli.rs` | ~2,520 | Argument parsing, config files, provider selection |
| `commands.rs` | ~2,910 | Core command dispatch, re-exports sub-modules |
| `commands_project.rs` | ~3,660 | `/add`, `/fix`, `/test`, `/lint`, `/tree`, `/find`, `/web`, `/plan` |
| `format.rs` | ~4,700 | Colors, markdown rendering, cost calc, spinner, diffs |
Thirteen modules is a lot for ~24k lines. The alternative — three or four large files — would be easier to navigate in a directory listing but harder to work in. When a module is under 1,500 lines, you can hold its entire API in your head. When it's 4,700 lines (like format.rs), you start wanting to split it further — and that's a fair instinct, discussed below.
The layered design and why it matters
The modules form five rough layers, and the key invariant is: dependencies only point downward.
┌─────────────────────────────────────────────────┐
│ Entry main.rs │
├─────────────────────────────────────────────────┤
│ REPL repl.rs │
├─────────────────────────────────────────────────┤
│ Commands commands.rs │
│ commands_git.rs │
│ commands_project.rs │
│ commands_session.rs │
│ help.rs │
├─────────────────────────────────────────────────┤
│ Engine prompt.rs format.rs │
├─────────────────────────────────────────────────┤
│ Utilities git.rs memory.rs docs.rs │
└─────────────────────────────────────────────────┘
Entry layer. main.rs parses CLI args (via cli.rs), builds the agent, wires up tools with permission checks, and hands control to either repl.rs (interactive) or prompt.rs (single-prompt / piped mode). It owns the AgentConfig struct and the build_agent() / configure_agent() functions. It also defines StreamingBashTool, a custom replacement for yoagent's default BashTool that reads subprocess stdout/stderr line-by-line via tokio::io::AsyncBufReadExt and emits periodic ToolExecutionUpdate events through the on_update callback. This means when a user runs cargo build or npm install, partial output appears in real-time instead of after the command finishes. The reasoning: agent construction is complex (provider selection, tool wiring, MCP/OpenAPI setup, permission configuration) and shouldn't be tangled with either the REPL loop or command handlers.
REPL layer. repl.rs owns the readline loop, tab completion, multi-line input detection, and the big match block that dispatches / commands. It depends on nearly everything below it because it's the traffic cop — but nothing depends on it. This is intentional: piped mode and single-prompt mode bypass the REPL entirely and go straight to prompt.rs.
Command layer. commands.rs is the hub — it re-exports handlers from three sub-modules (commands_git.rs, commands_project.rs, commands_session.rs) and help.rs. The sub-module split follows domain, not size: git-workflow commands in one file, project-workflow commands in another, session-management commands in a third. This means adding a new /git stash pop subcommand only touches commands_git.rs, even though commands_project.rs is three times larger. The split is by reason-to-change, not by line count.
Engine layer. prompt.rs and format.rs are the two largest modules by complexity. prompt.rs runs the agent, processes the streaming event channel, handles retries on transient errors, and manages context overflow (auto-compaction). format.rs handles everything the user sees: ANSI colors, the incremental MarkdownRenderer, cost calculations for seven providers, the terminal spinner, diff formatting, and dozens of small display utilities. These two modules sit at the same layer because they collaborate tightly — prompt.rs feeds events to format.rs's renderer — but neither depends on commands or the REPL.
Utility layer. git.rs, memory.rs, and docs.rs are leaf modules with no upward dependencies. They wrap external systems (git CLI, filesystem JSON, docs.rs HTTP) behind clean Rust APIs. Any module above can call into them, but they never call up. This makes them easy to test in isolation — and they are: git.rs has 41 tests, memory.rs has 14, docs.rs has 23.
The layering isn't enforced by the compiler — Rust's module system doesn't prevent circular use crate:: imports at the module level. It's enforced by convention and by the fact that violations immediately feel wrong: if git.rs needed to call a command handler, that would be a sign the abstraction is leaking.
Why format.rs is the largest file
At ~4,700 lines with 256 tests, format.rs is twice the size of any other module. This isn't accidental — it's the consequence of a design choice: all terminal presentation logic lives in one place.
The module contains:
- Color system — the `Color` wrapper that respects `NO_COLOR`, all ANSI color constants
- MarkdownRenderer — incremental streaming renderer that turns text deltas into ANSI-colored output with syntax highlighting, handling code blocks, headers, bold/italic, lists, and inline code as tokens arrive
- Cost calculations — pricing tables for seven providers, input/output/cache cost breakdowns
- Spinner — background activity indicator for API roundtrips
- Display utilities — `pluralize`, `truncate`, `context_bar`, `format_duration`, `format_token_count`, `format_edit_diff`, `format_tool_summary`, and more
The alternative would be splitting into color.rs, renderer.rs, cost.rs, etc. That's probably the right move eventually. But today, having all presentation in one file has a benefit: when you change how something looks, you only need to look in one place. The MarkdownRenderer uses the color system, cost formatting uses the color system, the spinner uses the color system — they're coupled by the shared presentation layer, and co-location makes that coupling visible rather than hiding it across five small files.
The 256 tests are the reason this works at ~4,700 lines. Every public function has test coverage. The MarkdownRenderer alone has tests for every markdown construct it handles. If those tests didn't exist, the file would be unmaintainable at this size.
Why cli.rs is so large
cli.rs (~2,520 lines) handles three jobs that sound simple but aren't:
- Argument parsing — yoyo doesn't use `clap` or `structopt`. Arguments are parsed by hand from `std::env::args`. This was a deliberate choice: the CLI has unusual needs (multi-value `--mcp` flags, `--provider` with fallback chains, config file merging) that are easier to handle with custom parsing than with a framework's escape hatches. The trade-off is more code in `cli.rs`, but zero macro magic and full control over error messages.
- Config file merging — `.yoyo.toml` and `YOYO.md` settings merge with CLI flags and environment variables, with a clear precedence chain. This merging logic accounts for hundreds of lines.
- Provider configuration — selecting the right API key, endpoint, and default model for each of eight providers, including fallback behavior when keys aren't set.
The 92 tests in cli.rs verify the parsing of every flag and every merge scenario. Adding a new CLI flag means adding it in one place and adding a test.
The command dispatch pattern
Every /command follows the same pattern:
- User types `/foo bar baz` in the REPL
- `repl.rs` matches on `"/foo"` and calls `commands::handle_foo(args, agent, ...)`
- The handler does its work, possibly calling into utility modules
- If it needs the LLM, it calls `prompt::run_prompt()` with a constructed input
This pattern is enforced by convention, not by a trait. Early versions tried a Command trait with execute(), but it added ceremony without value — every command has different arguments, different return types, and different needs (some need the agent, some don't, some are async, some aren't). A simple function per command turned out to be the right abstraction level.
The commands.rs hub re-exports all handlers so the REPL only needs use crate::commands::*. The sub-modules (commands_git, commands_project, commands_session) group by domain. When you run /commit, the REPL calls handle_commit(), which is defined in commands_git.rs and re-exported through commands.rs.
Why prompt.rs handles retries internally
prompt.rs encapsulates the entire agent interaction lifecycle: sending the prompt, receiving streaming events, rendering output, and handling errors. Retry logic lives here — not in the REPL or in main.rs — because retries need access to the event stream state.
Three kinds of retries happen:
- Tool failures — if a tool execution fails, the error is sent back to the LLM as context and it retries (up to 2 times). This happens inside the agent's own loop.
- Transient API errors (429, 5xx) — retried with exponential backoff. The REPL doesn't need to know this happened.
- Context overflow — when the conversation exceeds the context window, `prompt.rs` triggers auto-compaction (asking the LLM to summarize the conversation so far) and retries with the compressed context.
Keeping this in prompt.rs means the REPL's contract is simple: call run_prompt(), get back a PromptOutcome with the response text, token usage, and any unrecoverable errors. The REPL never has to think about retries, backoff, or context management.
The streaming renderer design
yoyo streams LLM output token-by-token. The MarkdownRenderer in format.rs is an incremental state machine that receives text deltas (often partial words or half a markdown construct) and emits ANSI-colored output. This is architecturally significant because:
- It can't buffer entire lines. If it did, the output would appear in chunks instead of flowing. An early version had this bug — it was technically correct but felt broken. (Day 17 fix.)
- It must track state across deltas. When one delta contains `` ``` `` and the next delta contains `rs`, the renderer must know it's inside a code block header. The state machine tracks: are we in a code block? What language? Are we in bold? Italic? A header? A list item?
- It must handle malformed markdown gracefully. LLMs sometimes emit unclosed code blocks, nested formatting that doesn't resolve, or markdown-like syntax that isn't actually markdown. The renderer must produce reasonable output regardless.
The alternative — buffering the entire response and rendering it at the end — would be simpler but would make the tool feel unresponsive. Streaming is a UX requirement that imposes real architectural cost.
Invariants contributors should know
No upward dependencies from utilities. git.rs, memory.rs, and docs.rs must never use crate::commands or use crate::repl. If you find yourself wanting to, the abstraction boundary is wrong.
format.rs is the only module that writes ANSI escape codes. Other modules call format::Color, format::DIM, etc. — they don't hardcode escape sequences. This is enforced by convention and makes NO_COLOR support work globally.
Every command handler is a standalone function. No command state persists between invocations (except through the Agent's conversation history and SessionChanges). This makes commands testable in isolation.
Tests live next to the code they test. Each module has a #[cfg(test)] mod tests block at the bottom. The project has ~1,000 tests total. Integration tests live in tests/integration.rs and test the CLI binary as a black box.
The agent is the only LLM dependency. yoyo delegates all LLM interaction to the yoagent crate. prompt.rs receives AgentEvents through a channel — it never constructs HTTP requests or parses API responses directly. This means swapping the LLM backend (or the entire agent framework) would only require changes to main.rs (construction) and prompt.rs (event handling).
Trade-offs and known debt
format.rs should probably be split. The MarkdownRenderer, cost tables, and color utilities are three distinct concerns sharing a file. The blocker isn't technical — it's that all three are coupled through the color system, and splitting would require deciding where Color lives.
Hand-rolled CLI parsing is a maintenance burden. Every new flag requires manual parsing code, help text updates, and config file support. A framework like clap would reduce this at the cost of a dependency and less control over error messages. The current approach works because flags don't change often.
commands.rs as a hub creates a wide dependency surface. Because it re-exports everything, changing any command sub-module can trigger recompilation of anything that imports commands::*. In a larger project this would matter for build times. At ~24k lines, it doesn't yet.
No trait abstraction for commands. This is fine at the current scale but means there's no compile-time guarantee that all commands follow the same pattern. A new contributor might put command logic directly in repl.rs instead of in a handler function. Code review catches this, not the type system.
Grow Your Own Agent
Fork yoyo-evolve, edit two files, and run your own self-evolving coding agent on GitHub Actions.
What You Get
A coding agent that:
- Runs on GitHub Actions every ~8 hours
- Reads its own source code, picks improvements, implements them
- Writes a journal of its evolution
- Responds to community issues in its own voice
- Gets smarter over time through a persistent memory system
Quick Start
1. Fork the repo
Fork yologdev/yoyo-evolve on GitHub.
2. Edit your agent's identity
IDENTITY.md — your agent's constitution: name, mission, goals, and rules.
PERSONALITY.md — your agent's voice: how it writes, speaks, and expresses itself.
These are the only files you need to edit. Everything else auto-detects.
3. Choose your provider
yoyo supports 13+ providers out of the box. Pick the one that fits your budget and preferences:
| Provider | Env Var | Default Model | Notes |
|---|---|---|---|
| `anthropic` | `ANTHROPIC_API_KEY` | claude-opus-4-6 | Default. Best overall quality. |
| `openai` | `OPENAI_API_KEY` | gpt-4o | GPT-4o and o-series models |
| `google` | `GOOGLE_API_KEY` | gemini-2.0-flash | Gemini models |
| `openrouter` | `OPENROUTER_API_KEY` | anthropic/claude-sonnet-4-20250514 | Multi-provider gateway |
| `deepseek` | `DEEPSEEK_API_KEY` | deepseek-chat | Very cost-effective |
| `groq` | `GROQ_API_KEY` | llama-3.3-70b-versatile | Fast inference |
| `mistral` | `MISTRAL_API_KEY` | mistral-large-latest | Mistral and Codestral models |
| `xai` | `XAI_API_KEY` | grok-3 | Grok models |
| `ollama` | (none — local) | llama3.2 | Free, runs on your hardware |
For the full list of providers and models, see Models & Providers.
Tip: Anthropic is the default and what yoyo itself uses to evolve. If you're unsure, start there. If cost is a concern, DeepSeek and Groq offer strong results at a fraction of the price. Ollama is free but requires local hardware.
4. Create a GitHub App
Your agent needs a GitHub App to commit code and interact with issues.
- Go to Settings > Developer settings > GitHub Apps > New GitHub App
- Give it your agent's name
- Set permissions:
- Repository > Contents: Read and write
- Repository > Issues: Read and write
- Repository > Discussions: Read and write (optional, for social features)
- Install it on your forked repo
- Note the App ID, Private Key (generate one), and Installation ID
  - Installation ID: visit https://github.com/settings/installations and click your app — the ID is in the URL
5. Set repo secrets
In your fork, go to Settings > Secrets and variables > Actions and add:
| Secret | Description |
|---|---|
| Provider API key | API key for your chosen provider (see table in step 3) |
| `APP_ID` | GitHub App ID |
| `APP_PRIVATE_KEY` | GitHub App private key (PEM) |
| `APP_INSTALLATION_ID` | GitHub App installation ID |
Set the API key secret matching your chosen provider. For example, if using Anthropic, add ANTHROPIC_API_KEY. If using OpenAI, add OPENAI_API_KEY. If using DeepSeek, add DEEPSEEK_API_KEY, and so on.
6. Enable the Evolution workflow
Go to Actions in your fork and enable the Evolution workflow. Your agent will start evolving on its next scheduled run, or trigger it manually with Run workflow.
What Each File Does
| File | Purpose |
|---|---|
| `IDENTITY.md` | Agent's constitution — name, mission, goals, rules |
| `PERSONALITY.md` | Agent's voice — writing style, personality traits |
| `ECONOMICS.md` | What money/sponsorship means to the agent |
| `journals/JOURNAL.md` | Chronological log of evolution sessions (auto-maintained) |
| `DAY_COUNT` | Tracks the agent's current evolution day |
| `memory/` | Persistent learning system (auto-maintained) |
| `SPONSORS.md` | Sponsor recognition (auto-maintained) |
Costs
Costs vary by provider and model:
- Anthropic Claude Opus — $3-8 per session ($10-25/day at 3 sessions/day)
- Anthropic Claude Sonnet — ~$1-3 per session, good balance of quality and cost
- DeepSeek — significantly cheaper, strong coding performance
- Groq — fast and affordable for smaller models
- Ollama — free (runs locally), but requires capable hardware
The default schedule runs ~3 sessions per day (8-hour gap between runs). To reduce costs, switch to a cheaper provider/model or reduce session frequency.
Customization
Change the provider and model
Set PROVIDER and MODEL environment variables in .github/workflows/evolve.yml:
env:
PROVIDER: openai
MODEL: gpt-4o
Or set just MODEL to use a different model within the default provider (Anthropic):
env:
MODEL: claude-sonnet-4-6
You can also edit the default directly in scripts/evolve.sh.
Change session frequency
Edit the cron schedule in `.github/workflows/evolve.yml`. The default `0 * * * *` (every hour) is gated by an 8-hour gap in the script, so the agent runs ~3 times/day.
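The hourly-cron-plus-gap gating amounts to an elapsed-time check. A sketch of the idea (the function name and numbers are my own illustration; the real gate lives in scripts/evolve.sh):

```shell
# should_run NOW_EPOCH LAST_EPOCH GAP_HOURS — succeed if the gap has elapsed.
should_run() {
  [ $(( $1 - $2 )) -ge $(( $3 * 3600 )) ]
}

now=$(date +%s)
should_run "$now" 0 8                  && echo "no previous run — go"
should_run "$now" $(( now - 3600 )) 8  || echo "only 1h since last run — skip"
```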
Add custom skills
Create markdown files with YAML frontmatter in the skills/ directory. The agent loads them automatically via --skills ./skills.
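A skill file might look something like this. The exact frontmatter fields yoyo expects aren't documented here, so treat the keys below as placeholders:

```markdown
---
name: commit-style
description: House rules for commit messages
---

Write commits as `type: summary` (fix, feat, docs, refactor),
imperative mood, subject line under 72 characters.
```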
Sponsor system
The sponsor system auto-detects your GitHub Sponsors. No configuration needed — just set up GitHub Sponsors on your account.
The /update Command
The yoyo binary's /update command checks for releases from yologdev/yoyo-evolve, not your fork. This is expected behavior. As a fork maintainer, rebuild from source after pulling changes:
cargo build --release
In the future, an evolve portal will provide guided setup including custom update targets.
Optional: Dashboard Notifications
If you have a dashboard repo that accepts repository dispatch events, set a repo variable:
gh variable set DASHBOARD_REPO --body "your-user/your-dashboard" --repo your-user/your-fork
And add the DASHBOARD_TOKEN secret with a token that can dispatch to that repo.
Mutation Testing
yoyo uses cargo-mutants to assess test quality. Mutation testing works by making small changes (mutants) to the source code — flipping conditions, replacing return values, removing function bodies — and checking whether any test catches each change.
If a mutant survives (no test fails), it means that line of code isn't actually tested.
Baseline
As of Day 9, yoyo has 1004 total mutants across its source files. This number grows as features are added. The mutation testing setup uses a 20% maximum survival rate threshold — if more than 20% of tested mutants survive, the check fails.
| Metric | Value |
|---|---|
| Total mutants | 1004 |
| Threshold | 20% max survival rate |
| Established | Day 9 (2026-03-09) |
Install cargo-mutants
cargo install cargo-mutants
Quick start with the threshold script
The easiest way to run mutation testing is with the threshold script:
# Run with default 20% threshold
./scripts/run_mutants.sh
# Run with a stricter threshold
./scripts/run_mutants.sh --threshold 10
# Just count mutants without running them
./scripts/run_mutants.sh --list
# Test mutants in a specific file only
./scripts/run_mutants.sh --file src/format.rs
The script:
- Runs `cargo mutants` on the project
- Counts caught vs survived mutants
- Calculates the survival rate
- Exits with code 1 if the rate exceeds the threshold
- Prints surviving mutants on failure so you know what to fix
This makes it easy for maintainers to run locally and could be added to CI by the project owner.
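The pass/fail arithmetic the script performs boils down to the following miniature re-implementation (numbers are made up; this is not the script itself):

```shell
# check_threshold SURVIVED TESTED MAX_PCT — exit 0 within threshold, 1 otherwise.
check_threshold() {
  rate=$(( $1 * 100 / $2 ))
  if [ "$rate" -gt "$3" ]; then
    echo "FAIL: ${rate}% survival exceeds ${3}%"
    return 1
  fi
  echo "PASS: ${rate}% survival within ${3}%"
}

check_threshold 150 1000 20   # 15% survival against the 20% threshold
```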
Run mutation testing directly
From the project root:
# Run all mutants (this takes a while — several minutes)
cargo mutants
# Show only the surviving mutants (uncaught mutations)
cargo mutants -- --survived
# Run mutants for a specific file
cargo mutants -f src/format.rs
# Run mutants for a specific function
cargo mutants -F "format_cost"
Read the results
After a run, cargo-mutants creates a mutants.out/ directory with detailed results:
# Summary
cat mutants.out/caught.txt # mutants killed by tests ✓
cat mutants.out/survived.txt # mutants NOT caught — test gaps!
cat mutants.out/timeout.txt # mutants that caused infinite loops
cat mutants.out/unviable.txt # mutants that didn't compile
Focus on survived.txt — each line is a mutation that no test catches. These are the weak spots.
Configuration
The mutants.toml file in the project root excludes known-acceptable mutants:
- Cosmetic functions — ANSI color codes, banner printing, help text
- Interactive I/O — functions that read stdin or require a terminal
- Async API calls — prompt execution that needs a live Anthropic API
These exclusions keep mutation testing focused on logic that should be tested. If you add a new feature with testable logic, make sure it's not excluded.
Writing targeted tests
When you find a surviving mutant:
- Read what the mutation does (e.g., "replace `<` with `<=` in format_cost")
- Write a test that specifically catches that boundary condition
- Re-run `cargo mutants -F "function_name"` to verify the mutant is now caught
Example workflow:
# Find surviving mutants
cargo mutants 2>&1 | grep "SURVIVED"
# Write a test to kill the mutant, then verify
cargo mutants -F "format_cost"
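To make the boundary-condition idea concrete, here is a hedged sketch: a hypothetical `format_cost` (not yoyo's actual implementation) with a `<` comparison, and assertions that pin down the boundary value. A `<` → `<=` mutant changes behavior only at exactly 100, so a test that checks 100 is the one that kills it.

```rust
// Hypothetical sketch of a function with a `<` boundary.
fn format_cost(cents: u64) -> String {
    // A `<` → `<=` mutant here only changes the output for cents == 100.
    if cents < 100 {
        format!("{cents}¢")
    } else {
        format!("${}.{:02}", cents / 100, cents % 100)
    }
}

fn main() {
    // 100 is the only input that distinguishes `<` from `<=` —
    // a test suite without it lets the mutant survive.
    assert_eq!(format_cost(99), "99¢");
    assert_eq!(format_cost(100), "$1.00");
    assert_eq!(format_cost(250), "$2.50");
    println!("boundary tests passed");
}
```

The general pattern: find the exact input where the mutated operator diverges from the original, and assert on that input.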
Threshold script for CI
The scripts/run_mutants.sh script is designed to be CI-friendly:
# In a CI pipeline or pre-merge check:
./scripts/run_mutants.sh --threshold 20
# Exit codes:
# 0 = survival rate within threshold (PASS)
# 1 = survival rate exceeds threshold (FAIL)
The project owner can add this to CI workflows when ready. For now, contributors should run it locally before submitting PRs that add new logic.
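The pass/fail logic the script implements can be sketched in a few lines. This is an assumed reconstruction of the described behavior (survival rate over the threshold → exit code 1), not the actual `run_mutants.sh` source.

```rust
// Assumed logic of the threshold check; the real script may differ.
fn survival_rate_percent(caught: u32, survived: u32) -> f64 {
    let total = caught + survived;
    if total == 0 {
        return 0.0; // no mutants found — trivially passing
    }
    survived as f64 * 100.0 / total as f64
}

/// Exit code: 0 = rate within threshold (PASS), 1 = rate exceeds it (FAIL).
fn threshold_exit_code(caught: u32, survived: u32, threshold: f64) -> i32 {
    if survival_rate_percent(caught, survived) > threshold { 1 } else { 0 }
}

fn main() {
    // 85 caught, 15 survived → 15% survival rate, within a 20% threshold.
    assert_eq!(threshold_exit_code(85, 15, 20.0), 0);
    // 70 caught, 30 survived → 30% survival rate, exceeds 20%.
    assert_eq!(threshold_exit_code(70, 30, 20.0), 1);
    println!("threshold logic ok");
}
```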
When to run
Mutation testing is slow — it builds and tests your code once per mutant. Run it:
- After adding a new feature, to verify test coverage
- Before a release, as a quality check
- When you suspect the test suite has gaps
- On specific files with `--file` to keep it fast during development
Notes for CI integration
The scripts/run_mutants.sh script and mutants.toml config are ready for a human maintainer to wire into CI. A few things to know:
- Git-dependent tests: Some tests (e.g. `test_git_branch_returns_something_in_repo`, `test_build_project_tree_runs`, `test_get_staged_diff_runs`) gracefully handle running outside a git repo. cargo-mutants copies source to a temp directory without `.git/`, so these tests skip git-specific assertions when not in a repo.
- Exclusions are reasonable: The `mutants.toml` excludes cosmetic/display functions (ANSI colors, banners), interactive I/O (stdin, terminal), and async API calls (needs a live Anthropic key). These can't be meaningfully unit-tested.
- The script cannot be added to `.github/workflows/` by the agent (safety rules), but it exits with code 0/1 and is designed for CI use.
Common Issues
"No API key found"
error: No API key found.
Set ANTHROPIC_API_KEY or API_KEY environment variable.
Fix: Set your Anthropic API key:
export ANTHROPIC_API_KEY=sk-ant-api03-...
yoyo checks ANTHROPIC_API_KEY first, then API_KEY. At least one must be set and non-empty.
"No input on stdin"
No input on stdin.
This happens when you pipe empty input to yoyo:
echo "" | yoyo
Fix: Make sure your piped input contains actual content.
Model errors
error: [API error message]
This appears when the Anthropic API returns an error. Common causes:
- Invalid API key — check your key is correct and active
- Rate limiting — you're sending too many requests; wait and retry
- Model unavailable — the model you specified doesn't exist or you don't have access
Automatic retry: yoyo automatically retries transient errors (rate limits, server errors, network issues) with exponential backoff — up to 3 retries with 1s, 2s, 4s delays. You'll see a dim message like ⚡ retrying (attempt 2/4, waiting 2s)... when this happens. Auth errors (401, 403) and invalid requests (400) are shown immediately without retrying.
Tool error auto-recovery: When a tool execution fails during a natural-language prompt, yoyo automatically retries the prompt with error context appended (up to 2 times). This lets the agent self-correct — for example, retrying a failed file read with a corrected path. You'll see ⚡ auto-retrying after tool error... when this kicks in.
Use /retry to manually re-send the last prompt after a non-transient error is resolved.
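The retry policy described above (transient errors retried with 1s, 2s, 4s backoff; auth and invalid-request errors failed immediately) can be sketched as follows. All names here are illustrative — this is not yoyo's actual retry code.

```rust
use std::time::Duration;

// Illustrative error taxonomy, not yoyo's real types.
#[derive(Debug, PartialEq)]
enum ApiError { RateLimited, Server, Auth, InvalidRequest }

// Rate limits and server errors are worth retrying; 401/403/400 are not.
fn is_transient(e: &ApiError) -> bool {
    matches!(e, ApiError::RateLimited | ApiError::Server)
}

/// Delay before retry attempt `n` (1-based): 1s, 2s, 4s.
fn backoff_delay(attempt: u32) -> Duration {
    Duration::from_secs(1u64 << (attempt - 1))
}

fn retry<F>(mut call: F) -> Result<String, ApiError>
where
    F: FnMut() -> Result<String, ApiError>,
{
    let max_retries = 3; // 1 initial attempt + up to 3 retries
    let mut attempt = 0;
    loop {
        match call() {
            Ok(v) => return Ok(v),
            Err(e) if is_transient(&e) && attempt < max_retries => {
                attempt += 1;
                // In real code you would sleep here:
                // std::thread::sleep(backoff_delay(attempt));
                let _ = backoff_delay(attempt);
            }
            Err(e) => return Err(e), // auth/400 or retries exhausted
        }
    }
}

fn main() {
    // Succeeds on the third attempt after two transient failures.
    let mut n = 0;
    let result = retry(|| {
        n += 1;
        if n < 3 { Err(ApiError::RateLimited) } else { Ok("ok".into()) }
    });
    assert_eq!(result, Ok("ok".to_string()));
    assert_eq!(backoff_delay(3), Duration::from_secs(4));
}
```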
Context window full
⚠ Context is getting full. Consider /clear or /compact.
Your conversation is approaching the 200,000-token context limit.
Fix: Use /compact to compress the conversation, or /clear to start fresh.
yoyo auto-compacts at 80% capacity, but you can compact earlier if you prefer.
Auto-recovery from overflow: If the API returns a context overflow error (e.g., "prompt is too long"), yoyo automatically compacts the conversation and retries the prompt once. You'll see:
⚡ context overflow detected — auto-compacting and retrying...
This handles the case where the context grows past the limit mid-conversation without you noticing. If the retry also fails, yoyo suggests using /compact manually.
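The auto-compaction trigger is a simple threshold check: compact once usage crosses 80% of the 200,000-token window. A minimal sketch of that logic, with illustrative names:

```rust
// Sketch of the auto-compaction trigger; names are illustrative.
const CONTEXT_LIMIT: u64 = 200_000;
const COMPACT_THRESHOLD: f64 = 0.80;

fn should_auto_compact(tokens_used: u64) -> bool {
    tokens_used as f64 >= CONTEXT_LIMIT as f64 * COMPACT_THRESHOLD
}

fn main() {
    assert!(!should_auto_compact(150_000)); // 75% — still fine
    assert!(should_auto_compact(160_000));  // 80% — compact now
    assert!(should_auto_compact(200_000));  // at the limit
    println!("compaction trigger ok");
}
```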
"warning: Failed to load skills"
warning: Failed to load skills: [error]
The --skills directory couldn't be read. yoyo continues without skills.
Fix: Check that the path exists and contains valid skill files.
"unknown command: /foo"
unknown command: /foo
type /help for available commands
You typed a command yoyo doesn't recognize. If it's a typo, yoyo will suggest the closest match:
unknown command: /hlep
did you mean /help?
type /help for available commands
Fix: Check the suggestion, or type /help to see all available commands.
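"Did you mean" suggestions like this are typically built on edit distance. A hedged sketch using classic Levenshtein distance with a small cutoff — yoyo's actual matching may differ:

```rust
// Classic Levenshtein edit distance (two-row dynamic programming).
fn levenshtein(a: &str, b: &str) -> usize {
    let (a, b): (Vec<char>, Vec<char>) = (a.chars().collect(), b.chars().collect());
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, &ca) in a.iter().enumerate() {
        let mut cur = vec![i + 1];
        for (j, &cb) in b.iter().enumerate() {
            let cost = if ca == cb { 0 } else { 1 };
            // min of substitution, deletion, insertion
            cur.push((prev[j] + cost).min(prev[j + 1] + 1).min(cur[j] + 1));
        }
        prev = cur;
    }
    prev[b.len()]
}

/// Suggest the closest command, but only for near misses (distance <= 2).
fn suggest<'a>(input: &str, commands: &[&'a str]) -> Option<&'a str> {
    commands
        .iter()
        .map(|&c| (levenshtein(input, c), c))
        .filter(|&(d, _)| d <= 2)
        .min_by_key(|&(d, _)| d)
        .map(|(_, c)| c)
}

fn main() {
    let commands = ["/help", "/clear", "/compact", "/retry", "/diff"];
    // "/hlep" is two substitutions away from "/help".
    assert_eq!(suggest("/hlep", &commands), Some("/help"));
    // Nothing close enough: no suggestion at all.
    assert_eq!(suggest("/xyzzy99", &commands), None);
    println!("suggestion ok");
}
```

The cutoff matters: without it, every typo would get a suggestion, however absurd.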
"not in a git repository"
error: not in a git repository
You used /diff or /undo outside a git repo.
Fix: Navigate to a directory that's inside a git repository before starting yoyo.
Ctrl+C behavior
- First Ctrl+C — cancels the current response; you can type a new prompt
- Second Ctrl+C (or Ctrl+D) — exits yoyo
If a tool execution is hanging, Ctrl+C will abort it.
Session file errors
error saving: [error]
error reading yoyo-session.json: [error]
error parsing: [error]
Session save/load failed. Common causes:
- Disk full — free space and try again
- Permission denied — check file permissions
- Corrupt file — delete the session file and start fresh
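Mapping those failure causes to user-facing messages is straightforward with `std::io::ErrorKind`. An assumed sketch, not yoyo's actual session code:

```rust
use std::fs;
use std::io::ErrorKind;

// Assumed logic: classify a session-file read failure into the
// causes listed above (missing file, permissions, other I/O).
fn describe_session_error(path: &str) -> String {
    match fs::read_to_string(path) {
        Ok(_) => "session loaded".to_string(),
        Err(e) => match e.kind() {
            ErrorKind::NotFound => {
                format!("no session file at {path}, starting fresh")
            }
            ErrorKind::PermissionDenied => {
                format!("permission denied reading {path}, check file permissions")
            }
            _ => format!("error reading {path}: {e}"),
        },
    }
}

fn main() {
    // A path that should not exist hits the "starting fresh" branch.
    let msg = describe_session_error("definitely-missing-session.json");
    assert!(msg.contains("starting fresh"));
    println!("{msg}");
}
```

A corrupt-but-readable file would surface later, at JSON parse time, which is why the "error parsing" message is distinct from the read errors.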
Safety & Anti-Crash Guarantees
How does a coding agent that edits its own source code avoid breaking itself?
Good question. yoyo has six layers of defense — from the innermost loop (every single code change) to the outermost (protected files that can never be touched). Here's how each one works.
Layer 1: Build-and-test gate on every commit
No code change is ever committed unless it passes:
cargo build && cargo test
This happens inside the evolution session itself. The agent runs the build and test suite after every edit. If either fails, the change doesn't get committed — the agent reads the error and tries to fix it.
Layer 2: CI on every push
Even after the agent commits locally, GitHub Actions runs the full
check suite on every push to main:
cargo build
cargo test
cargo clippy --all-targets -- -D warnings
cargo fmt -- --check
Clippy warnings are treated as errors (-D warnings), so even subtle
issues like unused variables or redundant clones get caught. If CI
fails, the next evolution session sees the failure and prioritizes
fixing it before doing anything else.
Layer 3: Automatic revert on build failure
The evolution script (evolve.sh) has a post-session verification step.
After all tasks run, it re-checks the build. If it fails:
- It gives the agent up to 3 attempts to fix the errors automatically
- If all fix attempts fail, it reverts to the pre-session state:
git checkout "$SESSION_START_SHA" -- src/
This means a broken session can never leave src/ in a worse state
than it started. The revert is surgical — it only touches source files,
preserving journal entries and other non-code changes.
Layer 4: Tests before features
yoyo's evolve skill requires writing a test before adding a feature. This isn't just a guideline — the planning phase explicitly instructs each implementation task to "write a test first if possible."
Why this matters: if you write the test first, you know the test covers the new behavior. If you write the feature first, you might write a test that only confirms what you already built, missing edge cases.
Layer 5: No deleting existing tests
The evolve skill has a hard rule: never delete existing tests. Tests are the agent's immune system. Removing them would let regressions slip through silently. As of this writing, yoyo has 91+ tests, and that number only goes up.
Layer 6: Protected files
Some files are simply off-limits. The agent cannot modify:
| File | Why it's protected |
|---|---|
IDENTITY.md | yoyo's constitution — defines who it is and its core rules |
PERSONALITY.md | yoyo's voice and values |
scripts/evolve.sh | The evolution loop itself — if this broke, recovery would be manual |
scripts/format_issues.py | Input sanitization for GitHub issues |
scripts/build_site.py | Website builder |
.github/workflows/* | CI configuration — the safety net that catches everything else |
These files can only be changed by human maintainers. This prevents a subtle failure mode: the agent "improving" its own safety checks in a way that weakens them.
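Conceptually, the check is a denylist consulted before any edit. The sketch below is illustrative only — the real enforcement in yoyo may work quite differently:

```rust
// Illustrative protected-file check, mirroring the table above.
const PROTECTED: &[&str] = &[
    "IDENTITY.md",
    "PERSONALITY.md",
    "scripts/evolve.sh",
    "scripts/format_issues.py",
    "scripts/build_site.py",
];

fn is_protected(path: &str) -> bool {
    // Exact matches plus the whole CI workflows directory.
    PROTECTED.contains(&path) || path.starts_with(".github/workflows/")
}

fn main() {
    assert!(is_protected("IDENTITY.md"));
    assert!(is_protected(".github/workflows/ci.yml"));
    assert!(!is_protected("src/main.rs"));
    println!("protected-file check ok");
}
```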
What happens in practice
A typical evolution session:
- `evolve.sh` verifies the build passes before starting
- The planning agent reads source code, journal, and issues
- Implementation agents execute tasks, each running build+test after changes
- Post-session verification re-checks everything
- If anything broke, automatic fix attempts kick in
- If fixes fail, revert to pre-session state
- CI runs on push as a final backstop
- Next session checks CI status — failures get top priority
The result: yoyo has been evolving autonomously since Day 0, growing
from ~200 lines to ~3,100+ lines, without ever shipping a broken build
to main.
Can it still break?
Theoretically, yes. Safety is defense-in-depth, not a proof of correctness. Some scenarios the current system doesn't catch:
- Logic bugs that pass tests — if the test suite doesn't cover a behavior, the agent could change it without noticing
- Performance regressions — we rely on official leaderboards (SWE-bench, etc.) rather than custom benchmarks
- Subtle UX regressions — the agent tests functionality, not user experience
These are areas for future improvement. But for the core guarantee — "the agent won't commit code that doesn't compile or pass tests" — the six layers above make that extremely unlikely.