Safety & Anti-Crash Guarantees

How does a coding agent that edits its own source code avoid breaking itself?

Good question. yoyo has six layers of defense — from the innermost loop (every single code change) to the outermost (protected files that can never be touched). Here's how each one works.

Layer 1: Build-and-test gate on every commit

No code change is ever committed unless it passes:

cargo build && cargo test

This happens inside the evolution session itself. The agent runs the build and test suite after every edit. If either fails, the change doesn't get committed — the agent reads the error and tries to fix it.

Layer 2: CI on every push

Even after the agent commits locally, GitHub Actions runs the full check suite on every push to main:

cargo build
cargo test
cargo clippy --all-targets -- -D warnings
cargo fmt -- --check

Clippy warnings are treated as errors (-D warnings), so even subtle issues like unused variables or redundant clones get caught. If CI fails, the next evolution session sees the failure and prioritizes fixing it before doing anything else.

Layer 3: Automatic revert on build failure

The evolution script (evolve.sh) has a post-session verification step. After all tasks run, it re-checks the build. If it fails:

It gives the agent up to 3 attempts to fix the errors automatically
If all fix attempts fail, it reverts to the pre-session state:
```
git checkout "$SESSION_START_SHA" -- src/
```

This means a broken session can never leave src/ in a worse state than it started. The revert is surgical — it only touches source files, preserving journal entries and other non-code changes.

Layer 4: Tests before features

yoyo's evolve skill requires writing a test before adding a feature. This isn't just a guideline — the planning phase explicitly instructs each implementation task to "write a test first if possible."

Why this matters: if you write the test first, you know the test covers the new behavior. If you write the feature first, you might write a test that only confirms what you already built, missing edge cases.

Layer 5: No deleting existing tests

The evolve skill has a hard rule: never delete existing tests. Tests are the agent's immune system. Removing them would let regressions slip through silently. As of this writing, yoyo has 91+ tests, and that number only goes up.

Layer 6: Protected files

Some files are simply off-limits. The agent cannot modify:

File	Why it's protected
`IDENTITY.md`	yoyo's constitution — defines who it is and its core rules
`PERSONALITY.md`	yoyo's voice and values
`scripts/evolve.sh`	The evolution loop itself — if this broke, recovery would be manual
`scripts/format_issues.py`	Input sanitization for GitHub issues
`scripts/build_site.py`	Website builder
`.github/workflows/*`	CI configuration — the safety net that catches everything else

These files can only be changed by human maintainers. This prevents a subtle failure mode: the agent "improving" its own safety checks in a way that weakens them.

Runtime bash safety analysis

Beyond the evolution-loop layers above, yoyo analyzes every bash command before running it (src/safety.rs) and asks for confirmation on destructive patterns like rm -rf / or piping find into xargs rm.

One check worth calling out: recursive-force rm on an unresolved shell variable. A command like rm -rf "$BUILD_DIR/" looks harmless, but if BUILD_DIR is empty or unset it expands to rm -rf / (or wipes the current directory). yoyo flags any rm -rf / rm -fr / rm -r -f whose target contains an unexpanded variable ($VAR, ${VAR}, ${VAR}/sub, …) and names the specific variable in the warning, e.g.:

rm -rf target contains unresolved variable $BUILD_DIR — an empty
expansion would delete from cwd or /

The escape hatch is the guarded-expansion idiom: ${VAR:?} (or ${VAR:?message}) makes the shell abort if the variable is empty or unset, so rm -rf "${BUILD_DIR:?}/build" passes this check — the expansion can never silently widen the blast radius. Existing destructive-pattern rules still apply on top.

What happens in practice

A typical evolution session:

evolve.sh verifies the build passes before starting
The planning agent reads source code, journal, and issues
Implementation agents execute tasks, each running build+test after changes
Post-session verification re-checks everything
If anything broke, automatic fix attempts kick in
If fixes fail, revert to pre-session state
CI runs on push as a final backstop
Next session checks CI status — failures get top priority

The result: yoyo has been evolving autonomously since Day 0, growing from ~200 lines to ~3,100+ lines, without ever shipping a broken build to main.

Can it still break?

Theoretically, yes. Safety is defense-in-depth, not a proof of correctness. Some scenarios the current system doesn't catch:

Logic bugs that pass tests — if the test suite doesn't cover a behavior, the agent could change it without noticing
Performance regressions — we rely on official leaderboards (SWE-bench, etc.) rather than custom benchmarks
Subtle UX regressions — the agent tests functionality, not user experience

These are areas for future improvement. But for the core guarantee — "the agent won't commit code that doesn't compile or pass tests" — the six layers above make that extremely unlikely.

yoyo documentation