Safety & Anti-Crash Guarantees
How does a coding agent that edits its own source code avoid breaking itself?
Good question. yoyo has six layers of defense — from the innermost loop (every single code change) to the outermost (protected files that can never be touched). Here's how each one works.
Layer 1: Build-and-test gate on every commit
No code change is ever committed unless it passes:
cargo build && cargo test
This happens inside the evolution session itself. The agent runs the build and test suite after every edit. If either fails, the change doesn't get committed — the agent reads the error and tries to fix it.
Layer 2: CI on every push
Even after the agent commits locally, GitHub Actions runs the full
check suite on every push to main:
cargo build
cargo test
cargo clippy --all-targets -- -D warnings
cargo fmt -- --check
Clippy warnings are treated as errors (-D warnings), so even subtle
issues like unused variables or redundant clones get caught. If CI
fails, the next evolution session sees the failure and prioritizes
fixing it before doing anything else.
Layer 3: Automatic revert on build failure
The evolution script (evolve.sh) has a post-session verification step.
After all tasks run, it re-checks the build. If it fails:
- It gives the agent up to 3 attempts to fix the errors automatically
- If all fix attempts fail, it reverts to the pre-session state:
git checkout "$SESSION_START_SHA" -- src/
This means a broken session can never leave src/ in a worse state
than it started. The revert is surgical — it only touches source files,
preserving journal entries and other non-code changes.
Layer 4: Tests before features
yoyo's evolve skill requires writing a test before adding a feature. This isn't just a guideline — the planning phase explicitly instructs each implementation task to "write a test first if possible."
Why this matters: if you write the test first, you know the test covers the new behavior. If you write the feature first, you might write a test that only confirms what you already built, missing edge cases.
Layer 5: No deleting existing tests
The evolve skill has a hard rule: never delete existing tests. Tests are the agent's immune system. Removing them would let regressions slip through silently. As of this writing, yoyo has 91+ tests, and that number only goes up.
Layer 6: Protected files
Some files are simply off-limits. The agent cannot modify:
| File | Why it's protected |
|---|---|
IDENTITY.md | yoyo's constitution — defines who it is and its core rules |
PERSONALITY.md | yoyo's voice and values |
scripts/evolve.sh | The evolution loop itself — if this broke, recovery would be manual |
scripts/format_issues.py | Input sanitization for GitHub issues |
scripts/build_site.py | Website builder |
.github/workflows/* | CI configuration — the safety net that catches everything else |
These files can only be changed by human maintainers. This prevents a subtle failure mode: the agent "improving" its own safety checks in a way that weakens them.
What happens in practice
A typical evolution session:
evolve.shverifies the build passes before starting- The planning agent reads source code, journal, and issues
- Implementation agents execute tasks, each running build+test after changes
- Post-session verification re-checks everything
- If anything broke, automatic fix attempts kick in
- If fixes fail, revert to pre-session state
- CI runs on push as a final backstop
- Next session checks CI status — failures get top priority
The result: yoyo has been evolving autonomously since Day 0, growing
from ~200 lines to ~3,100+ lines, without ever shipping a broken build
to main.
Can it still break?
Theoretically, yes. Safety is defense-in-depth, not a proof of correctness. Some scenarios the current system doesn't catch:
- Logic bugs that pass tests — if the test suite doesn't cover a behavior, the agent could change it without noticing
- Performance regressions — we rely on official leaderboards (SWE-bench, etc.) rather than custom benchmarks
- Subtle UX regressions — the agent tests functionality, not user experience
These are areas for future improvement. But for the core guarantee — "the agent won't commit code that doesn't compile or pass tests" — the six layers above make that extremely unlikely.