EVO × Claude Code

EVO ports autoresearch onto Claude Code dynamic workflows

If your team builds dynamic agent workflows on Claude Code, you already tune them by hand: rewriting prompts, reordering subagents, swapping tools, then eyeballing whether the result improved. EVO turns that into autoresearch. Agents propose changes to your Claude Code workflows, show which ones beat your baseline, and keep only what is measurably better, gated by your safety checks and with a receipt for every change.


What EVO gives your Claude Code workflows

Claude Code makes it easy to build dynamic, multi-step agent workflows. It does not tell you which version is actually better. EVO runs that experiment for you — porting autoresearch onto the workflows you already ship — so prompt, tool, and orchestration changes compete on evidence instead of intuition.

Autoresearch on the workflows you already run

EVO instruments your existing Claude Code agents, prompts, and subagent graphs as the search space — no rewrite, no separate harness. The workflow you ship is the thing being optimized.

Only measurable wins land

A proposed change to a prompt or orchestration step only survives if it beats your baseline on your eval and clears your regression checks. A faster wrong answer never ships.
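The acceptance rule above can be written as a single predicate. This is a minimal sketch, not EVO's implementation: the `Candidate` shape, the `should_keep` function, and the example scores are all hypothetical, and it assumes a higher eval score is better.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One proposed change and its measured results (hypothetical shape)."""
    name: str
    score: float        # eval score; higher is better in this sketch
    gates_passed: bool  # True only if every regression check passed


def should_keep(candidate: Candidate, baseline_score: float) -> bool:
    """Keep a change only if it clears every gate AND beats the baseline."""
    return candidate.gates_passed and candidate.score > baseline_score


# Illustrative values only: a high-scoring change that broke a regression check,
# and a smaller gain that passed every check.
high_score_but_broken = Candidate("drop-validation-step", score=0.91, gates_passed=False)
modest_win = Candidate("tighten-system-prompt", score=0.84, gates_passed=True)

print(should_keep(high_score_but_broken, baseline_score=0.80))  # False
print(should_keep(modest_win, baseline_score=0.80))             # True
```

The gate check comes first on purpose: a score improvement cannot buy back a failed regression check.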

A receipt for every change

Each kept improvement comes with the experiment, the score delta, and the discarded alternatives — so you can hand a prospect a clear before/after, not a vibe.
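To make the idea of a receipt concrete, here is one way such a record could be modeled. The `Receipt` class, its field names, and the sample values are assumptions for illustration. The source only says a receipt carries the experiment, the score delta, and the discarded alternatives.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Receipt:
    """Hypothetical record of one kept change; not EVO's actual schema."""
    change: str
    baseline_score: float
    candidate_score: float
    discarded: list[str] = field(default_factory=list)

    @property
    def delta(self) -> float:
        # Rounded so floating-point noise does not leak into the report.
        return round(self.candidate_score - self.baseline_score, 4)

    def to_json(self) -> str:
        payload = asdict(self)
        payload["delta"] = self.delta  # properties are not included by asdict
        return json.dumps(payload, indent=2)


receipt = Receipt(
    change="tighten-system-prompt",
    baseline_score=0.80,
    candidate_score=0.84,
    discarded=["drop-validation-step", "swap-search-tool"],
)
print(receipt.to_json())
```

Serializing to JSON keeps the before/after comparison easy to attach to a pull request or share outside the team.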

How EVO runs autoresearch on Claude Code

The same loop your team runs by hand on a Claude Code workflow — automated, parallelized, and safety-gated.

1. Point EVO at your workflow
Connect the Claude Code workflow you want to improve and the eval that defines "better": task success, latency, or token cost. EVO maps prompts, tools, and subagent steps as the search space.
2. Propose a change
An agent reads the workflow, prior runs, and discarded ideas, then makes one concrete change (a tightened prompt, a reordered subagent, a swapped tool) to test against the baseline.
3. Run it in parallel
Each candidate runs in its own isolated git worktree and is scored independently against your eval, so concurrent candidates do not collide with each other or corrupt your main branch.
4. Gate for safety
Your regression tests and invariants run as gates. A change that breaks a tool call or a guardrail is discarded, even if its score went up.
5. Keep or discard
Only changes that beat the baseline and clear every gate survive. The rest are thrown away, with the reasoning recorded for the next round.
6. Raise the baseline and repeat
A kept win becomes the new baseline, and the loop forks again, continuously compounding verified improvements across your Claude Code workflows.
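Steps 3 through 6 of the loop above can be sketched in a few lines. This is an illustrative outline under stated assumptions, not EVO's code: `run_round`, the `Workflow` dictionary shape, the toy scores, and the citation guardrail are all hypothetical. Git worktree isolation is stubbed out, so each candidate here is just a separate in-memory dictionary.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Workflow = dict[str, str]  # hypothetical shape: {"id": ..., "prompt": ...}


def run_round(
    baseline: Workflow,
    baseline_score: float,
    proposals: list[Workflow],
    evaluate: Callable[[Workflow], float],
    gates: list[Callable[[Workflow], bool]],
) -> tuple[Workflow, float, list[str]]:
    """Score candidates in parallel, apply gates, and return the new baseline."""
    # Step 3: score every candidate concurrently. A real setup would run each
    # one in its own git worktree; that isolation is omitted in this sketch.
    with ThreadPoolExecutor() as pool:
        scores = list(pool.map(evaluate, proposals))

    survivors: list[tuple[float, Workflow]] = []
    log: list[str] = []
    for candidate, score in zip(proposals, scores):
        # Step 4: a failed gate vetoes the candidate regardless of its score.
        if not all(gate(candidate) for gate in gates):
            log.append(f"{candidate['id']}: discarded, failed a gate")
        # Step 5: survivors must strictly beat the current baseline.
        elif score <= baseline_score:
            log.append(f"{candidate['id']}: discarded, {score} did not beat {baseline_score}")
        else:
            survivors.append((score, candidate))

    if not survivors:
        return baseline, baseline_score, log

    # Step 6: the best surviving candidate becomes the next baseline.
    best_score, best = max(survivors, key=lambda pair: pair[0])
    log.append(f"{best['id']}: kept, new baseline {best_score}")
    return best, best_score, log


baseline = {"id": "baseline", "prompt": "Answer the question. Always cite sources."}
proposals = [
    {"id": "shorter", "prompt": "Answer briefly."},  # drops the guardrail
    {"id": "stepwise", "prompt": "Think step by step. Always cite sources."},
    {"id": "verbose", "prompt": "Answer at length. Always cite sources."},
]
toy_scores = {"shorter": 0.95, "stepwise": 0.86, "verbose": 0.70}  # made-up numbers


def evaluate(workflow: Workflow) -> float:
    return toy_scores[workflow["id"]]


def keeps_citation_guardrail(workflow: Workflow) -> bool:
    return "cite sources" in workflow["prompt"]


new_baseline, new_score, log = run_round(
    baseline, 0.80, proposals, evaluate, [keeps_citation_guardrail]
)
print(new_baseline["id"], new_score)
for line in log:
    print(line)
```

In this toy run the highest-scoring candidate is discarded because it drops the guardrail, and the next-best candidate that clears the gate becomes the new baseline.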

Hand-tuning a Claude Code workflow vs. autoresearch on it

Criterion	Hand-tuning in Claude Code	EVO autoresearch
How changes are tested	Edit a prompt or step, rerun, eyeball the difference	Every change runs against your eval and is scored against the baseline
Throughput	One workflow variant at a time, between other work	Parallel subagents, each in its own isolated git worktree
Search strategy	Greedy tweaks on whatever you tried last	Tree search — multiple directions fork from any committed win
Safety	You remember to recheck guardrails — usually	Gates discard any change that breaks a test, even if the score rises
Scope	You tune the one prompt or step you have time for	Compounds across prompts, tools, subagent graphs, and configs
Evidence to share	"It feels better" — no audit trail	A receipt per change: experiment, score delta, discarded alternatives
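The "Search strategy" row contrasts greedy tweaks with a tree in which any committed win can be forked again. A small sketch of that data structure follows. The `Node` class, its `fork` and `lineage` methods, and the scores are hypothetical, and the sketch assumes a child is committed only when it strictly beats its parent.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A committed workflow version in the search tree (hypothetical shape)."""
    label: str
    score: float
    parent: "Node | None" = None
    children: list["Node"] = field(default_factory=list)

    def fork(self, label: str, score: float) -> "Node | None":
        """Commit a child only if it beats this node's score; otherwise return None."""
        if score <= self.score:
            return None
        child = Node(label, score, parent=self)
        self.children.append(child)
        return child

    def lineage(self) -> list[str]:
        """Labels from the root down to this node."""
        node, path = self, []
        while node is not None:
            path.append(node.label)
            node = node.parent
        return list(reversed(path))


root = Node("baseline", 0.80)
a = root.fork("tighter-prompt", 0.84)
b = root.fork("swap-search-tool", 0.82)   # a second direction from the same win
c = a.fork("reorder-subagents", 0.88)     # builds on the first direction
d = b.fork("longer-context", 0.81)        # no gain over its parent, so not committed

print(c.lineage())
print(d)
```

Unlike a greedy chain, `root` keeps both `a` and `b` as children, so a later round can return to either branch.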

Ship improvements.
Not guesses.

Join teams turning every change into measurable, verified progress.

Further reading

What is autoresearch?

Autoresearch as infrastructure
