May 8, 2026
By Alyx

A multi-agent coding workflow that actually holds up in practice

Three different AI CLIs, one project. No orchestrator, no master prompt. Just a shared workspace the agents can all read and write to. Here's how it plays out on a real day.

I ran a single CLI for about nine months before the seams started showing. One agent, one chat window, one long scrolling context, and it was fine for a while: small features, contained bugs, nothing that pushed against the ceiling hard enough that I noticed where the ceiling was.

Then I hit a week where I was shipping a visual redesign on top of a backend refactor, with a review pass scheduled before a release: three different flavours of work stacked on top of each other, and my single agent kept doing that thing where it was competent at all three and excellent at none. The UI came out workable but visually flat, the refactor was clean but I could not shake the feeling nobody had actually stress-tested it, and the review pass, when I asked for one, mostly agreed with itself.

That was the week I started splitting work across agents, and not because I had just read a thread about multi-agent systems. It was because one agent had stopped being enough, and I had three CLIs already installed.

Model strengths are not a marketing claim

The models really are different, and this is not a take; it is something you feel within a week of using more than one.

Codex has a better eye. When I ask it to lay out a settings page, the spacing lands, the visual hierarchy makes sense, and it makes tasteful calls I would not have made. Its prose, if I ask it to write marketing copy or a blog intro, is stiff in a way that is hard to rescue: not wrong, just stiff.

Claude leans the other way. The code it writes is tighter than mine, and the copy it writes is close enough to my voice that I will sometimes forget I did not write a paragraph myself. Its UI output is competent and nothing more; if I let Claude design a page, it will be fine, and nobody will ever call it beautiful.

Gemini is the one I reach for when I want a second pair of eyes on a finished piece of work. It will point out things the other two missed, especially around data shape, edge cases, and what is going to be slow at the 10,000-row mark. For initial drafting I do not love it, but for review it earns its keep.

None of this is groundbreaking. Spend a month with all three and you land on similar assignments, so the interesting question is what you do once you accept the premise.

Two naive splits that waste your afternoon

The first thing most people try is the master prompt. You pick your favourite agent (for me this was Claude), write a long system prompt telling it to act as a UI designer, then a reviewer, then a performance auditor, and stay in one CLI while pretending the model is changing hats.

This is better than nothing, but it is capped by that one model's actual strengths. Telling Claude to act as a visual designer does not give you Codex's eye; it gives you Claude pretending, and Claude pretending is still Claude. You are not getting the thing you paid for when you installed three different CLIs.

The second thing people try is copy-paste. You work on a feature with one agent, hit a task that a different model would be better at, then copy the relevant context into another CLI and explain what you need. You finish that task, copy the result back, and keep going.

I did this for about two weeks, and it works. It also eats an hour a day in glue work, because every switch has you re-explaining the project, pasting the relevant files, and summarising what has already been tried. By Thursday afternoon I was writing the same paragraph for the fifth time. The agents were fast; the connective tissue between them was me, and I was slow.

Shared workspace, no orchestrator

The thing that finally made multi-agent work feel sustainable was moving the context out of the chat windows and into a place all three agents could read and write.

For me that place is Workunit. I am building it, so I have a bias, but the pattern would work with anything that exposes an MCP server and stores structured project context. The shape of it:

  • Every piece of work lives in a workunit, which has a problem statement, success criteria, tasks, and context atoms (typed records: decisions, insights, questions, attempts, progress).
  • All three CLIs are configured as MCP clients against the same workspace. Claude Code, Codex CLI, and Gemini CLI all see the same workunit, the same assets, and the same atom history.
  • When one agent finishes something meaningful, it writes an atom, and when another agent picks up work, it reads the atoms first.
  • There is no orchestrator and no master agent handing off to others. The human (me) decides which CLI takes which task, and the CLIs agree on reality because they read from the same workspace.
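To make the shape above concrete, here is a minimal sketch of a workunit and its typed atoms as plain Python dataclasses. It is an illustration only, not Workunit's actual schema or API: the class names, field names, the `importance` values, and the `write_atom` / `read_atoms` helpers are all assumptions made for the example, and the atom bodies are placeholder text.

```python
from dataclasses import dataclass, field
from enum import Enum


class AtomType(Enum):
    # The five atom kinds named in the list above.
    DECISION = "decision"
    INSIGHT = "insight"
    QUESTION = "question"
    ATTEMPT = "attempt"
    PROGRESS = "progress"


@dataclass
class Atom:
    type: AtomType
    author: str                 # which agent (or human) wrote it
    body: str
    importance: str = "normal"  # hypothetical field; values are assumptions


@dataclass
class Workunit:
    problem_statement: str
    success_criteria: list[str]
    tasks: list[str] = field(default_factory=list)
    atoms: list[Atom] = field(default_factory=list)

    def write_atom(self, atom: Atom) -> None:
        """Append an atom to the shared history."""
        self.atoms.append(atom)

    def read_atoms(self, *types: AtomType) -> list[Atom]:
        """Return all atoms, or only those of the given types."""
        if not types:
            return list(self.atoms)
        return [a for a in self.atoms if a.type in types]


# One agent writes, another reads before picking up work.
wu = Workunit("<problem statement>", ["<success criterion>"])
wu.write_atom(Atom(AtomType.DECISION, "claude", "<data-model decision>", importance="high"))
wu.write_atom(Atom(AtomType.PROGRESS, "claude", "backend done, UI still TODO"))

decisions = wu.read_atoms(AtomType.DECISION)
print([(a.author, a.importance) for a in decisions])
```

In the real setup the atom history lives behind an MCP server rather than in one process's memory; the point of the sketch is only that every agent reads and appends to the same typed list.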

This is less clever than the orchestration frameworks I have read about, and I think that is the point. I do not want an agent deciding when to delegate to another agent; I want three competent contributors reading the same notebook.

A representative day

Here is how a typical day splits out, rough edges and all, because the smooth version would be misleading.

Morning · Claude

I open Claude Code in the repo and point it at a workunit. Claude reads the workunit, pulls the existing schema through the MCP tool, and drafts tasks; I trim one (speculative) and point Claude at a migration it missed on first read. Then I ask for a decision atom on the data model. Claude argues for the approach, I agree, and it writes the atom marked high importance. From there it is the migration, the queries, and the handler, all standard code work where Claude is strongest. By late morning I have a green test suite and Claude writes a progress atom: backend done, UI still TODO.

Afternoon · Codex

I switch to Codex for the UI. Codex reads the workunit on startup, and the first thing it does is pull up Claude's morning decision atom, which means I do not have to re-explain the data model: it sees the decision and builds the form against it. The first draft is good but has a layout quirk; Codex spots it on the second pass, fixes it, and writes an insight atom about keeping list-inside-card sections scrollable by default. It also gets confused once and tries to use a modal pattern that is not in our component library, so I nudge it toward the existing components and it corrects. This is the part of multi-agent work the breathless blog posts skip: you are still in the loop, and you still have to catch mistakes like this one.

Evening · Gemini

I open Gemini and ask for a review pass. Gemini reads the workunit, sees the morning decision and the afternoon insight, and then goes through the code. It flags two things: a missing index that would not have mattered for the first thousand users but would have mattered later, and a per-item lookup in a loop that will fan out for users with many rows. It is not sure how real that second concern is in practice, so it logs a question atom rather than a decision, and the question sits there overnight. The next morning I read it, check the data, confirm the concern is real, and open a follow-up task linked to Gemini's question.
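For readers who have not run into the second kind of finding, here is a small sketch of a per-item lookup in a loop next to a batched version, using an in-memory SQLite database. This is not the code Gemini reviewed: the `items` and `details` tables, the index name, and both helper functions are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (id INTEGER PRIMARY KEY, user_id INTEGER, label TEXT);
    CREATE TABLE details (item_id INTEGER, note TEXT);
    -- The kind of index a review might flag as missing:
    CREATE INDEX idx_details_item_id ON details (item_id);
""")
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [(i, 1, f"item-{i}") for i in range(1, 6)],
)
conn.executemany(
    "INSERT INTO details VALUES (?, ?)",
    [(i, f"note-{i}") for i in range(1, 6)],
)


def notes_per_item(conn, item_ids):
    """One query per item: the query count grows with the number of rows."""
    queries = 0
    notes = {}
    for item_id in item_ids:
        row = conn.execute(
            "SELECT note FROM details WHERE item_id = ?", (item_id,)
        ).fetchone()
        queries += 1
        notes[item_id] = row[0]
    return notes, queries


def notes_batched(conn, item_ids):
    """One query for all items, using an IN clause."""
    if not item_ids:
        return {}, 0
    placeholders = ",".join("?" for _ in item_ids)
    rows = conn.execute(
        f"SELECT item_id, note FROM details WHERE item_id IN ({placeholders})",
        list(item_ids),
    ).fetchall()
    return dict(rows), 1


item_ids = [r[0] for r in conn.execute("SELECT id FROM items WHERE user_id = 1 ORDER BY id")]
slow, slow_q = notes_per_item(conn, item_ids)
fast, fast_q = notes_batched(conn, item_ids)
print(slow_q, fast_q)  # 5 1
```

Both functions return the same data; only the number of round trips differs. With five rows nobody notices, which is why this kind of thing tends to surface in review rather than in testing. For very large id lists the batched query would need chunking, since SQLite limits the number of bound parameters per statement.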

Three agents, one workunit, roughly eight hours of work, zero copy-paste between CLIs. The context that mattered (the data-model decision, the layout quirk, the perf question) is in the workunit and will still be there in six months when I have forgotten why we built it this way.

What this actually buys you

A few things, and I want to be specific because this is where marketing copy usually goes vague.

First, each model actually gets to do the work it is best suited to, instead of one model turning in an average performance across everything. The UI looks like Codex built it because Codex built it, the backend reads like Claude wrote it because Claude wrote it, and the review pass catches things the builders missed because the reviewer was not also the builder.

The second benefit is the one I underestimated: a written trail of why. When I come back to the workunit in three months, I will not have to reconstruct the data-model reasoning from git history, because the decision atom is right there with the rationale I signed off on.

Then there is the handoff side of it. When Codex picks up from Claude, it reads what Claude decided; when Gemini reviews, it sees what was attempted. I do not paste a single line between terminal windows, and the explanation I used to do three times a day quietly stops being my job.

What it does not fix

Setup is real: you have three CLIs to install, three API keys to manage, and an MCP-capable workspace to run. If you are doing a weekend project, this is more scaffolding than the project deserves, and I would not recommend the pattern for anything that fits in one afternoon.

You also have to know which model is good at what, and that knowledge is not in any documentation; it comes from actually using them. Early on I routed tasks wrong, got mediocre results, and blamed the system. The system was fine. I was asking Gemini to write copy.

MCP tool calls have latency too. Every time an agent reads the workunit or writes an atom, that is a round trip, and for most work this is invisible. For very chatty sessions where the agent is reading context fifty times, you feel it, which is why I have learned to front-load context reads at the start of a session rather than sprinkle them through.
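A rough sketch of what front-loading means in practice, with a fake workspace that counts round trips instead of making network calls. Everything here is hypothetical: `FakeWorkspace`, `SessionContext`, the record keys, and the assumption that the workspace offers some way to read several records in one call.

```python
class FakeWorkspace:
    """Stand-in for a remote workspace; counts round trips instead of making them."""

    def __init__(self, records):
        self._records = records
        self.round_trips = 0

    def fetch(self, key):
        # One read, one round trip.
        self.round_trips += 1
        return self._records[key]

    def fetch_many(self, keys):
        # Assumed batch read: several records, one round trip.
        self.round_trips += 1
        return {k: self._records[k] for k in keys}


class SessionContext:
    """Reads everything once at session start, then serves from memory."""

    def __init__(self, workspace, keys):
        self._cache = workspace.fetch_many(keys)

    def get(self, key):
        return self._cache[key]


KEYS = ["problem_statement", "decision_atoms", "tasks"]
RECORDS = {k: f"<{k} placeholder>" for k in KEYS}

# Sprinkled reads: every lookup goes back to the workspace.
sprinkled = FakeWorkspace(RECORDS)
for _ in range(50):
    sprinkled.fetch("decision_atoms")

# Front-loaded reads: one trip up front, then local lookups.
front_loaded = FakeWorkspace(RECORDS)
ctx = SessionContext(front_loaded, KEYS)
for _ in range(50):
    ctx.get("decision_atoms")

print(sprinkled.round_trips, front_loaded.round_trips)  # 50 1
```

The trade-off is staleness: a snapshot taken at session start will not see an atom another agent writes halfway through, so this fits sessions where one agent works at a time, as in the day described above.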

And it is still not autonomous. If I stop paying attention, the agents stop producing good work; Codex's modal mistake would have shipped if I had not caught it. The workflow makes three agents easier to coordinate, but it does not make any of them infallible, and it does not pretend to. What it does is stop me from being the bottleneck between them, and after the week that started all this, that turned out to be the part I actually needed.
