May 29, 2026
By Alyx

How to run AI coding agents on your repo and stay in the loop

Run AI coding agents on your GitHub repo on cloud VMs — watch them work in a live terminal, steer them mid-run, and keep what they learned.


Handing a coding task to an AI agent is easy. Staying in the loop while it works is the hard part. The pattern I kept hitting was always the same: I'd kick off some agent against my repo, walk away, and come back to a diff I had to reverse-engineer. The agent had made a dozen decisions I never saw, hit dead ends I had no record of, and by the time I was reviewing the output, the reasoning that produced it was gone.

This post is about the opposite of that: how to run an AI coding agent on your GitHub repo, watch it work, steer it when it's heading somewhere wrong, and keep what it learned after the run ends. The feature in Workunit that does this is called Cloud Execution, but I want to start with the problem, because the problem is the reason the feature works the way it does.

The black box problem

Most ways to run an agent on a repo share the same shape. You describe a task, the agent disappears into a container somewhere, and some minutes later you get a result. That's fine when it works. The trouble is the three failure modes that show up the rest of the time, and they all come from the same root: you can't see inside the box.

1. You can't trust what you can't see

A diff at the end tells you what changed, not why. If the agent misread the task or took a wrong turn early, you find out at review time, when it's most expensive to unwind.

2. The reasoning evaporates

Every decision the agent made, every approach it tried and abandoned, lives in a transcript that's gone the moment the run ends. The next run starts from zero.

3. You can't course-correct

If you spot the agent going sideways ten minutes in, your only move is to kill it and start over. There's no way to lean in and say "not that, this."

Each of these is solvable. But they're only solvable if the agent runs somewhere you can watch it, talk to it, and have it write back to a place that outlives the run. That's the whole design.

Running the agent on your actual repo

The starting point is simple: the agent works against your real codebase, not a paste of it. Each run spins up an ephemeral cloud VM, clones your GitHub repo into it, and lets the agent read the files, grep through the tree, and run commands the way a developer would. When the run is over, the VM is gone.
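To make the lifecycle concrete, here's a minimal local sketch of the "ephemeral" part. It is not how Workunit provisions VMs, which this post doesn't describe. I'm assuming a temporary directory stands in for the VM and a plain directory copy stands in for `git clone`; the names `ephemeral_checkout` and `run_in_checkout` are made up for illustration.

```python
import shutil
import subprocess
import sys
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def ephemeral_checkout(source: Path):
    """Yield a throwaway copy of `source`, deleted when the block exits.

    Assumption: copying a local directory stands in for `git clone` on a
    fresh VM. The real provisioning steps are not part of this sketch.
    """
    workdir = Path(tempfile.mkdtemp(prefix="agent-run-"))
    try:
        checkout = workdir / "repo"
        shutil.copytree(source, checkout)
        yield checkout
    finally:
        # Tear down even if one of the agent's commands raised.
        shutil.rmtree(workdir, ignore_errors=True)


def run_in_checkout(checkout: Path, argv: list[str]) -> str:
    """Run one command inside the checkout and return its stdout."""
    result = subprocess.run(
        argv, cwd=checkout, capture_output=True, text=True, check=True
    )
    return result.stdout
```

Wrapping a run in `with ephemeral_checkout(Path("path/to/repo")) as repo:` gives the agent a real tree to read and run commands against, and the `finally` clause guarantees nothing is left behind once the block ends, whether the run succeeded or not.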

There are two modes, and they map onto two different moments in a task. Explore mode points the agent at the repo to analyze it and suggest tasks based on your problem statement, each with file references and an effort estimate, which you then accept or discard. Implement mode takes the tasks you accepted and has the agent write the code, run the tests, push a feature branch, and open a pull request you review through your normal GitHub workflow.

Explore

The agent reads your codebase and proposes specific, actionable tasks against your problem statement. You review each suggestion and accept the ones worth doing. Nothing lands in your workunit that you didn't approve.

Implement

The agent works your accepted tasks: writes code, runs tests, pushes a branch, and opens a PR with a diff summary and test results. You review and merge it like any other contribution.

Watching the agent work, not just the diff

This is the part that fixes the trust problem. While a run is going, its full transcript streams into a live terminal on the execution page, built on xterm.js. You're not waiting for a final report. You're watching the agent think and act in real time.

Every shell command the agent runs shows up as a line in the terminal, followed by its actual output. The agent's reasoning chunks render dimmed, so you can tell its internal monologue from its replies, and the cursor follows along as new tokens arrive. When the agent greps for a function, you see the grep. When it reads a file before editing it, you see the read. The decisions stop being invisible.
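A rough sketch of that rendering step, to show how little it takes to separate reasoning from action visually. The event shape (`type` and `text` keys) is my assumption for illustration, not a documented Workunit format. The escape codes are standard ANSI SGR sequences (2 for faint, 0 for reset), which terminal emulators such as xterm.js interpret.

```python
DIM = "\x1b[2m"    # SGR 2: faint / dimmed text
RESET = "\x1b[0m"  # SGR 0: reset all attributes


def render_event(event: dict) -> str:
    """Turn one transcript event into terminal text.

    The {"type": ..., "text": ...} shape is hypothetical, chosen only to
    illustrate the three kinds of lines described above.
    """
    kind, text = event["type"], event["text"]
    if kind == "command":
        return f"$ {text}"            # shell commands get a prompt prefix
    if kind == "reasoning":
        return f"{DIM}{text}{RESET}"  # internal monologue renders dimmed
    return text                       # command output and replies render as-is


events = [
    {"type": "reasoning", "text": "Need to find where parse_config is defined."},
    {"type": "command", "text": "grep -rn 'def parse_config' src/"},
    {"type": "output", "text": "src/config.py:12:def parse_config(path):"},
]
for event in events:
    print(render_event(event))
```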

A few things make this usable on real runs rather than toy ones. An auto-scroll toggle pins the view to the tail, turns itself off the moment you scroll up to inspect older output, and turns back on when you scroll to the bottom: follow mode, the way tail -f or an IRC client does it. Long runs load tail-first: open an execution that produced fifty thousand lines and the last batch renders instantly, with older lines backfilled as you scroll up. And you can download the full log as an NDJSON file whenever you want the record off-platform.
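If you want to work with a downloaded log yourself, the tail-first idea is easy to reproduce. This is a sketch under assumptions: NDJSON means one JSON object per line, but the `seq` and `text` field names are invented here, and `tail_first_pages` is my own helper, not part of any Workunit tooling.

```python
import json


def parse_ndjson(text: str) -> list[dict]:
    """Parse newline-delimited JSON: one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def tail_first_pages(records: list[dict], page_size: int):
    """Yield pages newest-first; inside a page, records stay chronological."""
    end = len(records)
    while end > 0:
        start = max(0, end - page_size)
        yield records[start:end]
        end = start


# Hypothetical seven-line log; real logs will have different fields.
log_text = "\n".join(
    json.dumps({"seq": i, "text": f"line {i}"}) for i in range(1, 8)
)
pages = list(tail_first_pages(parse_ndjson(log_text), page_size=3))
print([[record["seq"] for record in page] for page in pages])
# prints [[5, 6, 7], [2, 3, 4], [1]]
```

The first page is what a viewer would render immediately; each later page is what it would backfill as you scroll up.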

Steering it mid-run

Watching is good. Watching and being unable to do anything about what you see is frustrating. So the terminal has a second mode: interactive runs. Pick "Interactive" on the explore or implement launch form, and the terminal pane gets a chat input at the bottom.

Now when you spot the agent heading down the wrong path, you type a message and hit Enter to redirect it mid-session. "Actually, do X instead." "Don't touch that file." "Use the existing helper rather than writing a new one." The agent takes the steer and keeps going. A pulsing cursor next to the input tells you whether the agent is busy thinking or waiting on your next message, and typing /exit ends the session cleanly. It's the difference between supervising an agent and just launching one.
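The input handling is simple enough to sketch as a toy model. Every name below is hypothetical. I'm also assuming a steer is queued for the agent rather than interrupting it, which the product may well handle differently.

```python
from dataclasses import dataclass, field
from enum import Enum


class AgentState(Enum):
    BUSY = "busy"        # agent is thinking or running a command
    WAITING = "waiting"  # agent is idle, waiting on the next message


@dataclass
class InteractiveSession:
    """Toy model of an interactive run; not Workunit's actual implementation."""
    state: AgentState = AgentState.WAITING
    is_open: bool = True
    steers: list[str] = field(default_factory=list)

    def submit(self, line: str) -> str:
        """Handle one line typed into the chat input."""
        text = line.strip()
        if not self.is_open:
            return "closed"       # session already ended
        if not text:
            return "ignored"      # empty input does nothing
        if text == "/exit":
            self.is_open = False  # end the session cleanly
            return "exited"
        # Anything else is a steer: hand it to the agent, which gets busy again.
        self.steers.append(text)
        self.state = AgentState.BUSY
        return "queued"


session = InteractiveSession()
print(session.submit("Use the existing helper rather than writing a new one."))
print(session.submit("/exit"))
print(session.submit("Don't touch that file."))
```

The `state` field is what a busy-or-waiting indicator next to the input would read from.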

Keeping what the agent learned

The third problem — the reasoning evaporating when the run ends — is the one I care about most, because it's the one a transcript alone doesn't solve. A downloadable log is a record of what happened, but it's not something the next agent reads. For that, the agent needs to write back into the workspace itself.

Cloud Execution agents get live access to the workunit they're running against, through the same MCP tools my editor uses. They read the tasks, success criteria, context atoms, and assets on demand instead of guessing from a single up-front prompt. And, crucially, they write back. Here's what that looks like in practice:

Moves tasks as it works them

The agent flips tasks through todo, in_progress, and done as it actually touches them. You watch progress on the task board instead of only seeing the final diff, and when the run ends the tasks are already updated.

Saves context atoms as it goes

Decisions made, gotchas hit, approaches that didn't pan out — the agent records them as it works, with proper supersedes-links when an earlier note turns out wrong. The trail-of-thought survives the run.

Creates follow-up tasks

When the agent finds work the original launch didn't anticipate, it adds new tasks to the workunit directly instead of burying them at the bottom of a diff summary. They show up on the board immediately.
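Here's a small in-memory sketch of those three kinds of write-back, just to pin down the data model. The class and method names are mine, not the MCP tool names, and the rule that a task only moves forward one status at a time is an assumption for the sketch.

```python
from dataclasses import dataclass, field
from typing import Optional

STATUSES = ["todo", "in_progress", "done"]


@dataclass
class Task:
    title: str
    status: str = "todo"


@dataclass
class Atom:
    id: int
    text: str
    supersedes: Optional[int] = None  # id of the earlier atom this one corrects


@dataclass
class Workunit:
    """Hypothetical stand-in for the workspace an agent writes back to."""
    tasks: list[Task] = field(default_factory=list)
    atoms: list[Atom] = field(default_factory=list)

    def advance(self, title: str) -> str:
        """Move a task one step along todo -> in_progress -> done."""
        task = next(t for t in self.tasks if t.title == title)
        idx = STATUSES.index(task.status)
        if idx < len(STATUSES) - 1:
            task.status = STATUSES[idx + 1]
        return task.status

    def save_atom(self, text: str, supersedes: Optional[int] = None) -> Atom:
        """Record a decision or gotcha, optionally replacing an earlier note."""
        atom = Atom(id=len(self.atoms) + 1, text=text, supersedes=supersedes)
        self.atoms.append(atom)
        return atom

    def current_atoms(self) -> list[Atom]:
        """Atoms that no later atom has superseded."""
        replaced = {a.supersedes for a in self.atoms if a.supersedes is not None}
        return [a for a in self.atoms if a.id not in replaced]

    def add_follow_up(self, title: str) -> Task:
        """Add newly discovered work as a fresh todo task."""
        task = Task(title)
        self.tasks.append(task)
        return task


wu = Workunit(tasks=[Task("Add retry to HTTP client")])
wu.advance("Add retry to HTTP client")  # todo -> in_progress
first = wu.save_atom("Retries belong in the transport layer")
wu.save_atom("Retries belong in the client wrapper; transport is shared",
             supersedes=first.id)
wu.add_follow_up("Add jitter to backoff")
wu.advance("Add retry to HTTP client")  # in_progress -> done
print([(t.title, t.status) for t in wu.tasks])
print([a.text for a in wu.current_atoms()])
```

The supersedes link is the piece worth noticing: the wrong note isn't deleted, it's kept in the history and filtered out of the current view, so the next reader can see both the conclusion and the correction.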

Tightening the prompt helped here too. The agent fetches what it needs from the workunit rather than receiving a fifty-task context dump up front, so it reasons about a smaller, fresher context — and the live transcript is easier to read. Interactive runs get the same write-back access: when you steer the agent mid-session, it can update task statuses, save the new direction as an atom, and create follow-ups without leaving the chat.

Why this matters across runs

A run that writes back isn't just doing the task. It's leaving the workunit in a better state for whoever picks it up next — you tomorrow, a teammate, or a different agent. The diff is the output. The atoms and task updates are the institutional memory that makes the next run cheaper.

How you actually run one

The mechanics are deliberately boring, which is what you want from infrastructure. Cloud Execution runs on a bring-your-own-key model: you supply an API key for the VM provider and connect an OpenRouter account for the models, both configured once per organization. Workunit orchestrates the run; you pay the providers directly for what you use, and you pick the model from a dropdown that shows per-token pricing next to each option.
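Per-token pricing makes the model side of a run easy to estimate before you start. The arithmetic below is a sketch with made-up numbers: the prices and token counts are hypothetical, I'm assuming prices are quoted per million tokens, and VM time is billed separately by the VM provider so it isn't included.

```python
from decimal import Decimal


def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_m: Decimal,
                  output_price_per_m: Decimal) -> Decimal:
    """Estimate model spend from token counts and per-million-token prices."""
    million = Decimal(1_000_000)
    return (Decimal(input_tokens) * input_price_per_m
            + Decimal(output_tokens) * output_price_per_m) / million


# Hypothetical figures, chosen only to make the arithmetic visible:
# 400k input tokens at $3.00/M plus 50k output tokens at $15.00/M.
cost = estimate_cost(400_000, 50_000, Decimal("3.00"), Decimal("15.00"))
print(cost)  # prints 1.95
```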

For private repos and for Implement mode, you install the Workunit GitHub App on the repositories you choose. It uses short-lived installation tokens that expire within the hour and are never stored, and you can scope it to specific repos or revoke it from GitHub at any time. From there, a run is a few clicks from a workunit:

1. Open a workunit with a clear problem statement
2. Open the Cloud Execution panel → pick Explore or Implement
3. Confirm the repo URL + branch, choose a model, pick Interactive if you want to steer
4. Start the run → watch the live terminal, jump in when you need to
5. Review the suggested tasks (Explore) or the PR + diff summary (Implement)

The recommended path is Explore first, then Implement. Explore is the cheaper run — a few minutes of VM time to generate and refine a task list. A well-defined task list then makes Implement more focused and less expensive. Treat the resulting PR like any other contribution: review the diff, run your CI, and merge on your own terms. The agent did the work in the open; you still own the decision to ship it.

Staying in the loop is the whole point

I don't think the goal of running agents on your repo is to remove yourself from the work. It's to do more of it, faster, without losing the thread. A black box that hands back a diff removes you in the wrong way — it takes the work and the understanding with it. What I wanted instead was an agent I could watch like a colleague pairing over my shoulder: visible while it works, interruptible when it drifts, and leaving notes behind that the next session can read.

That's the bet Cloud Execution makes. The ephemeral VM and the PR are the parts everyone builds. The live terminal, the mid-run steering, and the write-back into the workunit are the parts that keep you in the loop — and those are the parts that turn an agent run from something you supervise nervously into something you actually trust.
