Most software teams that say they "use AI" mean they paste code from ChatGPT into their editor and fix the errors manually. That's a tool. What we run is a loop — a structured reasoning cycle that thinks about the problem, acts on it, observes what happened, and updates its understanding before doing anything else. That distinction is why the output is different.
What ReAct actually means
ReAct is a research pattern from a 2022 paper (Yao et al.) that showed language models perform better on question-answering and interactive decision-making benchmarks when they interleave reasoning and acting, rather than generating all their reasoning up front and then taking action, or acting without reasoning at all.
The insight was simple: an agent that can observe the result of each action — a test failing, an API returning unexpected data, a file that doesn't exist yet — and reason about what to do next is fundamentally more reliable than one that commits to a multi-step plan without feedback.
In practice, it looks like this:
// payment integration, day 2
Thought: The Stripe webhook endpoint needs idempotency keys.
         Without them, retries will create duplicate charges.
Action: read_file("api/webhooks/stripe.ts")
Observation: No idempotency handling found. Retry logic present.
Thought: This is a real risk. Flag for human review before proceeding.
Action: request_review(scope="stripe idempotency", priority="high")
Observation: Review confirmed. Fix approved.
Thought: Implement idempotency key storage using Redis TTL.
Action: edit_file("api/webhooks/stripe.ts", patch=idempotency_fix)
Observation: Tests passing. No regressions in billing suite.
Notice what didn't happen: the agent didn't assume everything was fine. It read the actual file, found the actual gap, escalated before fixing, and confirmed the fix afterwards. That's the loop working as designed.
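For readers who want to see what the approved fix might look like, here is a minimal sketch of idempotent webhook handling. It rests on assumptions the trace does not state: each delivery carries a unique event ID, and an in-memory dictionary stands in for Redis keys with a TTL. `TTLStore` and `handle_webhook` are hypothetical names, and the endpoint in the trace is TypeScript, so treat this as an illustration of the idea rather than the actual patch.

```python
import time
from typing import Callable, Dict


class TTLStore:
    """In-memory stand-in for a key store with expiry (assumed: Redis SET with NX and EX)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._expires_at: Dict[str, float] = {}
        self._clock = clock

    def set_if_absent(self, key: str, ttl_seconds: float) -> bool:
        """Record the key and return True, or return False if it is still live."""
        now = self._clock()
        expiry = self._expires_at.get(key)
        if expiry is not None and expiry > now:
            return False
        self._expires_at[key] = now + ttl_seconds
        return True


def handle_webhook(
    event_id: str,
    store: TTLStore,
    process: Callable[[], None],
    ttl_seconds: float = 24 * 3600,  # hypothetical retention window
) -> str:
    """Run `process` at most once per event_id within the TTL window."""
    if not store.set_if_absent(f"webhook:{event_id}", ttl_seconds):
        return "duplicate_ignored"
    process()
    return "processed"
```

A retried delivery with the same event ID returns `"duplicate_ignored"` without calling `process` again. A production version would also need to decide what happens when `process` fails after the key has been recorded, which this sketch leaves out.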
The difference between a loop and a prompt
A single prompt is a straight line: input → output. It can generate good code in a narrow, well-defined context. But real software development isn't narrow. It involves reading existing code, understanding dependencies, discovering that the database schema doesn't match the API contract, and finding that a third-party service changed its response format last week.
A ReAct loop treats each of those discoveries as new input. The agent doesn't finish its plan and then hand it off — it adjusts the plan in response to what it finds. That's what makes it suitable for real codebases, not just greenfield toy projects.
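The control flow described here can be sketched in a few lines. This is an illustration under stated assumptions, not our production agent: `react_loop`, `Step`, and `scripted_policy` are hypothetical names, and the scripted policy stands in for the language model that would normally produce each thought and action.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Step:
    thought: str
    action: Optional[str]  # tool name, or None when the agent decides to stop
    argument: str = ""
    observation: str = ""


# A policy maps the history so far to (thought, action, argument).
Policy = Callable[[List[Step]], Tuple[str, Optional[str], str]]


def react_loop(
    policy: Policy,
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 10,
) -> List[Step]:
    """Alternate thought -> action -> observation until the policy stops."""
    history: List[Step] = []
    for _ in range(max_steps):
        thought, action, argument = policy(history)
        step = Step(thought, action, argument)
        history.append(step)
        if action is None:
            break
        tool = tools.get(action)
        # The observation is stored so the next policy call can react to it.
        step.observation = tool(argument) if tool else f"unknown tool: {action}"
    return history


def scripted_policy(history: List[Step]) -> Tuple[str, Optional[str], str]:
    """Stand-in for the model: each decision depends on the last observation."""
    if not history:
        return ("Check the webhook handler first.", "read_file", "api/webhooks/stripe.ts")
    if "No idempotency" in history[-1].observation:
        return ("Gap found; escalate before editing.", "request_review", "stripe idempotency")
    return ("Review requested; stop and wait.", None, "")
```

The point of the sketch is the feedback edge: the policy is called again after every observation, so what the agent finds in step one changes what it does in step two. A single prompt has no equivalent of that second call.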
What a loop can do that a prompt cannot
- Read before writing. The agent inspects existing code, not just the description of it. It sees what's actually there.
- Test as it goes. Each action can include running the test suite. Failures are observations, not surprises.
- Escalate intelligently. When the agent hits something that requires product context or a judgment call, it flags it — instead of guessing.
- Update its own plan. New information changes what the agent does next. The plan isn't locked before the first line of code.
Where the loop stops and judgment starts
The loop has hard stops. Not soft suggestions — actual gates where execution pauses until a senior developer reviews what happened and decides what comes next.
These checkpoints exist at predictable places: architectural decisions (which database pattern, which API contract, which trade-off between simplicity and extensibility), security-sensitive changes (auth, payments, permissions), and anywhere the agent's self-reported confidence falls below a threshold we set at project start.
What happens at those checkpoints isn't a rubber stamp. The developer reads the reasoning, not just the output. They check whether the agent understood the constraint correctly, whether the approach is the right one for this team's future maintainability, whether there's a simpler path the agent missed because it was optimizing for the wrong thing.
The agent is fast and consistent. The developer is the one with taste, product context, and accountability. Both are required. Neither works alone.
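The gating rule in this section is simple enough to write down. A rough sketch, assuming each proposed change carries a category label and a self-reported confidence score between 0 and 1; the names `ProposedChange`, `GATED_CATEGORIES`, and `needs_human_review` are hypothetical, and the category labels are just the examples listed above.

```python
from dataclasses import dataclass

# Hypothetical category labels, taken from the examples in the text.
GATED_CATEGORIES = {"architecture", "auth", "payments", "permissions"}


@dataclass(frozen=True)
class ProposedChange:
    description: str
    category: str      # e.g. "payments", "refactor"
    confidence: float  # agent's self-reported confidence in [0, 1]


def needs_human_review(change: ProposedChange, confidence_threshold: float) -> bool:
    """Hard stop: True means execution pauses until a developer decides."""
    if change.category in GATED_CATEGORIES:
        # Gated categories always pause, no matter how confident the agent is.
        return True
    return change.confidence < confidence_threshold
```

Note the order of the checks: a high confidence score never bypasses a gated category. That is what makes it a hard stop rather than a suggestion.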
A real build loop, step by step
On a typical Build package, the agent runs thousands of loop iterations over 30 days. Most of them are unremarkable: read a file, write a test, make a change, run the suite, move on. But the pattern of those iterations produces something a purely manual build rarely does: a codebase where edge cases were considered before they became bugs.
// day 1 — scoping loop, before any code is written
Thought: Analyzing project brief. What are the likely edge cases
         in the booking flow given this schema?
Action: analyze_schema("bookings", "users", "availability")
Observation: 3 potential race conditions in slot reservation.
             Multi-timezone support not specified in brief.
Thought: These need to be resolved before implementation begins.
Action: generate_questions(scope="race conditions, timezone handling")
→ questions sent to client, answered on scope call
// day 3 — implementation loop
Thought: Implement slot reservation with optimistic locking.
Action: write_file("db/migrations/add_reservation_lock.sql")
Action: run_tests(suite="reservations", include_concurrency=true)
Observation: All passing. Race condition simulated and handled.
The edge cases that used to surface at QA — three weeks in, when fixing them is expensive — surface on day one, when they're just questions on a scope call.
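The day 3 trace mentions optimistic locking without showing it. Here is a minimal sketch of the technique using SQLite from the Python standard library. The `slots` table, its columns, and the function names are assumptions made for illustration; they are not the schema or migration from the project in the trace.

```python
import sqlite3
from typing import Optional


def read_free_version(conn: sqlite3.Connection, slot_id: int) -> Optional[int]:
    """Return the slot's current version if it is unreserved, else None."""
    row = conn.execute(
        "SELECT version, reserved_by FROM slots WHERE id = ?", (slot_id,)
    ).fetchone()
    if row is None or row[1] is not None:
        return None
    return row[0]


def try_reserve(
    conn: sqlite3.Connection, slot_id: int, user_id: int, expected_version: int
) -> bool:
    """Reserve the slot only if nobody changed it since `expected_version` was read."""
    cur = conn.execute(
        "UPDATE slots SET reserved_by = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (user_id, slot_id, expected_version),
    )
    conn.commit()
    # Zero rows updated means another writer bumped the version first.
    return cur.rowcount == 1


# Hypothetical schema for the example: one free slot, version starts at 0.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE slots (id INTEGER PRIMARY KEY, reserved_by INTEGER, "
    "version INTEGER NOT NULL DEFAULT 0)"
)
conn.execute("INSERT INTO slots (id) VALUES (1)")
conn.commit()
```

If two users both read version 0 and then both call `try_reserve`, the first update succeeds and bumps the version to 1, so the second update matches no rows and returns `False`. No lock is held between the read and the write, which is why the approach is called optimistic.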
Questions worth asking any AI-native agency
If you're evaluating whether an agency is genuinely running agentic loops or just using AI as a fancy autocomplete, these are the questions that cut through the noise:
- Do they have structured checkpoints? Where does the agent stop and a human review? If they can't describe it precisely, the human is reviewing outputs, not the reasoning.
- Can they show you a real loop trace? The thought-action-observation sequence is visible if you're actually running it. If they can't show you what the agent was thinking, they're not running a reasoning loop.
- What does the agent escalate? An agent that never flags anything for human review is either working on trivial problems or not actually checking. The escalations are where the quality lives.
We show this on scope calls — not as a demo, but as the actual loop from a recent project. If you want to see it, book one.