2026-01-23 | PreviewProof Team

How AI-Generated Code Breaks the Assumptions Your Preview Environment Was Making

AI coding agents, preview environments, verified software delivery, code review

Preview environments, as a workflow pattern, were designed in a world where humans wrote code at human speed. Heroku Review Apps, Vercel previews, and Netlify deploy previews all assume a particular shape of pull request. A few hundred lines of diff. A developer who can defend the change. A reviewer with working context about the area being modified. One or two PRs in flight per engineer at a time.

That shape is gone. Anyone shipping with Cursor, Claude Code, or one of the autonomous agent tools has watched their PR throughput change in ways the existing review workflow was never designed to absorb. And the preview environment workflow — the layer that’s supposed to make verification easy — quietly becomes the bottleneck instead of the accelerator.

This post is about which assumptions break, and what preview workflows need to look like when most pull requests are agent-authored.

The assumptions that worked when humans wrote the code

Four implicit assumptions are baked into any preview-based review workflow.

The author can defend the change. The person who opens the PR can answer “why did you do it that way?” If the reviewer flags something, the author can explain the trade-off. Review is a dialogue between two people who understand the change.

The change is scoped to what was asked. A human writing a feature touches the files they need to touch and stops. Side-effects in unrelated files are rare and deliberate. The diff approximates the intent.

The reviewer has a reasonable model of the modified code. PRs land in areas someone on the team owns, and that owner knows the conventions, the gotchas, the parts that look fine but are load-bearing.

Accountability is implicit. If the change breaks production, there’s a name on the commit. Knowing your name is on it shapes how carefully you write it.

Every one of those assumptions is wrong now.

What AI-generated code does to each one

Authors can’t defend the change. The human who triggered the agent can defend it at the level of intent — “I asked it to add pagination” — but not at the level of implementation. The defense of the actual code is “the model produced this.” That’s not a defense. Reviewers asking implementation questions get back a fresh agent run, not reasoning.

Change scope expands silently. Agents touch files humans wouldn’t. They reformat adjacent code, “fix” lint warnings in files they were never asked about, rename variables for consistency, quietly modify imports across the tree. None of it is malicious. Most is reasonable in isolation. But the diff no longer approximates the intent.
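One way to make scope drift visible is to compare the files a PR touched against the paths the task was expected to touch. The following is a minimal sketch, not a description of any particular tool: the function name, the glob patterns, and the file paths are all hypothetical, and it assumes you can state the intended scope as a short list of patterns.

```python
from fnmatch import fnmatchcase


def out_of_scope(changed_files, allowed_patterns):
    """Return the changed files that match none of the declared scope patterns."""
    return [
        path
        for path in changed_files
        if not any(fnmatchcase(path, pattern) for pattern in allowed_patterns)
    ]


# Hypothetical PR: the request was "add pagination to the orders API".
declared_scope = ["api/orders/*", "tests/orders/*"]
changed = [
    "api/orders/list.py",
    "tests/orders/test_list.py",
    "api/users/profile.py",  # touched by the agent, never requested
    "utils/imports.py",      # touched by the agent, never requested
]

print(out_of_scope(changed, declared_scope))
```

Files flagged this way are not necessarily wrong. The point is only that the reviewer sees them called out instead of discovering them halfway down a long diff.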

Reviewer familiarity collapses. When an engineer is shipping fifteen agent-authored PRs a week instead of three hand-written ones, those PRs touch parts of the codebase the engineer doesn’t normally work in. Author and reviewer are now two people without context, looking at code neither of them wrote, generated by a model that doesn’t remember producing it.

Accountability becomes legible to lawyers and illegible to engineers. The commit author is still a person. But the meaningful authorship (the decisions, the trade-offs) happened inside a non-deterministic system. “Who’s accountable when this breaks?” gets you answers that satisfy a compliance officer but not the engineer who has to fix the bug.

What this does to your preview workflow

When the assumptions above held, a preview was a luxury — a nice way to click around before merging. The bulk of verification still happened by reading the diff.

Now the diff is the least reliable part of the review, the preview is doing most of the work, and most preview workflows aren’t built for that load:

Seed data is too thin. Most teams seed preview databases with enough data to make the app load, not enough to exercise the full feature surface. With agent-authored code, the data has to do work the reviewer used to do.
Feedback flow is too linear. Comment, wait for human, wait for fix, wait for new preview. Fine at five PRs a week. A bottleneck at fifty.
Approval is implicit. “Looks good to me” means nothing when the reviewer didn’t have context to know what “good” looks like.
Evidence is fragmented. Preview URL, GitHub comments, Slack threads, occasional screenshots. Useless when you’re reconstructing, months later, what was verified.

What preview workflow has to look like now

Concretely:

Seed data has to exercise the surface, not just populate it. Empty states, full states, edge cases, every major flag combination, both happy-path and adversarial fixtures. We covered this in seeding Postgres for ephemeral previews and synthetic data for realistic previews — and it matters more for agent-authored PRs than human ones, for reasons we get into in seed data for AI pull requests.
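As a rough illustration of "exercise the surface," a seed plan can be generated as the cross product of data states and feature-flag values instead of being written by hand. This is a sketch under assumptions: the flag names and data-state labels are hypothetical, and a real seeder would turn each scenario into actual rows.

```python
from itertools import product

# Hypothetical feature flags and the values each can take.
FLAGS = {"new_checkout": [False, True], "bulk_export": [False, True]}
# Hypothetical data states the preview database should cover.
DATA_STATES = ["empty", "typical", "full", "adversarial"]


def seed_scenarios(flags, data_states):
    """Build one scenario per (data state, flag combination) pair."""
    names = sorted(flags)
    scenarios = []
    for state in data_states:
        for values in product(*(flags[name] for name in names)):
            scenarios.append({"data_state": state, "flags": dict(zip(names, values))})
    return scenarios


scenarios = seed_scenarios(FLAGS, DATA_STATES)
print(len(scenarios))  # 4 data states x 2 x 2 flag values = 16
```

The full cross product grows quickly as flags are added, so in practice a team would likely prune it to the combinations that matter. Starting from the exhaustive list at least makes the omissions deliberate.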

Test surface has to be broader. Behavioral testing in the live preview, not just unit tests written by the agent that wrote the code. See testing AI coding agent output beyond the diff.
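A behavioral check against the running preview can be very small. The sketch below assumes a hypothetical preview URL and hypothetical routes, and it takes the HTTP client as a parameter (`fetch`) so the example runs without a network. In a real setup `fetch` would wrap whatever HTTP library you already use.

```python
def run_checks(base_url, checks, fetch):
    """Run (path, expected_status, must_contain) checks against a preview.

    `fetch(url)` must return a (status_code, body_text) tuple.
    Returns a list of human-readable failure messages.
    """
    failures = []
    for path, expected_status, must_contain in checks:
        status, body = fetch(base_url.rstrip("/") + path)
        if status != expected_status:
            failures.append(f"{path}: expected {expected_status}, got {status}")
        elif must_contain not in body:
            failures.append(f"{path}: missing {must_contain!r}")
    return failures


# Stand-in for a live preview so the sketch is self-contained (hypothetical data).
def fake_fetch(url):
    pages = {
        "https://pr-123.preview.example.com/orders?page=1": (200, "Order #1 | Next page"),
        "https://pr-123.preview.example.com/orders?page=9999": (200, "Server error"),
    }
    return pages.get(url, (404, ""))


checks = [
    ("/orders?page=1", 200, "Next page"),
    ("/orders?page=9999", 200, "No orders found"),  # past-the-end page
]
print(run_checks("https://pr-123.preview.example.com", checks, fake_fetch))
```

The second check is the kind a unit test written by the same agent tends to miss: the page renders with a 200, but the content is wrong for the edge case.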

Review has to be structured. Named approvers, scoped checks, explicit sign-off. A verification checklist that the reviewer works through against the running preview, not against the diff.
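To make "named approvers, scoped checks, explicit sign-off" concrete, here is one possible shape for a review record. It is a sketch, not a prescribed schema: the class names, fields, approver handles, and preview URL are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Check:
    description: str
    approver: str                  # the one person allowed to sign this check off
    passed: Optional[bool] = None  # None means "not yet verified"


@dataclass
class Review:
    preview_url: str
    checks: list = field(default_factory=list)

    def sign_off(self, description, approver, passed):
        """Record a result for one check; only its named approver may do so."""
        for check in self.checks:
            if check.description == description:
                if check.approver != approver:
                    raise PermissionError(f"{approver} is not the approver for {description!r}")
                check.passed = passed
                return
        raise KeyError(description)

    def approved(self):
        """Approved only when every check has an explicit passing result."""
        return bool(self.checks) and all(c.passed is True for c in self.checks)


review = Review(
    preview_url="https://pr-123.preview.example.com",
    checks=[
        Check("Pagination works past the last page", approver="dana"),
        Check("Export respects the bulk_export flag", approver="lee"),
    ],
)
review.sign_off("Pagination works past the last page", "dana", True)
print(review.approved())  # one check is still unverified
```

The design choice worth noting is that an unverified check blocks approval. "Looks good to me" with nothing recorded is not a passing state.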

Feedback has to be machine-readable. When a reviewer flags something, the agent should be able to consume that feedback and produce the next iteration without a human translating it into a fresh prompt. Otherwise the bottleneck just moves.
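What "machine-readable" means will depend on the agent you use, but the core idea is that each piece of feedback carries where it was seen, what was expected, and what happened instead. The sketch below assumes a hypothetical JSON payload. The field names, severity labels, and `version` key are illustrative, not an existing format.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Feedback:
    preview_url: str
    route: str
    expected: str
    observed: str
    severity: str  # hypothetical labels: "blocker" or "minor"


def to_agent_payload(items):
    """Serialize feedback as JSON, blockers first, for an agent to consume."""
    ordered = sorted(items, key=lambda item: item.severity != "blocker")
    return json.dumps({"version": 1, "feedback": [asdict(item) for item in ordered]}, indent=2)


items = [
    Feedback("https://pr-123.preview.example.com", "/orders", "Sort arrow visible", "Arrow missing", "minor"),
    Feedback("https://pr-123.preview.example.com", "/orders?page=9999", "Empty state", "Server error", "blocker"),
]
print(to_agent_payload(items))
```

Because the payload is structured, the next agent run can be driven from it directly, and the same record can later serve as part of the review evidence.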

Evidence has to be captured automatically. Who reviewed which preview, against which artifact, with what result. Without that, the workflow is unauditable — which matters for regulated teams now, and will matter for everyone soon.
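One common way to make such a record hard to alter after the fact is a hash chain, where each entry includes the hash of the one before it. The sketch below illustrates that general technique only. The field names, reviewer handles, and artifact identifiers are hypothetical, and a production log would also need durable storage and protected write access.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first record


def _digest(body):
    """Hash a record body deterministically (keys sorted before encoding)."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_record(log, reviewer, preview_url, artifact_sha, result):
    """Append a review record that is chained to the previous record's hash."""
    body = {
        "reviewer": reviewer,
        "preview_url": preview_url,
        "artifact_sha": artifact_sha,
        "result": result,
        "prev_hash": log[-1]["hash"] if log else GENESIS,
    }
    log.append({**body, "hash": _digest(body)})
    return log


def verify(log):
    """Return True if no record was altered, removed, or reordered."""
    prev = GENESIS
    for record in log:
        body = {key: value for key, value in record.items() if key != "hash"}
        if body["prev_hash"] != prev or _digest(body) != record["hash"]:
            return False
        prev = record["hash"]
    return True


log = []
append_record(log, "dana", "https://pr-123.preview.example.com", "a1b2c3", "approved")
append_record(log, "lee", "https://pr-124.preview.example.com", "d4e5f6", "rejected")
print(verify(log))

log[0]["result"] = "rejected"  # simulate someone editing history
print(verify(log))
```

Editing any earlier record changes its hash, which breaks the link stored in every record after it, so tampering is detectable even though it is not prevented.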

The connection to verified software delivery

The thread that runs through all of this: when AI writes the code, the verification problem isn’t a code review problem. It’s a preview environment problem.

A reviewer reading a diff cannot tell whether agent-generated code does what was intended. The diff looks fine — agents produce smooth-looking code. The tests pass — agents write tests that match the code they wrote. The only place the truth shows up is in the running system, with realistic data, exercised by a human who has context about what was supposed to happen.

That’s the gap verified software delivery names. AI coding agents didn’t create the gap; they made it impossible to ignore. And closing it is, more than anything else, a matter of upgrading your preview environment workflow from “nice to have” to “where verification actually happens.”

A short, honest plug

If you’re trying to retrofit your existing preview setup to handle agent-authored PRs and finding the seams everywhere — thin seed data, ad-hoc review, no real evidence trail — you can absolutely build the upgrade yourself. Some teams should. But it’s a real project.

PreviewProof is what we built for teams that don’t want to. Per-PR previews with seed data that actually exercises the surface, structured review with named approvers, machine-readable feedback agents can consume, and a tamper-evident evidence log that survives an audit. Preview it. Prove it.