Are We Drifting? — Part 12: Testing Without Drift

Jun 12, 2026

A test suite is supposed to be the thing that catches drift. But tests are code too, and they have their own way of drifting — away from the production they claim to verify. This part is about keeping the test world honest.

Are we drifting here?

The most dangerous test is the one that is green and wrong.

It happens whenever a test asserts against a copy of reality that has drifted. A repository test runs against a mocked database and passes — while the real migration it depends on is broken, so production fails on the exact path the test “covered.” A screen test renders hand-built sample data that no longer looks like any real product state, so it passes while the actual empty-state crashes. A test depends on time.Now() and is flaky at midnight.

In each case the suite is green and the system is broken, because the test drifted from the thing it was meant to mirror. So the question for testing is: do the tests still resemble production — or have they drifted into a comfortable fiction?

Fixtures are a vocabulary of product states

The frontend’s answer starts from the same primitive as everything else in this series: a closed, canonical set, declared once, shared everywhere.

Fixtures are not throwaway sample data. They are the canonical vocabulary of product states — regular, empty, loaded, and the loading/error arms when a screen has them — defined once per feature with defineFixtures, and shared by Storybook, the screen test, the route’s initial state, QA, and AI prototyping. One set of states, consumed by every surface.

That sharing is the anti-drift mechanism. Because the Storybook story, the test, and the running route all seed from the same fixtures, a story cannot depict a state the test never checks, and neither can drift from the shape the route actually renders. And a governed rule (FE-1) keeps the vocabulary from leaking: screens may never import fixtures — only routes, stories, and tests may — enforced by a lint rule, so the seam is structural. It is the same move as the backend’s table-driven tests, where each case names a domain state (valid input, missing field, not-found, conflict) and the set of named cases is the spec.

Fixtures are product states, not test data. When the same named states feed the workbench, the tests, QA, and the live route, the test world cannot drift from the product world.
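The backend analogue mentioned above, table-driven tests whose cases name domain states, might look roughly like the sketch below. Everything in it is hypothetical: `rename`, `RenameInput`, and the error values are illustrative stand-ins, not names from the real codebase. In a real suite the loop would live in a `_test.go` file and use `t.Run(tc.state, ...)`; it is written as a plain program here so it runs on its own.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical domain errors, one per named failure state.
var (
	ErrMissingField = errors.New("missing field")
	ErrNotFound     = errors.New("not found")
	ErrConflict     = errors.New("conflict")
)

// RenameInput is an illustrative request to rename an existing item.
type RenameInput struct {
	ID      string
	NewName string
}

// rename is a stand-in for the function under test.
func rename(store map[string]string, in RenameInput) error {
	if in.ID == "" || in.NewName == "" {
		return ErrMissingField
	}
	if _, ok := store[in.ID]; !ok {
		return ErrNotFound
	}
	for id, name := range store {
		if id != in.ID && name == in.NewName {
			return ErrConflict
		}
	}
	store[in.ID] = in.NewName
	return nil
}

// renameCase names one domain state; the table as a whole is the spec.
type renameCase struct {
	state string
	in    RenameInput
	want  error
}

var renameCases = []renameCase{
	{state: "valid input", in: RenameInput{ID: "a", NewName: "Deadlift"}, want: nil},
	{state: "missing field", in: RenameInput{ID: "a", NewName: ""}, want: ErrMissingField},
	{state: "not-found", in: RenameInput{ID: "zzz", NewName: "Deadlift"}, want: ErrNotFound},
	{state: "conflict", in: RenameInput{ID: "a", NewName: "Bench"}, want: ErrConflict},
}

// failures runs every named case against a fresh store and reports mismatches.
func failures() []string {
	var out []string
	for _, tc := range renameCases {
		store := map[string]string{"a": "Squat", "b": "Bench"}
		if got := rename(store, tc.in); !errors.Is(got, tc.want) {
			out = append(out, fmt.Sprintf("%s: got %v, want %v", tc.state, got, tc.want))
		}
	}
	return out
}

func main() {
	if f := failures(); len(f) > 0 {
		fmt.Println("FAIL:", f)
		return
	}
	fmt.Println("all named states pass")
}
```

The point of the shape is that adding a new domain state means adding a named row, so the list of cases reads as the list of states the function is specified to handle.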

Real infrastructure for the data layer

The backend’s hardest-won rule is blunt: the repository is tested against a real Postgres — a container — never a mock.

The reason is a scar. Mocked repository tests once passed green while the migration they relied on failed in production. A mock of a database is a copy of your beliefs about the database, and beliefs drift from schemas. So adapter-level tests run against real Postgres (and real MinIO, real queues) in an isolated, throwaway container, behind a build tag so they are opt-in locally and required in CI.

There is a clean division of labor that keeps this from becoming “test everything against everything”:

Fakes verify application behavior.        (fast, in-memory, deterministic)
Real containers verify adapter behavior.  (does the SQL actually run? does the constraint fire?)
Each level verifies what only it can.     (do not duplicate coverage)

A fake repository is the right tool to test a service’s logic — it is fast and controllable. A real Postgres is the only tool that can tell you the query and the migration and the constraint actually work, because those are the parts a mock can only pretend to have. Test the logic against fakes; test the adapter against the real thing; do not make either re-verify the other.

So across both stacks, the answer to “are we drifting here?” is not “probably not” — it is a set of named gates that each refuse a specific known failure mode.

On the frontend, the first gate is liftmere/no-fixture-outside-route (tools/eslint-no-fixture-outside-route.mjs), which exists because screens pulling fixtures in directly breaks the shared-vocabulary seam — it is a lint error, not a convention. The second is check-fixture-registry-coverage.mjs (apps/client/scripts/), which catches collections that were defined but never registered, making them invisible to App Mode’s preset injection and therefore untestable by anyone who doesn’t know to look.

The third gate has a specific story behind it.

A plate-calculation bug once passed every unit test and failed in the live dev app. It involved a setState-during-render that only surfaces when React runs in StrictMode, which in development double-invokes component render functions and re-runs effects to expose exactly this class of mistake. The unit tests did not see it because they imported render directly from @testing-library/react and so bypassed the <StrictMode> wrapper. So we added a no-restricted-imports rule on test files: import from ~test-utils instead, which wraps the renderer in <StrictMode> automatically. A test can no longer skip the wrapper that would have exposed the bug.

A test that does not run in StrictMode is not testing what the development app runs. The gate makes the test environment and the live environment agree.

On the backend, the integration-test gate exists for the same reason as the real-Postgres rule: a mocked repository can lie to you about the schema, and only a real migration running against a real database will catch it. Adapter-level tests run under a //go:build integration tag and are invoked via make test-be-integration (go test -tags=integration -race ./...). The -race flag also enables the race detector, so the gate is both a correctness check on the SQL and a concurrency check on the adapter code.

The final gate is check-mocks, which runs in CI to verify that mockgen-generated mocks still match the interfaces they were generated from. When an interface changes, the mock must be regenerated with it, or the check fails. An interface and its mock cannot silently diverge.

The five gates, in one place

| Drift mode | Named gate | What it catches |
| --- | --- | --- |
| Fixture imported by a screen | `liftmere/no-fixture-outside-route` (`tools/eslint-no-fixture-outside-route.mjs`) | Screens, flows, runtimes pulling fixtures in directly, breaking the shared-vocabulary seam |
| A `defineFixtures` collection not in the registry | `check-fixture-registry-coverage.mjs` (`apps/client/scripts/`) | Collections that exist but are invisible to App Mode preset injection |
| Render or effect-lifecycle bug that only shows in dev | `no-restricted-imports` of `@testing-library/react`; import from `~test-utils` instead | Bypassing the `<StrictMode>` wrapper that double-invokes render functions and effects; the motivating case was a plate-calc `setState`-during-render regression that passed unit tests but failed in the live dev app |
| Mocked repo that lies about the schema | `//go:build integration` + `make test-be-integration` (`go test -tags=integration -race ./...`) | Repo tests that pass against a mock while the real migration is broken, the scar this rule came from |
| Generated mock out of sync with its interface | `check-mocks` | `mockgen`-generated mocks that no longer match the interface they were generated from |

Each gate fires in CI. None of them require a human to remember the rule.

Determinism: don’t let reality leak in

The last source of test drift is nondeterminism. A test that depends on the wall clock, a random ID, or ambient ordering passes and fails for reasons that have nothing to do with the code.

The discipline is to inject the nondeterministic things — time, ID generation, randomness — behind tiny interfaces, so a test supplies a fixed clock and a deterministic ID sequence. The same application, wired with a fake clock and seeded IDs, produces the same output every run. Combined with fakes for infrastructure, this is what lets a system-level harness exercise the real wiring of the app — real services, real flow — against fake, deterministic edges. The test runs the production code path; only the world at the edges is controlled.
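One way to picture the "tiny interfaces" is the sketch below. `Clock`, `IDGen`, `App`, and `Session` are assumed names for illustration only. Production wiring would pass `systemClock{}` and a real ID generator; the test wiring passes a fixed instant and a counter, so the same code path yields the same output on every run, including one second before midnight.

```go
package main

import (
	"fmt"
	"time"
)

// Clock and IDGen are the seams behind which nondeterminism is hidden.
type Clock interface{ Now() time.Time }
type IDGen interface{ NewID() string }

// systemClock is what production wiring would use.
type systemClock struct{}

func (systemClock) Now() time.Time { return time.Now() }

// fixedClock always returns the same instant.
type fixedClock struct{ t time.Time }

func (c fixedClock) Now() time.Time { return c.t }

// seqIDs hands out id-0001, id-0002, ... in order.
type seqIDs struct{ n int }

func (s *seqIDs) NewID() string {
	s.n++
	return fmt.Sprintf("id-%04d", s.n)
}

// Session is an illustrative record stamped with an ID and a start time.
type Session struct {
	ID        string
	StartedAt time.Time
}

// App receives its clock and ID source instead of reaching for globals.
type App struct {
	clock Clock
	ids   IDGen
}

// StartSession is the production code path; only its edges are controlled.
func (a App) StartSession() Session {
	return Session{ID: a.ids.NewID(), StartedAt: a.clock.Now()}
}

// newTestApp wires the app with deterministic edges.
func newTestApp() App {
	return App{
		clock: fixedClock{t: time.Date(2026, time.June, 12, 23, 59, 59, 0, time.UTC)},
		ids:   &seqIDs{},
	}
}

func main() {
	a, b := newTestApp(), newTestApp()
	s1, s2 := a.StartSession(), b.StartSession()
	same := s1.ID == s2.ID && s1.StartedAt.Equal(s2.StartedAt)
	fmt.Println("two independent runs agree:", same)
}
```

Keeping the interfaces to a single method each is a deliberate choice in this sketch: the smaller the seam, the less room the test double has to drift from the real thing.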

The velocity payoff

The check this removes is the one you only discover you skipped when production breaks: does this test actually correspond to the real system, or to a copy that has wandered?

When fixtures are shared product states, a passing screen test means the workbench, QA, and the route all agree about that state — you do not re-author sample data per surface, and you trust the green. When the data layer is tested against a real database, a passing repo test means the SQL and the migration genuinely work — you are not shipping on a belief. When time and IDs are injected, a green run means the logic is right, not that you got lucky with the clock.

That trust is, once again, the velocity. A suite you can believe is one you can move behind quickly; a suite full of comfortable fictions is one you have to re-verify by hand, which is the coherence tax this whole series is about removing.

And the infrastructure that buys this trust — fixtures-as-vocabulary, throwaway testcontainers, the FE-1 gate — is cheap to stand up now, while the surface area is small, and only gets more expensive to retrofit as the system grows.

What’s next

There is one boundary left that drifts harder than any database: a language model, which will cheerfully invent a value, a type, or a malformed shape. Part 13: AI Output Is Untrusted Input treats AI output as exactly what it is — untrusted input — and shows how the vocabularies from earlier parts become the contract a model is forced to conform to.