Are We Drifting? — Part 2: The Tagged Vocabulary

Jun 2, 2026

Part 1: The Drift Problem ended on a shape: one source → many → . This part is about the source — specifically, the smallest, most boring unit that the entire pattern is built from.

Are we drifting here?

Start with the most innocent thing in any codebase: a closed set of named values.

A video is uploading, processing, ready, failed, or archived. A ticket is proposed, accepted, in_progress, or shipped. An upload was accepted or rejected.

These look harmless. They are strings. Everyone “knows” them.

That is exactly why they drift.

The backend writes "ready". The frontend dropdown lists "Ready". The analytics event sends "READY". A migration’s CHECK constraint allows 'ready', but a later refactor adds 'done' on the API side and nobody updates the constraint. Three months later a row exists that the frontend cannot render, because the value in the database is a status the UI has never heard of.

No single change was wrong. The set was simply written down in six places, and the six copies wandered.

The atom: a tagged vocabulary

Here is the move that makes the rest of the series possible.

A closed set of named values is not a string convention. It is a tagged vocabulary: a first-class, declarable thing with a precise shape.

Every entry in a tagged vocabulary has three layers.

Layer 1 — Identity
  name   the canonical identifier (how code refers to it)
  wire   the serialized form (how it crosses a boundary)
  deprecated / deprecation_note

Layer 2 — Presentation
  label        the human-facing string
  description  optional documentation

Layer 3 — Kind-specific metadata
  whatever this kind of vocabulary additionally needs

The first layer is the load-bearing one, and the split inside it is the whole trick.
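As a rough sketch of the three layers, one entry could be typed as below. The interface name `VocabularyEntry` and the helper `checkIdentity` are illustrative assumptions, not part of the actual tooling. The check only demonstrates why Layer 1 is the part a machine can enforce: names and wires must each be unique within a vocabulary.

```typescript
// Hypothetical shape for one vocabulary entry; field names follow the three layers.
interface VocabularyEntry {
  // Layer 1: identity
  name: string;
  wire: string;
  deprecated?: boolean;
  deprecation_note?: string;
  // Layer 2: presentation
  label: string;
  description?: string;
  // Layer 3: kind-specific metadata
  metadata?: Record<string, unknown>;
}

// Returns a list of problems; an empty list means every name and wire is unique.
function checkIdentity(entries: VocabularyEntry[]): string[] {
  const problems: string[] = [];
  const names = new Set<string>();
  const wires = new Set<string>();
  for (const entry of entries) {
    if (names.has(entry.name)) problems.push(`duplicate name: ${entry.name}`);
    if (wires.has(entry.wire)) problems.push(`duplicate wire: ${entry.wire}`);
    names.add(entry.name);
    wires.add(entry.wire);
  }
  return problems;
}
```

Two entries that share a wire would be indistinguishable once serialized, which is why the sketch treats that as a problem even when their names differ.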

Why `name` and `wire` are different on purpose

Code refers to a value by its name. The boundary (JSON, a database column, an event payload, a model’s structured output) sees its wire.

Keeping them separate means the serialized representation is a deliberate, declared decision, not an accident of how someone happened to spell a constant.
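One way to picture the split, using the video states from the opening example: keys are names, values are wires. The object shape and the helper `parseVideoStatus` are assumptions made for illustration; the real generated output may look different.

```typescript
// Assumed shape of a generated constant object: name on the left, wire on the right.
const VideoStatus = {
  Uploading: "uploading",
  Processing: "processing",
  Ready: "ready",
  Failed: "failed",
  Archived: "archived",
} as const;

type VideoStatusName = keyof typeof VideoStatus;
type VideoStatusWire = (typeof VideoStatus)[VideoStatusName];

// Decode an untrusted boundary value; anything outside the closed set is rejected.
function parseVideoStatus(raw: string): VideoStatusWire | undefined {
  const wires: readonly string[] = Object.values(VideoStatus);
  return wires.includes(raw) ? (raw as VideoStatusWire) : undefined;
}
```

Code writes `VideoStatus.Ready` and never spells the string itself, so "Ready" and "READY" arriving at a boundary fail to parse instead of quietly becoming a new value.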

The name/wire split is only a design decision until a machine makes it load-bearing.

On the backend, the seam that matters is between the handler and the repo. It mirrors the separation the name/wire split declares between identity and serialized form.

depguard rules in apps/api/.golangci.yml make that seam structural rather than conventional.

handler-no-db-driver means a handler.go file cannot import pgx — the handler has to speak to the database through the interface the repo exposes, not by reaching past it.

repo-no-transport means a repo.go file cannot import connectrpc.com/connect — the repo returns domain errors, and the handler maps them to Connect codes at the boundary.

Together these two rules ensure the only path from transport to storage passes through a typed interface: the name side and the wire side stay separated by design and by enforcement.
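The backend is Go, but the shape of the seam can be sketched in TypeScript to match the other examples in this post. Everything named below (`DomainErrorKind`, `UploadRepo`, `handleGetUpload`, the in-memory repo) is hypothetical. Only the idea comes from the two rules: the repo returns domain errors, and the handler alone translates them into transport codes. The code strings follow Connect's code names.

```typescript
// Repo side: domain errors only, no knowledge of the transport.
type DomainErrorKind = "upload_not_found" | "upload_too_large";

interface Upload {
  id: string;
  sizeBytes: number;
}

type Result<T> = { ok: true; value: T } | { ok: false; error: DomainErrorKind };

// Hypothetical repo interface; the handler depends on this, not on a database driver.
interface UploadRepo {
  findUpload(id: string): Result<Upload>;
}

// In-memory stand-in so the sketch runs without a database.
const memoryRepo: UploadRepo = {
  findUpload(id) {
    if (id !== "asset-1") return { ok: false, error: "upload_not_found" };
    return { ok: true, value: { id, sizeBytes: 1024 } };
  },
};

// Handler side: the only place that knows transport codes.
const transportCodes: Record<DomainErrorKind, string> = {
  upload_not_found: "not_found",
  upload_too_large: "invalid_argument",
};

// A null code means success; otherwise the code is the mapped transport code.
function handleGetUpload(repo: UploadRepo, id: string): { code: string | null; body?: Upload } {
  const result = repo.findUpload(id);
  if (result.ok) return { code: null, body: result.value };
  return { code: transportCodes[result.error] };
}
```

Because `transportCodes` is typed as a `Record` over every domain error kind, adding a new kind without a mapping is a compile error in this sketch, which is the same structural guarantee the lint rules aim for.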

One carve-out, stated plainly: internal/programs runs raw net/http and has no repo seam yet — it is excluded from both rules in .golangci.yml with TODO(programs-rpc) until the Connect-RPC rewrite is done.

On the frontend, defineFixtures is the vocabulary-primitive pattern in different clothes.

A defineFixtures collection — loading, ready, empty, error — is a closed, named set of product states declared in one place, exactly the way an enum manifest declares wire values.

Three gates keep the vocabulary honest.

liftmere/no-fixture-outside-route (tools/eslint-no-fixture-outside-route.mjs) fires when screen.tsx imports fixtures directly. It is currently a lint warning (scoreboard at ~113) and is slated to become an error.

liftmere/prefer-use-state-maybe-fixture fires when route.tsx calls useState(rosterFixtures.ready) directly, bypassing the preset injection that makes App Mode work.

check-fixture-registry-coverage.mjs fails the PR when a defineFixtures collection is missing from fixture-registry.ts, so no vocabulary can hide.
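A minimal sketch of the pattern and of the coverage gate, under loud assumptions: the real `defineFixtures` signature and the contents of fixture-registry.ts are not shown in this post, so the function, the fixture data, and `missingFromRegistry` below are stand-ins invented for illustration.

```typescript
// Hypothetical stand-in for defineFixtures: a named, frozen, closed set of states.
function defineFixtures<T extends Record<string, unknown>>(id: string, states: T) {
  return { id, states: Object.freeze(states) };
}

const rosterFixtures = defineFixtures("roster", {
  loading: { members: [] as string[], pending: true },
  ready: { members: ["ana", "bo"], pending: false },
  empty: { members: [] as string[], pending: false },
  error: { members: [] as string[], pending: false, message: "failed to load" },
});

const billingFixtures = defineFixtures("billing", {
  ready: { plan: "free" },
});

// Stand-in for fixture-registry.ts; billing is deliberately left out.
const fixtureRegistry = [rosterFixtures];

// Coverage gate: every declared collection must appear in the registry.
function missingFromRegistry(
  declared: { id: string }[],
  registry: { id: string }[],
): string[] {
  const registered = new Set(registry.map((collection) => collection.id));
  return declared
    .filter((collection) => !registered.has(collection.id))
    .map((collection) => collection.id);
}

const missingCollections = missingFromRegistry(
  [rosterFixtures, billingFixtures],
  fixtureRegistry,
);
```

Here `missingCollections` is `["billing"]`; a CI script built on the same idea would exit non-zero whenever that list is non-empty.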

The enforcement cost differs: ESLint rules are file-granular and cheap to add; depguard rules are package-granular and need type information.

The concept is the same on both sides.

This is a real vocabulary from our backend — the lifecycle of an uploaded video:

kind: enum
output: "../enums"
module: media
name: VideoStatus
entries:
  - name: Uploading
    wire: uploading
    label: Uploading
  - name: Processing
    wire: processing
    label: Processing
  - name: Ready
    wire: ready
    label: Ready
  - name: Failed
    wire: failed
    label: Failed
    metadata:
      terminal: true
  - name: Archived
    wire: archived
    label: Archived

Failed carries one piece of kind-specific metadata — terminal: true — because some consumer cares that it is an end state. Everything else is pure identity and presentation.

That file is the source. It is the only place VideoStatus is defined. Every Go constant, every TypeScript union, every SQL CHECK constraint, every dropdown is a projection of it. We will watch those projections get generated in Part 4: Enums as Shared Vocabulary.
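To make "projection" concrete before the real generator appears, here is a small sketch that derives two artifacts from the same entries. The function names `toSqlCheck` and `toTsUnion` and the column name `status` are assumptions for illustration, not the generator's actual API.

```typescript
interface ManifestEntry {
  name: string;
  wire: string;
  label: string;
}

// The VideoStatus entries from the manifest above, reduced to identity and label.
const videoStatusEntries: ManifestEntry[] = [
  { name: "Uploading", wire: "uploading", label: "Uploading" },
  { name: "Processing", wire: "processing", label: "Processing" },
  { name: "Ready", wire: "ready", label: "Ready" },
  { name: "Failed", wire: "failed", label: "Failed" },
  { name: "Archived", wire: "archived", label: "Archived" },
];

// Projection 1: a SQL CHECK constraint over the wire values.
function toSqlCheck(column: string, entries: ManifestEntry[]): string {
  const list = entries
    .map((entry) => `'${entry.wire.replace(/'/g, "''")}'`) // escape single quotes
    .join(", ");
  return `CHECK (${column} IN (${list}))`;
}

// Projection 2: a TypeScript union type over the same wire values.
function toTsUnion(typeName: string, entries: ManifestEntry[]): string {
  const members = entries.map((entry) => JSON.stringify(entry.wire)).join(" | ");
  return `export type ${typeName} = ${members};`;
}
```

`toSqlCheck("status", videoStatusEntries)` returns `CHECK (status IN ('uploading', 'processing', 'ready', 'failed', 'archived'))`, and `toTsUnion("VideoStatus", videoStatusEntries)` returns `export type VideoStatus = "uploading" | "processing" | "ready" | "failed" | "archived";`. Both strings come from one list, so they cannot disagree.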

The same atom, wearing different metadata

The reason this is worth elevating to a “primitive” is that it is not just enums.

Look at three vocabularies from the same backend, side by side. They are obviously the same shape.

An enum — identity, label, a flag:

- name: Failed
  wire: failed
  label: Failed
  metadata:
    terminal: true

An error — identity, label, plus the metadata a failure needs (a transport code, and the typed fields it carries):

- name: UploadTooLarge
  wire: upload_too_large
  label: Upload exceeds size limit
  metadata:
    code: invalid_argument
    fields: [asset_id, size_bytes, max_bytes]

A metric — identity, label, plus the metadata an instrument needs (its kind and its labels):

- name: UploadIntentCreated
  wire: media.upload.intent_created
  label: Upload intents created
  metadata:
    instrument: counter
    labels:
      - name: outcome
        values: [accepted, rejected]

Three different concerns — a state machine, a failure mode, an observability signal. One structure: a closed set of entries, each with a name, a wire, a label, and a bag of kind-specific metadata.

An enum is the base vocabulary. An error is that base plus failure metadata. A metric is that base plus instrument metadata. Generalize any of them down and you get the enum back.
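The "same base, different metadata" claim can be expressed as types. This is a sketch under assumptions: the type names and the `generalize` helper are invented here, and the metadata shapes are read off the three manifest entries above rather than taken from any real schema.

```typescript
// The shared base: identity plus presentation.
interface BaseEntry {
  name: string;
  wire: string;
  label: string;
}

// Each kind is the base plus its own metadata shape.
interface Entry<M> extends BaseEntry {
  metadata: M;
}

type EnumEntry = Entry<{ terminal?: boolean }>;
type ErrorEntry = Entry<{ code: string; fields: string[] }>;
type MetricEntry = Entry<{
  instrument: string;
  labels: { name: string; values: string[] }[];
}>;

const failed: EnumEntry = {
  name: "Failed",
  wire: "failed",
  label: "Failed",
  metadata: { terminal: true },
};

const uploadTooLarge: ErrorEntry = {
  name: "UploadTooLarge",
  wire: "upload_too_large",
  label: "Upload exceeds size limit",
  metadata: { code: "invalid_argument", fields: ["asset_id", "size_bytes", "max_bytes"] },
};

const uploadIntentCreated: MetricEntry = {
  name: "UploadIntentCreated",
  wire: "media.upload.intent_created",
  label: "Upload intents created",
  metadata: {
    instrument: "counter",
    labels: [{ name: "outcome", values: ["accepted", "rejected"] }],
  },
};

// "Generalize down": drop the kind-specific layer and keep the base.
function generalize(entry: BaseEntry): BaseEntry {
  return { name: entry.name, wire: entry.wire, label: entry.label };
}

const allEntries: BaseEntry[] = [failed, uploadTooLarge, uploadIntentCreated];
const allWires = allEntries.map((entry) => generalize(entry).wire);
```

Any tool that needs only identity and presentation (a docs page, a dropdown, a wire validator) can accept `BaseEntry` and work across all three kinds without knowing which one it was handed.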

Once you see it, you cannot unsee it. Permissions are a vocabulary (identity plus an action and a resource). Config keys are a vocabulary (identity plus a type and a default). Event types, job kinds, plan tiers, notification channels — every closed named set that crosses a boundary is the same atom.

Why this matters for humans and agents

The unification is not academic tidiness. It collapses N mental models into one — and that is worth the most when the contributor is an agent.

Without it, every closed set is a bespoke situation: enums are declared one way, errors another, config a third, each with its own conventions and its own places to look. A contributor — human or model — has to learn each one, and an agent generating code has N chances to invent a value inline because it did not know the convention for that particular kind.

With it, there is exactly one rule, and it fits in a sentence:

If it is a closed, named set of values that crosses a boundary, it is a vocabulary. Vocabularies live in manifests. Code uses the generated constants. Inventing a value inline is a lint failure, not a style nit.

Every repeated question now has a single answer. What are the allowed values for X? — the manifest for X. How do I add one? — add an entry, regenerate, commit. How do I deprecate one? — mark it deprecated in the manifest. Where is this value used? — follow the projections from the manifest.

Not shared vocabulary — a canonical language

“Shared vocabulary” undersells what this is. A vocabulary declared once is the canonical language of the whole system, and it is spoken by three audiences at the same time: the systems that serialize the value across a wire, the humans who name it and argue about it in review, and the agents that write against it.

The third audience is the one that changes the stakes. An agent does not only inherit the vocabulary by compiling against generated constants — it can discover it. The same closed sets are reachable at runtime: an agent asks for the live set of backlog owners or statuses over MCP rather than hard-coding them, and the operations a service exposes are themselves introspectable, so the agent reads the contract instead of guessing it. The vocabulary is not just baked into the binary; it is a queryable, shared language.

A vocabulary that humans, systems, and agents all draw from, and that agents can discover at runtime, is not a naming convention. It is the canonical language the whole codebase speaks.

That is why the atom matters out of proportion to its size. Get the closed-set-declared-once right, and you are not deduplicating strings — you are giving every participant, silicon or human, the same words.

The velocity payoff

The coherence check this removes is the one nobody schedules and everybody pays: did the value I just used actually exist, spelled exactly that way, on every side that has to agree about it?

When that check is a lint rule reading from a manifest, the answer is structural. An agent can add a status, regenerate, and ship — and the moment it references a value that is not in the vocabulary, the build says so, before review, before production.
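On the TypeScript side, one common way to get "the build says so" is an exhaustive switch over the union. This is a sketch, not the project's actual gate: the `VideoStatus` union is written by hand here where generated code would normally supply it, and `badgeColor` with its colors is a made-up consumer.

```typescript
// Hand-written stand-in for a generated union of wire values.
type VideoStatus = "uploading" | "processing" | "ready" | "failed" | "archived";

// Hypothetical consumer: every status must be handled explicitly.
function badgeColor(status: VideoStatus): string {
  switch (status) {
    case "uploading":
    case "processing":
      return "blue";
    case "ready":
      return "green";
    case "failed":
      return "red";
    case "archived":
      return "gray";
    default: {
      // If a new status is added to the union and not handled above,
      // this assignment stops type-checking and the build fails.
      const unreachable: never = status;
      throw new Error(`unhandled status: ${String(unreachable)}`);
    }
  }
}
```

Passing a value outside the union, such as `badgeColor("done")`, is rejected by the compiler, and adding `"done"` to the union without a matching case is rejected at the `never` assignment. Either way the mismatch surfaces before review.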

The expensive part was never typing the enum. It was the standing tax of trusting that the six copies still agreed. Naming the atom is what lets a machine pay that tax for you.

What’s next

We have the atom. Part 3: buildmere, a Codegen Kernel introduces the machine that turns it into code on every side: buildmere, a small codegen kernel where each kind of vocabulary is a plugin — and where the drift gate is one make target.
