Are We Drifting? — Part 6: Config and Metrics
The vocabularies in Part 4: Enums as Shared Vocabulary and Part 5: The Error Manifest described a service’s data. This part covers its edges: the environment it trusts on the way in, and the signals it emits on the way out. Both drift. Both are the same atom.
Are we drifting here?
Config drift is the classic one. An environment variable is read with os.Getenv in three different files, with three slightly different defaults, and one of them forgets the variable is required. The .env.example that is supposed to document what the service needs was last updated two features ago. The service boots fine in dev and falls over in staging because a variable nobody remembered is missing, and the failure is a nil deref deep in a request, not a clear “you forgot DATABASE_URL.”
Metrics drift the same way. A counter is incremented with a string literal "media.upload.intent_created", and the dashboard queries media.uploads.intent_created — one word apart, silently graphing nothing. A label gets a new value that no one added to the alert.
Both are closed sets that cross a boundary — the process boundary for config, the telemetry boundary for metrics. Both want to be vocabularies.
The same three-tier wall that governs the enums and the error manifest governs these edges too: a lint-error for the things that happen daily, a compile assertion for the seam that must never rot, and a CI coverage script for the inventory that drifts on its own schedule.
Config as a vocabulary
A config entry is the familiar atom: a name (how Go refers to it), a wire (the environment variable), a label, and metadata describing its type, default, and validation. Here are a few real entries from our API’s config manifest:
```yaml
kind: config
source: env
module: api
name: Config
entries:
  - name: Port
    wire: PORT
    label: HTTP listen port
    metadata:
      type: string
      default: "8080"
      group: server

  - name: DatabaseURL
    wire: DATABASE_URL
    label: PostgreSQL connection URL
    metadata:
      type: string
      group: database
      validation:
        required: true

  - name: S3SecretAccessKey
    wire: S3_SECRET_ACCESS_KEY
    label: S3 secret access key
    metadata:
      type: string
      secret: true
      group: s3
```

The interesting part is that validation is itself declarative. The manifest does not just list variables; it states the rules between them, in a small vocabulary of its own:
```yaml
  - name: SupabaseJWKSURL
    wire: SUPABASE_JWKS_URL
    metadata:
      validation:
        one_of_group: supabase_jwt        # exactly one of this group must be set
        required_when:
          field: AppEnv
          values: [staging, production]   # …but only in these environments

  - name: S3PublicEndpoint
    wire: S3_PUBLIC_ENDPOINT
    metadata:
      validation:
        must_match_or_empty_when:         # must equal S3Endpoint, or be empty,
          field: S3Endpoint               # when running in staging/production
          when:
            field: AppEnv
            values: [staging, production]
```

required, required_when, one_of_group, must_match_or_empty_when: the relationships that usually live in a paragraph of onboarding docs (or in nobody’s head) are declared facts. From this one manifest, buildmere generates:
- a typed Go `Config` struct, so config access is `cfg.DatabaseURL`, never a stray `os.Getenv`;
- a `Load()` that reads the environment and a `Validate()` that enforces every rule above;
- a `.env` example file, so the documented environment cannot drift from the required one;
- an `env-check` command that validates a real `.env` against the manifest.
That last pair is the anti-drift hinge. The example and the validator come from the same source as the struct, so “what the service needs,” “what the example shows,” and “what the startup check enforces” are guaranteed to be the same list. A missing required variable fails at boot with a clear message — or fails env-check in CI before it ever boots.
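As a rough illustration of what a Load() and Validate() pair could look like for the entries above, here is a hand-written sketch. It is not buildmere's actual output: the getenv parameter, the APP_ENV and S3_ENDPOINT wire names, the deployed helper, and the error wording are all assumptions made for the example, and only the required and must_match_or_empty_when rules are covered.

```go
package main

import (
	"errors"
	"fmt"
)

// Config mirrors a few manifest entries. Hand-written sketch, not generated code.
type Config struct {
	AppEnv           string // APP_ENV (wire name assumed for this sketch)
	Port             string // PORT, default "8080"
	DatabaseURL      string // DATABASE_URL, required
	S3Endpoint       string // S3_ENDPOINT (wire name assumed for this sketch)
	S3PublicEndpoint string // S3_PUBLIC_ENDPOINT
}

// Load reads every variable through getenv, so callers can pass os.Getenv
// in production and a fake lookup in tests.
func Load(getenv func(string) string) Config {
	cfg := Config{
		AppEnv:           getenv("APP_ENV"),
		Port:             getenv("PORT"),
		DatabaseURL:      getenv("DATABASE_URL"),
		S3Endpoint:       getenv("S3_ENDPOINT"),
		S3PublicEndpoint: getenv("S3_PUBLIC_ENDPOINT"),
	}
	if cfg.Port == "" {
		cfg.Port = "8080" // default declared in the manifest
	}
	return cfg
}

// deployed reports whether env is one of the environments the rules name.
func deployed(env string) bool { return env == "staging" || env == "production" }

// Validate collects every rule violation instead of stopping at the first.
func (c Config) Validate() error {
	var errs []error
	if c.DatabaseURL == "" { // validation.required
		errs = append(errs, errors.New("DATABASE_URL is required"))
	}
	// validation.must_match_or_empty_when
	if deployed(c.AppEnv) && c.S3PublicEndpoint != "" && c.S3PublicEndpoint != c.S3Endpoint {
		errs = append(errs, errors.New("S3_PUBLIC_ENDPOINT must equal S3_ENDPOINT or be empty in "+c.AppEnv))
	}
	return errors.Join(errs...) // nil when errs is empty
}

func main() {
	// A fake environment that forgets DATABASE_URL.
	env := map[string]string{"APP_ENV": "staging", "S3_ENDPOINT": "https://s3.internal"}
	cfg := Load(func(key string) string { return env[key] })
	if err := cfg.Validate(); err != nil {
		fmt.Println("invalid config:", err)
		return
	}
	fmt.Println("listening on port", cfg.Port)
}
```

Passing the lookup function in, rather than calling os.Getenv inside Load, is a choice made for this sketch so the rules can be exercised without touching the real process environment.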
In the CI tier, env-check runs the config manifest’s declared requirements against a real .env and fails the build when the two lists disagree: a variable the service stopped reading that is still documented in the example, or a newly required variable that has not been added yet.
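The comparison env-check performs can be pictured as a two-way set difference. The sketch below is an assumption about the shape of such a check, not the real command: the declared table is hard-coded from the manifest excerpt instead of being parsed from YAML, and the envKeys and check helpers are hypothetical names.

```go
package main

import (
	"bufio"
	"fmt"
	"sort"
	"strings"
)

// declared maps each wire name to whether the manifest marks it required.
// Hard-coded here; a real tool would read it from the manifest.
var declared = map[string]bool{
	"PORT":                 false,
	"DATABASE_URL":         true,
	"S3_SECRET_ACCESS_KEY": false,
}

// envKeys returns the variable names present in dotenv-style text,
// skipping blank lines, comments, and lines without "=".
func envKeys(text string) map[string]bool {
	keys := map[string]bool{}
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		name, _, ok := strings.Cut(line, "=")
		if !ok {
			continue
		}
		keys[strings.TrimSpace(name)] = true
	}
	return keys
}

// check reports required variables the file lacks and variables the
// manifest no longer declares. Both slices are sorted for stable output.
func check(text string) (missing, undeclared []string) {
	keys := envKeys(text)
	for name, required := range declared {
		if required && !keys[name] {
			missing = append(missing, name)
		}
	}
	for name := range keys {
		if _, ok := declared[name]; !ok {
			undeclared = append(undeclared, name)
		}
	}
	sort.Strings(missing)
	sort.Strings(undeclared)
	return missing, undeclared
}

func main() {
	missing, undeclared := check("# local dev\nPORT=8080\nLEGACY_FLAG=1\n")
	fmt.Println("missing required:", missing)
	fmt.Println("not in manifest:", undeclared)
}
```

A CI job built on this idea would exit non-zero when either list is non-empty, which is what turns the disagreement into a failed build.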
Metrics as a vocabulary
A metric entry is the same atom with instrument metadata. Here is the real media-metrics manifest:
```yaml
kind: metric
output: ".."
module: media
name: MediaMetrics
entries:
  - name: UploadIntentCreated
    wire: media.upload.intent_created
    label: Upload intents created
    metadata:
      instrument: counter
      labels:
        - name: content_type
        - name: outcome
          values: [accepted, rejected]

  - name: TranscodingDuration
    wire: media.transcoding.duration
    label: Transcoding job duration
    metadata:
      instrument: histogram
      unit: s
      labels:
        - name: status
          values: [success, failure]

  - name: ActiveUploads
    wire: media.uploads.active
    label: In-progress uploads
    metadata:
      instrument: updown_counter
```

Notice the nested vocabulary: a label’s values: [accepted, rejected] is itself a closed set, declared inline. The metric name (media.upload.intent_created) is the wire; it is written exactly once, here.
The generated Go is, again, zero-import — it depends only on context, and a Factory interface the project implements. This is verbatim:
```go
// Code generated by buildmere; DO NOT EDIT.

package media

import "context"

// Factory builds instruments by name. Implementations live in the consuming
// project's metrics kit; the generated code never imports it.
type Factory interface {
	Counter(name, desc string) interface {
		Add(ctx context.Context, n int64, labels ...any)
	}
	Histogram(name, desc, unit string) interface {
		Record(ctx context.Context, v float64, labels ...any)
	}
	UpDownCounter(name, desc string) interface {
		Add(ctx context.Context, n int64, labels ...any)
	}
}

var MediaMetrics = &mediaMetrics{}

// Register builds every instrument from f. Call once after the metrics
// backend is initialized.
func (m *mediaMetrics) Register(f Factory) {
	m.uploadIntentCreated = f.Counter("media.upload.intent_created", "Upload intents created")
	m.transcodingDuration = f.Histogram("media.transcoding.duration", "Transcoding job duration", "s")
	m.activeUploads = f.UpDownCounter("media.uploads.active", "In-progress uploads")
}

// RecordUploadIntentCreated records Upload intents created.
// Labels: outcome (accepted | rejected)
func (m *mediaMetrics) RecordUploadIntentCreated(ctx context.Context, contentType string, outcome string) {
	if m.uploadIntentCreated == nil {
		return
	}
	m.uploadIntentCreated.Add(ctx, 1, "content_type", contentType, "outcome", outcome)
}
```

Two things matter here.
First, the call site is typed: RecordUploadIntentCreated(ctx, contentType, outcome). You cannot fat-finger the metric name — it is baked into the generated method — and you cannot forget a label, because the labels are parameters. The string "media.upload.intent_created" exists in exactly one place in the whole codebase.
Second, the generated code imports no OpenTelemetry. It defines a Factory interface and depends on that. The project’s own metrics kit implements the interface (metricskit.BuildmereFactory) — the ~60-line adapter from Part 3: buildmere, a Codegen Kernel — and buildmere ships a compile-time assertion that the adapter satisfies the generated Factory. The instrument set is portable; the binding to OTel is the project’s, and the compiler checks the seam.
That seam is held by a zero-cost Go idiom: var _ Factory = (*metricskit.BuildmereFactory)(nil). That line compiles to nothing and refuses to compile at all if the adapter ever falls out of sync with the generated interface — no test to write, no CI script to wire, just a fact the compiler checks on every build.
No raw values at the edge
The manifests govern the names and shapes that cross the boundary. The remaining gate governs the values: the rule that nothing structured gets flattened into a raw string on the way out. That is what sloglint in apps/api/.golangci.yml enforces, across three properties, each added because a specific class of log corruption had already happened or was structurally guaranteed the moment an agent writes a handler without knowing the rule.
static-msg fires when a developer writes slog.Error("query failed for user " + userID) — the interpolation swallows the structure and makes the field invisible to a log query.
no-mixed-args fires when the call mixes positional and key-value args: slog.Error("query failed", err, "athlete_id", id) looks reasonable at a glance, but the error is a positional arg, not a keyed one, and sloglint rejects it.
context: scope is the one with the longest tail — it requires slog.ErrorContext (the *Context variant) any time a ctx context.Context is already in scope, because trace IDs travel through context and a plain slog.Error in a handler throws that thread away.
Three lint rules. Three specific call-site shapes. Each one fires before the PR lands.
The natural extension is an otelslog bridge — feeding those structured records straight into spans when an OTLP endpoint is configured. The slog surface is the current one, and the bridge is where this goes next.
| Drift scenario | Named gate | Tier |
|---|---|---|
| `.env.example` drifts from what the manifest declares required | `env-check` against the manifest | CI · coverage-script |
| New metricskit adapter does not satisfy the generated `Factory` interface | `var _ Factory = (*metricskit.BuildmereFactory)(nil)` | compile-enforced |
| `slog.Error("query failed for user " + userID)`: interpolated message swallows structure | sloglint: `static-msg` · `apps/api/.golangci.yml` | lint-error |
| `slog.Error("query failed", err, "athlete_id", id)`: positional arg mixed with key-value pair | sloglint: `no-mixed-args` · `apps/api/.golangci.yml` | lint-error |
| `slog.Error(...)` called while `ctx context.Context` is in scope | sloglint: `context: scope` · `apps/api/.golangci.yml` | lint-error |
One exclusion worth naming: internal/programs is excluded from all linters pending its Connect-RPC rewrite — TODO(programs-rpc) — so the three sloglint rules do not fire there today.
The velocity payoff
For config, the check this removes is the onboarding-and-2am one: which environment variables does this service actually need, and which are required where? That is now the manifest, enforced at boot and in CI. Nobody greps for os.Getenv to reconstruct the answer.
For metrics, it removes the silent-dashboard check: does the name I emit match the name I graph, and did I pass the right labels? The name is written once and the call is typed, so the emit side cannot drift from the declared signal.
In both cases the edge of the service is described in one declarative place, and the generated code plus the gate make the description binding.
What’s next
That closes the vocabularies arc: the one source → many projections → drift gates shape, applied to data, failures, environment, and telemetry. Part 7: Models at Boundaries zooms out from closed value sets to whole models, and the discipline that keeps a “project” or a “workout” from becoming one overloaded struct smeared across the database, the API, and the frontend.