The 90% Solution: Eliminating Environment-Variable Headaches

Tools Are Externalized Muscle Memory
Environment Configuration

I spent three hours debugging why staging was failing but production was working.

Same code. Same Kubernetes cluster. Same Docker images.

The difference? One environment variable in the ConfigMap was spelled DATABASE_URL in staging and DATABASE_URI in production.

Three hours. One typo. That was the moment I stopped blaming myself and started building envcheck.

The Problem That Keeps Giving

Environment files are the worst kind of technical debt. They're simple, they're everywhere, and they fail in ways you don't notice until 3 AM.

I've seen this pattern repeatedly:

Scenario           | What Happens           | When You Notice
Misspelled key     | App uses default value | Production incident
Missing key        | Crash on startup       | Deployment time
Duplicate key      | One wins, one loses    | Debugging logs don't match
Drift between envs | "Works on my machine"  | Code review
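
The first row is the sneakiest one, so here is a minimal Python sketch of how it plays out. The default URL and the function names are made up for illustration, and none of this is envcheck code.

```python
# Hypothetical default; stands in for whatever fallback an app ships with.
DEFAULT_DB = "sqlite:///local.db"


def database_url_lenient(env):
    # A misspelled key (DATABASE_URI) is simply absent,
    # so the default wins without any warning.
    return env.get("DATABASE_URL", DEFAULT_DB)


def database_url_strict(env):
    # Failing fast turns the silent fallback into a startup error.
    try:
        return env["DATABASE_URL"]
    except KeyError:
        raise RuntimeError("DATABASE_URL is not set") from None


# Simulated environment containing the typo from the story above.
# A real app would pass os.environ instead of a plain dict.
broken_env = {"DATABASE_URI": "postgres://db.internal/app"}

lenient_result = database_url_lenient(broken_env)
```

The lenient version happily returns the local default. The strict version at least moves the failure from "production incident" to "deployment time".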

That "impossible" bug where it works locally but fails in staging? It's not impossible. It's inevitable at scale. Your environment configuration is telling you something.

Most teams ignore it until it hurts. I got tired of the pain.

What I Tried First

I did what everyone does. I copied .env.example to .env, filled in the values, and prayed.

Then I added pre-commit hooks. Then I wrote shell scripts to compare files. Then I tried to remember which variables mattered.

But here's the thing: environment variables aren't just in .env files anymore.

They're in:

  • Kubernetes ConfigMaps and Secrets (secretKeyRef, configMapKeyRef)
  • Terraform input variables (TF_VAR_* environment variables)
  • Ansible env lookups (lookup('env', 'VAR'))
  • GitHub Actions workflows (env: blocks)
  • Helm charts (values.yaml entries in SCREAMING_SNAKE_CASE)
  • ArgoCD Application manifests (plugin.env)

I had configuration scattered across seven places. No single source of truth. No way to verify they were in sync.

So I did what any reasonable engineer would do.

I built a tool.

The envcheck Approach

I wrote envcheck in Rust because I wanted it to be fast. Like, stupidly fast.

# Install
cargo install envcheck

# Lint a file
envcheck lint .env

# Compare environments
envcheck compare .env.example .env.prod

# Fix automatically
envcheck fix .env --commit
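
To make the idea behind linting and comparing concrete, here is a rough Python sketch of the two checks: finding duplicate keys and diffing key sets between two files. It is an illustration under simplifying assumptions (plain KEY=VALUE lines, no quoting or multiline values), not how envcheck is implemented, and every function name in it is hypothetical.

```python
from collections import Counter


def parse_keys(text):
    """Return the keys of a dotenv-style text, in order, duplicates included."""
    keys = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key = line.split("=", 1)[0].strip()
        # Tolerate the common "export KEY=value" form.
        if key.startswith("export "):
            key = key[len("export "):].strip()
        keys.append(key)
    return keys


def duplicate_keys(text):
    """Keys that are assigned more than once."""
    counts = Counter(parse_keys(text))
    return sorted(k for k, n in counts.items() if n > 1)


def compare_keys(reference_text, target_text):
    """Keys missing from, or extra in, the target relative to the reference."""
    reference = set(parse_keys(reference_text))
    target = set(parse_keys(target_text))
    return {
        "missing": sorted(reference - target),
        "extra": sorted(target - reference),
    }


example = "DATABASE_URL=\nREDIS_URL=\nLOG_LEVEL=info\n"
prod = "DATABASE_URI=postgres://db/app\nREDIS_URL=redis://cache\nREDIS_URL=redis://old\n"

report = compare_keys(example, prod)
dupes = duplicate_keys(prod)
```

With these inputs, the report flags DATABASE_URL and LOG_LEVEL as missing, DATABASE_URI as extra, and REDIS_URL as a duplicate.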

But the real power isn't the linting. It's the sync validation.

# Check Kubernetes manifests against .env
envcheck k8s-sync k8s/*.yaml --env .env.example

# Check Terraform against .env
envcheck terraform infra/ --env .env

# Check GitHub Actions workflows
envcheck actions .github/workflows --env .env

Now instead of wondering whether my K8s manifests match my environment file, I know. In about 3.3 microseconds.
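
Conceptually, a sync check against Kubernetes boils down to collecting the variable names a manifest declares and diffing them against the expected keys. The Python sketch below shows that idea on an already-parsed Deployment. It is an assumption-laden illustration rather than envcheck's actual logic: the helper name is hypothetical, the manifest is inlined as a dict, and envFrom references are ignored.

```python
def container_env_names(manifest):
    """Collect env var names from a parsed Deployment-style manifest."""
    pod_spec = manifest.get("spec", {}).get("template", {}).get("spec", {})
    names = set()
    for container in pod_spec.get("containers", []):
        for entry in container.get("env", []):
            names.add(entry["name"])
    return names


# In practice this dict would come from a YAML parser; it is inlined here
# so the sketch has no third-party dependencies.
deployment = {
    "kind": "Deployment",
    "spec": {"template": {"spec": {"containers": [{
        "name": "app",
        "env": [
            {"name": "DATABASE_URI",
             "valueFrom": {"secretKeyRef": {"name": "db", "key": "url"}}},
            {"name": "LOG_LEVEL", "value": "info"},
        ],
    }]}}},
}

# Keys assumed to be declared in .env.example.
expected = {"DATABASE_URL", "LOG_LEVEL"}
found = container_env_names(deployment)

missing_in_manifest = sorted(expected - found)
unknown_in_manifest = sorted(found - expected)
```

Here DATABASE_URL shows up as missing from the manifest and DATABASE_URI as unknown, which is exactly the pair that cost me three hours.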

The Design Philosophy

I didn't set out to build another linter. The world has enough linters.

I wanted to solve the coordination problem.

Environment configuration is a distributed system problem. You have multiple sources of truth (.env, K8s, Terraform) and no way to verify they're consistent.

envcheck treats environment variables like a coordination problem:

  1. Define once (.env.example as the source of truth)
  2. Verify everywhere (K8s, Terraform, Ansible, GitHub Actions, Helm, ArgoCD)
  3. Fix automatically (sort keys, remove duplicates, auto-commit)
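
Step 3 is the easiest to picture in code. Here is a minimal Python sketch of a "fix" pass that removes duplicates and sorts keys. It assumes the last assignment wins and it throws away comments and blank lines, both of which are choices made for this illustration and not a description of what envcheck does.

```python
def fix_env_text(text):
    """Drop duplicate keys (last assignment wins) and sort the rest by key."""
    values = {}
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#") or "=" not in stripped:
            continue  # this sketch discards comments and blank lines
        # Split on the first "=" only, so values may contain "=" themselves.
        key, value = stripped.split("=", 1)
        values[key.strip()] = value.strip()
    return "".join(f"{key}={values[key]}\n" for key in sorted(values))


messy = (
    "REDIS_URL=redis://old\n"
    "DATABASE_URL=postgres://db/app\n"
    "REDIS_URL=redis://cache\n"
)
fixed = fix_env_text(messy)
```

Sorted, deduplicated output means two environment files diff line by line, which is most of what makes drift visible in review.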

Order is what you build. Chaos is what you debug. Environment files want to drift into chaos. You're the one holding them together.

What Actually Happened

I released envcheck as open source. It got some traction. People started using it.

But the surprising part wasn't the usage. It was the peace of mind.

I stopped worrying about whether staging matches production. I stopped wondering if I missed a variable in the Kubernetes manifest. I stopped staying up until 3 AM debugging typos.

The pizza was good. The sleeping through the night was better.

The 90% Solution

Here's the uncomfortable truth: envcheck doesn't solve 100% of environment configuration problems.

These problems remain:

  • Secret management (where do you store the actual values?)
  • Rotation (what happens when you need to change passwords?)
  • Audit trails (who changed what and when?)

But envcheck solves the 90% that I was dealing with every week:

  • Typos in variable names
  • Missing keys across environments
  • Drift between .env.example and actual files
  • Inconsistent formatting that makes diffs unreadable
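
Typos are the most mechanical of these to catch. One way to do it, sketched below in Python with the standard library's difflib, is to take every key that is not in the expected set and look for an expected key it closely resembles. The function name and the 0.8 cutoff are assumptions for this example, and I'm not claiming envcheck uses this particular technique.

```python
import difflib


def suggest_typos(expected_keys, actual_keys, cutoff=0.8):
    """Map each unexpected key to the expected key it most resembles."""
    suggestions = {}
    for key in sorted(set(actual_keys) - set(expected_keys)):
        matches = difflib.get_close_matches(key, expected_keys, n=1, cutoff=cutoff)
        if matches:
            suggestions[key] = matches[0]
    return suggestions


expected = ["DATABASE_URL", "REDIS_URL", "LOG_LEVEL"]
actual = ["DATABASE_URI", "REDIS_URL", "LOG_LEVEL", "SENTRY_DSN"]

hints = suggest_typos(expected, actual)
```

DATABASE_URI gets flagged as a likely misspelling of DATABASE_URL, while SENTRY_DSN is left alone because nothing in the expected list looks like it.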

Sometimes the best automation is no automation. But when you do automate, automate the thing that's actually causing pain.

Get Started

If you're tired of environment configuration drift, here's how to start:

cargo install envcheck
envcheck lint .env

Or if you prefer npm:

npm install -g envcheck
envcheck lint .env

Documentation at envcheck.github.io.

GitHub: github.com/envcheck/envcheck

What This Taught Me

Tools are externalized muscle memory. The things you build are the things you've done enough times that you never want to do them manually again.

I built envcheck because I suffered this problem repeatedly. I spent three hours debugging a typo. I vowed to never let that happen again.

That's specific knowledge, and you can't copy it. You have to live through the pain.

Most systems want to dissolve. You're the one holding them together.

