One Dev Environment for Humans, Agents and CI

May 20, 2026

For years, “works on my machine” was a two-audience problem. A human ran the code on their laptop, a CI runner ran it in a clean VM, and most of the engineering effort around dev environments was about closing the gap between those two. We got pretty good at it. Lockfiles, container images, build matrices, hermetic toolchains. By 2024 a serious team could plausibly claim that a green CI build was a fair predictor of what would happen on a colleague’s machine.

That two-audience model is now obsolete.

AI agents are the third audience now. They open pull requests, run the test suite, edit code, and expect the repository to behave like a place where work happens. When they fail, the cause usually has nothing to do with the model. The agent runs just test and gets “command not found.” It opens a Go file and the language server isn’t there. The model is fine; the repo was never set up for it.

I want one dev environment that works for all three audiences out of one config, with no per-audience special cases. The pattern I keep coming back to is devcontainer + Nix + Docker, plus two small tools — devcontainer-env and devcontainer-ci — that fill in what the devcontainer spec doesn’t cover. The rest of this post walks through that layering and ends with a worked example you can copy.

The Third Axis of Drift

Drift, in the dev-environment sense, is the divergence between where code is written and where code is run. The two-audience era worried about two kinds of drift: human-to-human (different laptops, different macOS versions, different Homebrew states) and human-to-CI (the laptop has a stale node_modules, CI starts from scratch).

Agents introduce a third kind, and it behaves differently. A human contributor will eventually notice when their tools are missing. They read the error, install the dependency, move on. CI is the opposite extreme: it always starts from nothing and is forced to declare every dependency explicitly. Agents sit awkwardly between the two. They have enough autonomy to keep trying when something breaks, but almost no ability to fix the underlying environment. So they fail in subtler, more expensive ways: they invent a workaround, or they edit the code to compile against whatever Go version happens to be on the host. The pull request looks plausible. The drift is invisible until a human reviewer notices the assumptions baked into the diff.

The cheaper fix is to take the variable away. If the agent boots into the same environment the human uses and CI verifies, all three failure modes collapse into one: the environment is wrong. Fix the environment, and everyone benefits at once.

Where Docker Stops and Nix Starts

The first time I sat down to design this seriously, I spent an embarrassing amount of time arguing with myself about whether the answer was Docker or Nix. They solve different problems, and the layering matters.

Docker is the container. It decides what kernel runs, what filesystem you see, what user you are, what network you can reach. It’s great at isolation and portability across operating systems, and mediocre at reproducibility. A Dockerfile that built fine in March can pull a different apt package in May because Debian’s index moved. Pinning every apt version is possible but brittle, and few teams actually do it.

Nix is the toolchain inside the container. It pins the language runtimes and CLIs at exact versions through flake.lock, which records the exact revision and content hash of every flake input (nixpkgs included) and, through them, the version of every package the shell needs. Re-evaluating the same flake on the same lock produces the same closure today and in five years. Nix is excellent at reproducibility, and mediocre at isolation: a Nix shell on a Mac is still subject to whatever the Mac decides to do with file permissions, code signing, and the dynamic linker.
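
To make the pinning concrete, here is a minimal Python sketch that lists what a flake.lock records. It assumes the version 7 lock format, where each entry under nodes carries a locked object with rev and narHash fields. The function name locked_inputs and the sample data are illustrative, and the sample uses placeholder values rather than real revisions.

```python
import json


def locked_inputs(lock_text: str) -> dict[str, dict[str, str]]:
    """Map each flake input name to the revision and hash recorded in flake.lock."""
    lock = json.loads(lock_text)
    pins = {}
    for name, node in lock.get("nodes", {}).items():
        locked = node.get("locked")
        if locked is None:
            # The root node only lists its inputs; it has nothing locked itself.
            continue
        pins[name] = {
            "rev": locked.get("rev", ""),
            "narHash": locked.get("narHash", ""),
        }
    return pins


# Illustrative lock file with placeholder values (not real revisions).
SAMPLE_LOCK = json.dumps({
    "nodes": {
        "nixpkgs": {
            "locked": {
                "type": "github",
                "owner": "NixOS",
                "repo": "nixpkgs",
                "rev": "0000000000000000000000000000000000000000",
                "narHash": "sha256-PLACEHOLDER",
            }
        },
        "root": {"inputs": {"nixpkgs": "nixpkgs"}},
    },
    "root": "root",
    "version": 7,
})

if __name__ == "__main__":
    for name, pin in locked_inputs(SAMPLE_LOCK).items():
        print(f"{name}: {pin['rev']} {pin['narHash']}")
```

Two checkouts whose flake.lock files yield the same mapping are evaluating the same nixpkgs, which is the property the rest of the setup leans on.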

Stacked, the two layers cover what neither does alone. Docker provides the kernel and OS surface, so the same instructions behave the same way on a Mac, a Linux laptop, and a GitHub-hosted runner. Nix puts the toolchain inside that surface, so everyone ends up with the same Go, the same Hugo, the same just, lockfile-pinned. The split I use: kernel, filesystem, and network are Docker’s problem; which version of which CLI lives on the path is Nix’s.

The Devcontainer as the Shared Contract

Devcontainers are the piece that makes this layering legible to humans, CI, and agents at the same time. The spec was originally a VS Code feature, but it has matured into a portable contract: a devcontainer.json describes the image, the services, the features, the post-create commands, and the workspace mount. Any tool that speaks the spec can boot the same environment.

What makes the devcontainer the right contract, rather than a Makefile or a shell script, is that it commits to a running container rather than a build recipe. Humans attach their editor to that container, CI boots the same one on a clean runner, and agents shell in with no extra setup. Nobody negotiates with Homebrew, apt, or winget. The container handles isolation; the Nix flake inside it handles versioning.

The spec is good at booting that container. On its own, it doesn’t let the host shell talk to services running inside the container, and CI can’t reuse the same configuration without duplicating it. Two small tools close those gaps.

The Connective Tissue

The first gap is that the host wants to talk to the container’s services. When the devcontainer starts a database or a queue, those services live on container ports under container hostnames: postgres:5432, not localhost:5432. A human running their browser on the host needs to reach them on host ports. A script on the host that wants to use the same DATABASE_URL as the container needs that URL rewritten on the way out. Doing this by hand works for one port and falls apart by the fifth.

devcontainer-env is the small Rust CLI I wrote to fix this. It introspects a running devcontainer, reads the env declared in devcontainer.json’s containerEnv, rewrites any container URLs to forwarded host ports, and exports the result to the host shell. The interesting subcommands are export (eval-friendly output for shellHook) and exec (run a single host command with the container’s env applied). The same idea works for agents: an agent on the host gets a consistent environment by prefixing every command with devcontainer-env exec --.
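
The rewriting step is easier to picture with a few lines of code. What follows is a minimal Python sketch of the idea, not devcontainer-env's actual implementation (which is Rust and may work differently). The function name rewrite_for_host and the shape of the forwarded-port table are assumptions made for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def rewrite_for_host(url: str, forwarded: dict[tuple[str, int], int]) -> str:
    """Point a container-side URL at localhost using a forwarded-port table.

    `forwarded` maps (container hostname, container port) to the host port.
    URLs whose host and port are not in the table are returned unchanged.
    """
    parts = urlsplit(url)
    if parts.hostname is None or parts.port is None:
        return url
    host_port = forwarded.get((parts.hostname, parts.port))
    if host_port is None:
        return url
    # Preserve any "user[:password]@" prefix from the original netloc.
    userinfo, at, _ = parts.netloc.rpartition("@")
    netloc = f"{userinfo}{at}localhost:{host_port}"
    return urlunsplit(parts._replace(netloc=netloc))


if __name__ == "__main__":
    # Hypothetical table: the Postgres service is forwarded to host port 54320.
    table = {("postgres", 5432): 54320}
    print(rewrite_for_host(
        "postgres://vscode@postgres:5432/example-db?sslmode=disable", table
    ))
```

Only the host and port change. The scheme, user, path, and query string pass through untouched, so an application reading the variable sees a URL with the same meaning on either side of the container boundary.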

The second gap is that CI wants to use the devcontainer rather than duplicate it. Without help, a CI workflow grows its own install path: a Node setup step, a Go setup step, a Postgres step with pinned versions, then the actual build. None of that matches what humans run locally. The fix is to make CI run the devcontainer the humans already have, instead of reinventing it.

devcontainer-ci is a GitHub Action that does this. It installs @devcontainers/cli, boots the same devcontainer.json your humans use, runs the workflow inside it, and tears it down on cancellation. The action and the CI YAML both stay short, and the environment CI sees is the environment that lives in the repo.

With both tools in place, the one-config promise holds. Humans use the devcontainer through their editor. An agent invokes it through devcontainer-env exec. CI brings it up through the action. The source of truth is devcontainer.json plus flake.nix; everything else is a thin adapter.

    flowchart TB
    H[Human<br/>Editor or Terminal]
    C[CI<br/>GitHub Actions]
    A[AI Agent<br/>Terminal]

    H -->|VS Code or CLI| BOX
    C -->|devcontainer-ci action| BOX
    A -->|devcontainer-env exec| BOX

    subgraph BOX [Shared devcontainer]
      direction TB
      NIX[Nix flake<br/>pinned toolchain]
      ENV[containerEnv<br/>service URLs]
      SVC[Compose services<br/>postgres, queue, ...]
      NIX --- ENV
      ENV --- SVC
    end

A Worked Example

The example/ directory in the devcontainer-env repo is the pattern at minimum scope: a workspace container, a Postgres service next to it, and a host shell that can psql into the container’s database without anyone editing /etc/hosts. Four files do the work.

The devcontainer pulls in a Compose file so it can declare more than one service, uses containerEnv to publish EXAMPLE_API_DATABASE_URL, a database URL that points at the Postgres service by its Compose hostname, and mounts two named volumes (one for /nix, one for the user cache), both scoped to this project by the ${localWorkspaceFolderBasename} prefix. The prefix matters because every repo on the same machine would otherwise fight over a single shared Nix store. Scoping by project gives each repo its own store that survives container rebuilds, which avoids re-downloading the world every time.

{
  "name": "example-api",
  "service": "workspace",
  "dockerComposeFile": "docker-compose.yml",
  "workspaceFolder": "/home/vscode/workspace",
  "containerEnv": {
    "EXAMPLE_API_DATABASE_URL": "postgres://vscode@postgres:5432/example-db?sslmode=disable"
  },
  "mounts": [
    "source=${localWorkspaceFolderBasename}-nix,target=/nix,type=volume",
    "source=${localWorkspaceFolderBasename}-cache,target=/home/vscode/.cache,type=volume"
  ],
  "features": {
    "ghcr.io/devcontainers/features/nix:1": {
      "extraNixConfig": "experimental-features = nix-command flakes"
    }
  },
  "postCreateCommand": "sudo chown -R vscode:vscode /nix /home/vscode/.cache"
}

The postCreateCommand is not optional. Named volumes come up owned by root, and without the chown the vscode user can’t write to the Nix store on first boot, so nix develop fails on a permission error before it does anything useful.

The Compose file defines the two services. The workspace runs the Debian devcontainer base, the database runs Postgres, and the database port uses the dynamic form (ports: [5432]) so multiple projects on the same host don’t collide:

services:
  workspace:
    image: "mcr.microsoft.com/devcontainers/base:bookworm"
    command: sleep infinity
    volumes:
      - ..:/home/vscode/workspace:cached

  postgres:
    image: postgres:18-bookworm
    environment:
      POSTGRES_DB: example-db
      POSTGRES_USER: vscode
      POSTGRES_HOST_AUTH_METHOD: trust
    healthcheck:
      test: ["CMD-SHELL", "pg_isready"]
    ports:
      - 5432

The flake.nix consumes devcontainer-env directly as a flake input, drops it into the dev shell, and runs the export step in shellHook so every entry into the shell gets the container’s env automatically:

{
  inputs = {
    nixpkgs.url = "github:NixOS/nixpkgs/nixpkgs-unstable";
    flake-utils.url = "github:numtide/flake-utils";
    devcontainer-env.url = "github:devcontainer-env/devcontainer-env";
  };

  outputs = { nixpkgs, flake-utils, devcontainer-env, ... }:
    flake-utils.lib.eachDefaultSystem (system:
      let pkgs = nixpkgs.legacyPackages.${system}; in {
        devShells.default = pkgs.mkShell {
          packages = [ devcontainer-env.packages.${system}.default ];
          shellHook = ''
            eval "$(devcontainer-env export)"
          '';
        };
      });
}

The packages list is where your project’s own toolchain goes. The example only ships devcontainer-env because the minimal demo doesn’t need anything else at the host shell. A real project extends the list. For this Hugo site, it reads packages = [ devcontainer-env.packages.${system}.default pkgs.hugo pkgs.go pkgs.just ]; for a Rust service it would pull in pkgs.cargo and friends. The flake’s job is to put exactly the tools your humans, agents, and CI need on the path, and nothing else.

One last detail closes the human path. VS Code, by default, attaches you to the container’s bare shell, not to nix develop. A small customizations block in the same devcontainer.json points the integrated terminal directly at the Nix dev shell:

"customizations": {
  "vscode": {
    "settings": {
      "terminal.integrated.defaultProfile.linux": "default",
      "terminal.integrated.profiles.linux": {
        "default": {
          "path": "nix",
          "args": ["develop"]
        }
      }
    }
  }
}

Now opening a terminal in the editor lands the human in the same shell CI runs through nix develop --command ... and the same shell an agent runs through devcontainer-env exec. Nobody has to remember to enter it.

The result, on the host, is what you’d expect from inside the container:

$ nix develop
$ echo $EXAMPLE_API_DATABASE_URL
postgres://vscode@localhost:54320/example-db?sslmode=disable
$ psql $EXAMPLE_API_DATABASE_URL
psql (18.0)
Type "help" for help.
example-db=#

Notice what changed. The URL declared in containerEnv was postgres://vscode@postgres:5432/.... The URL the host shell sees is postgres://vscode@localhost:54320/.... That rewriting is the reason devcontainer-env exists. Inside the container, postgres:5432 resolves. From the host, the same service is reachable on a forwarded port. The application doesn’t have to know which side of the wall it’s on because the env var is correct in both.

The CI workflow brings the same setup to GitHub Actions in five steps:

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v6
      - uses: devcontainer-env/devcontainer-ci@v1
      - uses: DeterminateSystems/nix-installer-action@v22
      - uses: DeterminateSystems/magic-nix-cache-action@v13
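      # Single quotes defer expansion to the shell inside nix develop;
      # otherwise the runner's shell expands the variable before the shellHook sets it.
      - run: nix develop --command sh -c 'echo "$EXAMPLE_API_DATABASE_URL"'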

Both cache steps matter. devcontainer-ci reuses the workspace image across runs, and magic-nix-cache-action reuses the Nix store. Drop either and a job that should take seconds easily runs for several minutes, because the toolchain re-downloads from scratch on every push. The two cache layers in CI are the CI-shaped version of the two named volumes locally.

The CI runner gets the same workspace container, the same Postgres service, the same Nix shell, and the same exported env var. There’s no parallel install path and no second source of truth. The thing the human ran locally is the thing CI runs in the cloud, and an agent prefixing its commands with devcontainer-env exec -- runs in that same world from outside the container.
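
On the agent side, the prefixing can live in whatever harness launches the agent's commands. A minimal Python sketch, assuming the harness runs on the host with devcontainer-env on the path. Only the devcontainer-env exec -- prefix comes from the tool itself; the helper names are hypothetical.

```python
import subprocess

# The prefix is the one described in this post; the helpers are illustrative.
PREFIX = ["devcontainer-env", "exec", "--"]


def in_dev_env(command: list[str]) -> list[str]:
    """Return the argv to run so the command sees the container's env."""
    return [*PREFIX, *command]


def run_in_dev_env(command: list[str]) -> subprocess.CompletedProcess:
    """Run a command through devcontainer-env and capture its output."""
    return subprocess.run(in_dev_env(command), capture_output=True, text=True)
```

Passing the command as a list rather than a shell string keeps quoting out of the picture, so the agent's arguments reach the tool exactly as written.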

What Agents Need

The temptation when designing for agents is to invent agent-specific tooling: MCP servers, custom CLIs that “speak agent,” that kind of thing. Occasionally that’s real — an agent that needs to query a vector store has needs a human doesn’t. Most of the time, the agent needs what the human needs and nothing more:

A shell with the project’s toolchain on the path.
The same task runner the human uses (just, make, npm run — pick one and document it).
The same secrets management story the human uses, with no special “agent token” path.
A way to run a single command and get a single, deterministic result.

The Nix shell covers the toolchain. The justfile covers the task runner. The devcontainer’s remoteEnv covers secrets. The fourth requirement is the real test: a command run in three different places should give one deterministic result. If just test doesn’t, the environment isn’t actually shared, no matter what the config says.
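
One cheap way to check the fourth requirement is to fingerprint the environment from each audience's entry point and compare the results. The sketch below is a hypothetical helper, not part of either tool: it runs a list of probe commands and hashes their exit codes and output into one short digest. The probe list is an assumption and should be replaced with the tools your flake provides.

```python
import hashlib
import subprocess


def fingerprint(commands: list[list[str]]) -> str:
    """Hash the exit code and output of several commands into one short digest."""
    digest = hashlib.sha256()
    for argv in commands:
        name = " ".join(argv)
        try:
            result = subprocess.run(argv, capture_output=True, text=True)
            record = f"{result.returncode}\n{result.stdout}\n{result.stderr}"
        except FileNotFoundError:
            # A missing tool is itself a difference worth recording.
            record = "missing"
        digest.update(f"{name}\n{record}\n".encode())
    return digest.hexdigest()[:16]


if __name__ == "__main__":
    # Hypothetical probe list; swap in the tools your flake actually provides.
    probes = [["go", "version"], ["hugo", "version"], ["just", "--version"]]
    print(fingerprint(probes))
```

If the digest printed in the editor terminal, in the CI log, and in the agent's session is the same, the three audiences are looking at the same toolchain. If it differs, the first mismatching probe tells you where to look.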

I haven’t yet found a real case where bundling MCP servers into the devcontainer paid off more than it cost. The agents that work well in this setup are the ones that treat the project like a project, not the ones that expect a custom playground.

Why the Setup Is Worth It

The pattern is small enough to summarize on the back of a napkin:

A devcontainer.json (with a Compose file when you need more than one service) that describes the box.
A flake.nix that describes the contents and runs devcontainer-env export in its shellHook.
A task runner the humans, CI, and agents all use (just, make, whatever — pick one).
devcontainer-env for the host-to-container bridge.
devcontainer-ci for the CI-to-container bridge.

The payoff is that “does it work” stops being three different questions. A green build means a green run and a green agent task at the same time. When something breaks, it breaks for everyone at once. That feels worse than it is. The alternative is that it breaks for one audience and you don’t find out for two weeks.

The cheapest way to keep quality high across audiences is to not have separate environments for them. Every fork — a separate CI image, a separate agent setup script, a separate “developer onboarding” doc that drifts from reality — is a bet that you can keep two things in sync forever. You can’t. So don’t fork.

That’s the bet I’m making with this blog and with the tooling I’m building around it. A few months in, the setup has paid for itself in evenings I didn’t spend re-explaining the environment to one audience or another.

Last updated on May 20, 2026
