Debugging with an isolated reproduction repo
When a bug only appears in one environment (CI passes locally, or the opposite), resist the urge to keep patching the full system. Pull the broken piece into an isolated repo and run it bare. The smaller the surface, the faster the root cause emerges.
When to reach for this
Section titled “When to reach for this”- A test passes locally and fails in CI (or vice versa) and re-running doesn’t help.
- A multi-layer stack (compose + proxy + deployer + UI) flakes, and you can’t tell which layer is lying.
- You suspect an environment-specific issue (runner speed, kernel, Docker version) but can’t prove it.
- A bug fix worked in dev but regressions hit CI only — you need a standing regression test outside the main pipeline.
Pattern: minimal public repo + CI matrix
Section titled “Pattern: minimal public repo + CI matrix”- Render the broken artifact verbatim, stripping everything the main system adds. For SimpleDeploy templates the tooling is
e2e/tools/render-template.jswhich emits a standalone compose file withsimpledeploy.*labels removed and endpoint ports mapped to the host. For other subsystems, write a tiny script that does the equivalent: capture the exact input the broken code receives. - Build a matrix of variants that isolates each hypothesized cause. One cell per hypothesis. Keep the cells as close to minimal-and-identical as possible so only the hypothesis variable differs.
- Push to a public repo with a GitHub Actions workflow that exercises each cell in parallel and dumps logs on failure. Public repos get free runners and are easy to share with collaborators.
- Compare cell outcomes. The cell that fails tells you which piece of the original system is responsible.
Example: vazra/tpl-ci-repro (Nov 2026)
Section titled “Example: vazra/tpl-ci-repro (Nov 2026)”Context: SimpleDeploy’s templates e2e was marking n8n-postgres and umami-postgres as probe-failing only in CI. Raw docker compose up on my Mac worked for both.
Matrix:
n8n/— the rendered compose, as-isn8n-injected/— the rendered compose PLUS SimpleDeploy’sInjectSharedNetworkrewrite appliedumami/,umami-injected/— same pair for Umami
Workflow probed each cell’s endpoint for up to 360s and dumped container logs on failure.
Outcome:
n8n: reachable in 9sumami: reachable in 15sn8n-injected: UNREACHABLE, logs showedgetaddrinfo EAI_AGAIN dbumami-injected: UNREACHABLE, same pattern
Root cause isolated to the injection rewrite in a single 8-minute CI run. Fix + regression test landed in #9.
Checklist
Section titled “Checklist”- The reproduction is minimal. If a variable doesn’t affect the outcome, strip it.
- Each matrix cell isolates exactly one hypothesis.
- Logs from failing cells are dumped automatically (don’t rely on the operator remembering to fetch them).
- The repo is kept around long enough to serve as a regression test — until the fix has survived a release cycle. Then delete it.
- Findings are written up in
docs/architecture/or the relevant area doc, linking back to the PR that fixed the root cause.
Template composes, API payloads, migration SQL, Go programs calling a single function — anything that can be captured as a small input can be isolated this way.