Troubleshooting
Each entry follows: symptom -> diagnostic command -> fix.
Deploy hangs at “pulling image”
Symptom: Deploy progress bar sticks. Logs show Pulling foo/bar:latest... for >5 minutes.
Diagnose:
    # Check what compose is doing
    sudo docker compose -f /etc/simpledeploy/apps/<slug>/compose.yaml pull

    # Check registry auth
    simpledeploy registry list

Fix:
- Bad image name or tag: correct the image: line in the compose file.
- Private registry: add credentials via Settings -> Registries (or simpledeploy registry add).
- Network issue: confirm DNS works (getent hosts registry-1.docker.io) and outbound 443 is open.
- Rate limited by Docker Hub: log in to a paid account or use a registry mirror.
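The network check in the list above can be scripted. The following is a minimal sketch, not part of SimpleDeploy: both function names are hypothetical, and it assumes that a reachable registry answers an anonymous request to /v2/ with 401 (as Docker Hub does) or 200.

```shell
# Hypothetical helpers; not part of SimpleDeploy.

# Interpret the HTTP status returned by a registry's /v2/ endpoint.
# curl reports 000 when it could not connect at all.
classify_registry_status() {
  case "$1" in
    200|401) echo "reachable" ;;
    000)     echo "unreachable" ;;
    *)       echo "unexpected status $1" ;;
  esac
}

# Resolve the registry host, then probe it over HTTPS (outbound 443).
check_registry() {
  local host="${1:-registry-1.docker.io}" code
  if ! getent hosts "$host" >/dev/null; then
    echo "DNS lookup failed for $host"
    return 1
  fi
  code=$(curl -s -o /dev/null -m 10 -w '%{http_code}' "https://$host/v2/")
  classify_registry_status "$code"
}
```

Run check_registry with no argument to probe Docker Hub, or pass the hostname of a private registry. An "unreachable" result points at a firewall or routing problem rather than at credentials.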
Deploy reported “Unstable”
Symptom: The deploy wizard shows a yellow “Unstable” status. The per-service summary lists one or more services as restarting, unhealthy, or with restarted Nx.
What it means: docker compose up finished without an error, but during the 30-second post-deploy stabilization window one of the containers either kept restarting, reported unhealthy from its healthcheck, or exited. This is almost always an application-level error rather than a SimpleDeploy bug.
Diagnose:
    # Watch the offending container's logs (replace slug + service)
    sudo docker logs simpledeploy-<slug>-<service>-1 --tail 200

In the UI: open the app, Logs tab, pick the unhealthy service, scroll to the latest entries. The actual error from the container is usually in the last few lines before the restart.
Common causes:
- Permission errors writing to a volume (EACCES, Permission denied). The image runs as a non-root user but the named volume was created root-owned. Add user: "1000:1000" (or whatever UID the image documents) to the service, or use the templated version of the app from the wizard, which handles this.
- Required env var missing. Container exits with “X is required”. Add the missing env var via the Config tab.
- Port already bound in port-only mode if the same host port is requested by another app.
- Healthcheck endpoint not yet implemented in your code. Either add the endpoint or relax the healthcheck (longer start_period).
Fix and redeploy: edit the compose via the Config tab and click Redeploy. The next deploy will re-run the stabilization check.
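To see which containers of an app are restarting or unhealthy without opening each log, the state can be read from docker inspect. This is a rough sketch under assumptions: flag_unstable is a hypothetical helper, and the container name filter relies on the simpledeploy-<slug>-<service>-1 naming shown earlier in this section.

```shell
# Hypothetical helper; not part of SimpleDeploy.
# Reads lines of "<name> <restart-count> <status> <health>" on stdin and
# prints the names of containers that restarted, are not running, or
# report an unhealthy healthcheck.
flag_unstable() {
  awk '$2 > 0 || $3 != "running" || $4 == "unhealthy" { print $1 }'
}

# Example feed from docker inspect (replace <slug> before running):
# docker ps -aq --filter "name=simpledeploy-<slug>-" |
#   xargs docker inspect --format \
#     '{{.Name}} {{.RestartCount}} {{.State.Status}} {{if .State.Health}}{{.State.Health.Status}}{{else}}none{{end}}' |
#   flag_unstable
```

A one-shot service that is expected to exit will also be flagged, so treat the output as a list of containers to look at, not as a verdict.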
Deploy fails with “compose file rejected”
Symptom: Deploy refused with a validation error.
Diagnose: Read the exact error message in the deploy logs. SimpleDeploy validates compose files and rejects dangerous directives.
Fix: Remove the offending directive. See Security hardening - Deployment safety for the full reject list (privileged, network_mode: host, cap_add: SYS_ADMIN, bind mounts of /etc, /proc, or docker.sock, etc.).
If you genuinely need one of these (rare), there is no override. File an issue explaining the use case.
TLS certificate fails to issue
Symptom: Browser shows cert error or tls: no certificates configured. Logs show ACME errors.
Diagnose:
    journalctl -u simpledeploy -n 200 | grep -i acme

    # Verify DNS
    dig +short manage.example.com

    # Verify port 80 reachable from outside
    curl -I http://manage.example.com/.well-known/acme-challenge/test

Fix:
- DNS not pointing at this host: fix A/AAAA record, wait for TTL.
- Port 80 blocked: open inbound 80/tcp in the firewall and any cloud security group.
- Let’s Encrypt rate limit (5 duplicate certs/week for the same set of hostnames): wait 1 week or use the staging endpoint while testing.
- CAA record blocking: dig CAA example.com should include letsencrypt.org or be empty.
- tls.email missing: required for ACME. Set in config and restart.
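The CAA rule from the list above (the record set is empty or names letsencrypt.org) can be checked mechanically. This is a rough sketch with a hypothetical function name. It applies exactly that rule and nothing more, so a record set that contains only iodef entries would be reported as blocking even though it is not.

```shell
# Hypothetical helper; not part of SimpleDeploy.
# Reads `dig +short CAA <domain>` output on stdin. Succeeds when the
# record set is empty or mentions letsencrypt.org.
caa_allows_letsencrypt() {
  local records
  records=$(cat)
  [ -z "$records" ] || printf '%s\n' "$records" | grep -q 'letsencrypt\.org'
}
```

Usage: dig +short CAA example.com | caa_allows_letsencrypt && echo "CAA ok"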
“permission denied” on data_dir
Symptom: Service fails to start with permission errors writing to /var/lib/simpledeploy.
Diagnose:
    ls -la /var/lib/simpledeploy
    ps aux | grep simpledeploy   # what user is the process running as?

Fix:
    sudo chown -R simpledeploy:simpledeploy /var/lib/simpledeploy
    sudo chmod 0700 /var/lib/simpledeploy
    sudo chmod 0600 /var/lib/simpledeploy/simpledeploy.db

WebSocket logs not streaming
Symptom: Log viewer in the UI shows “Connecting…” forever or disconnects immediately.
Diagnose: Open browser dev tools -> Network -> WS. Look for the failed /api/apps/<slug>/logs connection and read the close code.
Fix:
- Behind Cloudflare with WS disabled: enable WebSockets in Cloudflare dashboard for the management hostname.
- Behind nginx/another proxy: ensure proxy_set_header Upgrade $http_upgrade; proxy_set_header Connection upgrade; are set.
- Origin mismatch: the management UI must be served from the same hostname as the API. Cross-origin WS is rejected by design.
- Idle timeout: connections close after 5 minutes idle. The UI auto-reconnects.
High memory usage
Symptom: Server RAM growing over time. top shows large simpledeploy or Docker resident size.
Diagnose:
    free -m
    docker system df
    docker images | wc -l
    du -sh /var/lib/simpledeploy

Fix:
    # Clean up unused Docker stuff
    docker image prune -af
    docker volume prune -f
    docker system prune -af --volumes   # nuclear option

    # Lower metrics retention if DB is huge (see capacity-sizing.md)

If SimpleDeploy itself is leaking, capture a profile:
    curl http://localhost:8443/debug/pprof/heap > heap.out
and open a GitHub issue.
App not reachable from the internet
Symptom: App’s domain returns 404, connection refused, or times out.
Diagnose:
    # Is the container actually running?
    docker ps | grep <app-slug>

    # Is it listening on the expected port inside the container?
    docker exec -it <container> ss -tln

    # Does Caddy know about this domain?
    curl -s http://localhost:2019/config/ | jq '.apps.http.servers'

Fix:
- Container not running: redeploy. Check container logs for crash loop.
- Missing endpoint label: add simpledeploy.endpoint=example.com to the service in compose.yaml.
- Wrong port: confirm the service expose: or ports: matches what the app listens on.
- DNS not resolving: dig +short example.com should match server IP.
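Two of the checks above, the endpoint label and the DNS match, are simple enough to script. The helpers below are a hypothetical sketch, not SimpleDeploy commands. The label check is a plain text match, so it also matches a commented-out label.

```shell
# Hypothetical helpers; not part of SimpleDeploy.

# Succeeds when the compose file mentions a simpledeploy.endpoint label,
# in either "key=value" or "key: value" form.
has_endpoint_label() {
  grep -Eq 'simpledeploy\.endpoint[=:]' "$1"
}

# Reads `dig +short <domain>` output on stdin and succeeds when one of
# the returned lines is exactly the expected server IP.
resolves_to() {
  grep -qxF "$1"
}
```

Usage, with 203.0.113.10 standing in for the server's public IP: has_endpoint_label /etc/simpledeploy/apps/<slug>/compose.yaml, then dig +short example.com | resolves_to 203.0.113.10.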
429 rate limit hitting the dashboard
Symptom: Dashboard or API returns 429 Too Many Requests. Common during scripted use.
Diagnose: Find out what is hammering the server. Check the audit log for repeated requests from one IP.
Fix:
- Login flood (10/min): wait 60s. Check for misconfigured auto-login scripts.
- Per-app rate limit: tune simpledeploy.ratelimit.* labels on the affected app.
- Behind a proxy: set trusted_proxies in config so rate limiting uses real client IPs.
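Since 429s are common during scripted use, scripts that call the API can back off instead of retrying immediately. The following is a minimal sketch: with_backoff and api_status are hypothetical helpers, and the URL passed to api_status and the SD_API_KEY variable are placeholders.

```shell
# Hypothetical helpers; not part of SimpleDeploy.

# with_backoff MAX_TRIES FIRST_DELAY CMD...
# CMD must print an HTTP status code. Retries while the code is 429,
# doubling the delay after each attempt, and prints the last code seen.
# Returns non-zero if every attempt was rate limited.
with_backoff() {
  local max="$1" delay="$2" try=1 code
  shift 2
  while :; do
    code=$("$@")
    if [ "$code" != "429" ]; then
      echo "$code"
      return 0
    fi
    if [ "$try" -ge "$max" ]; then
      echo "$code"
      return 1
    fi
    sleep "$delay"
    delay=$((delay * 2))
    try=$((try + 1))
  done
}

# Status-only request against an API URL (placeholder credentials).
api_status() {
  curl -s -o /dev/null -w '%{http_code}' \
    -H "Authorization: Bearer $SD_API_KEY" "$1"
}
```

For example, with_backoff 5 2 api_status "$url" waits 2, 4, 8, then 16 seconds between attempts. For the login limit, a first delay of 60 seconds matches the wait suggested above.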
Backup failed
Symptom: Backup run shows failed in Backups UI.
Diagnose:
    # Check the backup run log
    curl -H "Authorization: Bearer $SD_API_KEY" \
      https://manage.example.com/api/apps/<slug>/backups/runs/<id>

Fix:
- Bad S3 credentials: re-enter in Settings -> Backup target.
- S3 bucket not reachable: check region, endpoint URL, network.
- Strategy script crashed: check the run logs for the exact error from pg_dump/tar.
- Disk full on local target: free space or move target to S3.
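For the disk-full case, a quick check of the filesystem that holds the local backup target can confirm the diagnosis. This is a sketch with hypothetical helper names. The 90 percent default threshold is an arbitrary assumption, and the backup directory depends on how the target is configured.

```shell
# Hypothetical helpers; not part of SimpleDeploy.

# Print the percentage of space used on the filesystem holding the path.
disk_used_pct() {
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Succeeds when the used percentage is at or above the threshold
# (second argument, default 90).
needs_space() {
  [ "$1" -ge "${2:-90}" ]
}
```

Usage: needs_space "$(disk_used_pct /path/to/backups)" && echo "backup target is nearly full", where /path/to/backups is a placeholder for the local target directory.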
Forgot admin password
Diagnose: No need; if you cannot log in, you cannot log in.
Fix:
    # Create a new super_admin (will prompt for password)
    sudo -u simpledeploy simpledeploy users create \
      --username recovery \
      --role super_admin

    # Log in as 'recovery', delete the old account from the UI
    # Then delete the recovery account or rotate its password

Service won’t start after upgrade
See Upgrade and rollback - Rollback. Usually a migration ran that the previous binary does not understand. Restore the pre-upgrade DB backup and downgrade.
Still stuck
- Search GitHub issues.
- Open a new issue with: version (simpledeploy version), OS, last 200 lines of journalctl -u simpledeploy, and steps to reproduce.
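The items requested for a new issue can be gathered into one file. The function below is a hypothetical convenience sketch, not a SimpleDeploy command; it only wraps the commands named above plus uname and /etc/os-release for the OS details.

```shell
# Hypothetical helper; not part of SimpleDeploy.
# Writes version, OS details, and the last 200 log lines to a file
# (default: simpledeploy-report.txt). A missing command is recorded
# in the report instead of aborting it.
collect_report() {
  local out="${1:-simpledeploy-report.txt}"
  {
    echo "## version"
    simpledeploy version 2>&1 || true
    echo "## os"
    uname -a
    if [ -r /etc/os-release ]; then
      cat /etc/os-release
    fi
    echo "## logs"
    journalctl -u simpledeploy -n 200 --no-pager 2>&1 || true
  } > "$out"
}
```

Read through the file before attaching it, since log lines can contain hostnames or other details you may not want to publish.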