
Disaster recovery

When things go badly wrong, how fast you recover depends entirely on what you set up beforehand. This page covers three failure modes: the SimpleDeploy process crashing, losing the whole host, and a bad app deploy.

SimpleDeploy also mirrors every user-editable setting (users, alert rules, backup configs, registries, webhooks) to YAML sidecar files on disk. A wiped database can be recovered from those files without a prior backup. See Config sidecars and sidecar-based recovery for the full procedure.

  • RPO (recovery point objective) = how much data you can afford to lose. It is driven by backup frequency: hourly backups give roughly a 1h RPO.
  • RTO (recovery time objective) = how long you can afford to be down. It is driven by restore time: provisioning a host, downloading the backup, and restoring it add up to your floor.
| Setup | Typical RPO | Typical RTO |
|---|---|---|
| Daily DB backup, no app backup | 24h | hours to days |
| Hourly DB + daily app volume backup | 1-24h | 1-2h |
| 15-min DB + hourly volume backup, hot standby | 15-60 min | minutes |

Most small setups can reach a 1h RPO and a 1h RTO with hourly backups to S3 and a documented restore runbook.
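The RPO and RTO definitions reduce to simple arithmetic. A minimal Bash sketch; the function names are hypothetical and every timing is a placeholder you should replace with numbers measured during a real restore drill:

```shell
#!/usr/bin/env bash
# Back-of-envelope RPO/RTO estimates. All inputs are in minutes and are
# placeholders: substitute figures measured during a real restore drill.

# Worst-case data loss: everything written since the last *completed* backup,
# i.e. one full backup interval plus the time the backup itself takes.
rpo_minutes() {
  local interval=$1 backup_duration=${2:-0}
  echo $(( interval + backup_duration ))
}

# RTO floor: a rebuild can be no faster than provisioning a host,
# downloading the backup, and restoring it, done back to back.
rto_floor_minutes() {
  local provision=$1 download=$2 restore=$3
  echo $(( provision + download + restore ))
}
```

For example, `rpo_minutes 60 5` prints 65 and `rto_floor_minutes 10 5 20` prints 35. Against a 1h RTO target, that leaves 25 minutes for everything the floor ignores, such as DNS propagation and verification.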

Scenario 1: SimpleDeploy process crashed

The process died but the host is fine. Apps may keep running (Docker runs them independently), but the dashboard, proxy, and metrics collection are down.

Terminal window
sudo systemctl status simpledeploy
journalctl -u simpledeploy -n 200 --no-pager
df -h /var/lib/simpledeploy # disk full?
free -m # OOM?

Common causes:

  • Disk full (the DB cannot write its WAL). Free space, then restart.
  • Out of memory (the kernel OOM killer terminated it). Check sudo dmesg | grep -i kill, then add swap or upgrade RAM.
  • Corrupt DB after power loss. SQLite runs in WAL mode and survives most crashes; if it does not, restore from the latest DB backup.
  • Bad config after an edit. Check it with simpledeploy validate --config /etc/simpledeploy/config.yaml.
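The disk-full check is easy to script. A minimal sketch, assuming a `df` that supports POSIX output (`-P`, available in GNU coreutils and BusyBox); the function names and the 90% threshold are illustrative choices, not SimpleDeploy defaults:

```shell
#!/usr/bin/env bash
# Print how full (in percent) the filesystem holding PATH is.
disk_used_pct() {
  local path=${1:-/var/lib/simpledeploy}
  # -P forces one line per filesystem, so the 5th column is always the capacity.
  df -P "$path" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Warn and return 1 when usage reaches LIMIT percent (90 by default).
check_disk() {
  local path=${1:-/var/lib/simpledeploy} limit=${2:-90} used
  used=$(disk_used_pct "$path")
  # df prints nothing on stdout for a missing path; treat that as an error.
  [ -n "$used" ] || { echo "cannot read usage for $path" >&2; return 2; }
  if [ "$used" -ge "$limit" ]; then
    echo "WARN: $path is ${used}% full; free space before restarting"
    return 1
  fi
  echo "OK: $path is ${used}% full"
}
```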
Terminal window
sudo systemctl restart simpledeploy
journalctl -u simpledeploy -f

If it dies again immediately, check the last error in the journal. Do not loop on restart; fix the root cause.
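To find that last error quickly, filter the journal output. A minimal sketch; the helper name is hypothetical, and it assumes SimpleDeploy's failure lines contain "error", "fatal" or "panic", which you should confirm against your own logs:

```shell
#!/usr/bin/env bash
# Read log text on stdin and print the most recent line that looks like a failure.
# Assumption: failure lines mention "error", "fatal" or "panic" (case-insensitive).
last_error() {
  grep -iE 'error|fatal|panic' | tail -n 1
}

# Usage on the host:
#   journalctl -u simpledeploy -n 200 --no-pager | last_error
#   systemctl show -p NRestarts simpledeploy   # restarts systemd has already attempted
```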

Scenario 2: Host lost

Hardware failure, accidental termination, or a region outage: the host is gone and you need to rebuild on a new one.

Provision a new host with the same OS and architecture as before, and reapply your usual hardening (firewall, SSH keys, user accounts).

Install SimpleDeploy, pinned to the same version as the lost host:

Terminal window
# apt
sudo apt install simpledeploy=1.2.0
# Or download the binary (writing to /usr/local/bin needs root)
sudo curl -fL https://github.com/vazra/simpledeploy/releases/download/v1.2.0/simpledeploy-linux-amd64 \
  -o /usr/local/bin/simpledeploy && sudo chmod +x /usr/local/bin/simpledeploy
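Because the binary must match the host architecture, a small helper can pick the right asset. A sketch that follows the asset naming in the download example above; the helper name is hypothetical, and the `arm64` asset name is an assumption, so check the release page before relying on it:

```shell
#!/usr/bin/env bash
# release_url VERSION [MACHINE]: print the download URL for this host's architecture.
# MACHINE defaults to `uname -m`.
release_url() {
  local version=$1 machine=${2:-$(uname -m)} arch
  case "$machine" in
    x86_64|amd64)  arch=amd64 ;;
    aarch64|arm64) arch=arm64 ;;  # assumption: check the release page for this asset
    *) echo "unsupported architecture: $machine" >&2; return 1 ;;
  esac
  echo "https://github.com/vazra/simpledeploy/releases/download/v${version}/simpledeploy-linux-${arch}"
}

# Usage (pin the version the lost host ran):
#   sudo curl -fL "$(release_url 1.2.0)" -o /usr/local/bin/simpledeploy
#   sudo chmod +x /usr/local/bin/simpledeploy
```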

Do not start the service yet.

Update A/AAAA records to point at the new host. Do this early so DNS has time to propagate.
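Before relying on the new host, you can confirm each public domain already resolves to it. A minimal sketch, assuming `dig` is installed (dnsutils or bind-utils package); the function names are hypothetical, it only checks A records, and your local resolver may keep serving the old record until its TTL expires:

```shell
#!/usr/bin/env bash
# Look up the A records for one name.
resolve() {
  dig +short A "$1"
}

# dns_cut_over NEW_IP DOMAIN...: succeed only when every DOMAIN resolves to NEW_IP.
# Prints each domain that still points elsewhere.
dns_cut_over() {
  local new_ip=$1 domain rc=0
  shift
  for domain in "$@"; do
    # dig +short may print CNAME targets first; -x matches whole lines only.
    if ! resolve "$domain" | grep -qxF "$new_ip"; then
      echo "still pending: $domain"
      rc=1
    fi
  done
  return "$rc"
}

# Usage (placeholder IP and domains):
#   dns_cut_over 203.0.113.10 app.example.com api.example.com
```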

Download the latest database backup from your off-host target (S3, SFTP) and place it in the configured data_dir (/var/lib/simpledeploy in this example):

Terminal window
sudo mkdir -p /var/lib/simpledeploy
sudo cp simpledeploy-2026-04-15.db /var/lib/simpledeploy/simpledeploy.db
sudo chown -R simpledeploy:simpledeploy /var/lib/simpledeploy
sudo chmod 0600 /var/lib/simpledeploy/simpledeploy.db
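It is worth checking the downloaded file before the service opens it; run these checks on the downloaded copy, before copying it into data_dir. A sketch, assuming the backup is a plain (uncompressed, unencrypted) SQLite file; the function names are hypothetical. Every SQLite 3 database begins with the string `SQLite format 3` followed by a NUL byte:

```shell
#!/usr/bin/env bash
# Cheap check: SQLite 3 files start with "SQLite format 3" plus a NUL byte.
# Reading only the first 15 bytes keeps the NUL out of the comparison.
looks_like_sqlite() {
  [ -f "$1" ] && [ "$(head -c 15 "$1")" = "SQLite format 3" ]
}

# Thorough check when the sqlite3 CLI is installed: succeeds only on "ok".
integrity_check() {
  local result
  if ! command -v sqlite3 >/dev/null 2>&1; then
    echo "sqlite3 not installed; skipping integrity check" >&2
    return 2
  fi
  result=$(sqlite3 "$1" 'PRAGMA integrity_check;')
  echo "$result"
  [ "$result" = "ok" ]
}

# Usage:
#   looks_like_sqlite simpledeploy-2026-04-15.db && integrity_check simpledeploy-2026-04-15.db
```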

Recreate /etc/simpledeploy/config.yaml with the same master_secret as before. This is non-negotiable. Without it, encrypted registry credentials and JWT signing keys cannot be recovered. Keep master_secret in a password manager separate from the host.
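Since a missing master_secret cannot be fixed after the fact, a quick pre-start check can help. A sketch with a hypothetical helper name; it assumes master_secret is a top-level `master_secret:` key in config.yaml, and the grep is a heuristic, not a YAML parser. It only confirms a value is present, not that it matches the old one, so follow it with `simpledeploy validate --config /etc/simpledeploy/config.yaml`:

```shell
#!/usr/bin/env bash
# Heuristic: succeed if CONFIG has a top-level master_secret key with a non-empty value.
has_master_secret() {
  local config=${1:-/etc/simpledeploy/config.yaml}
  # After the colon, allow spaces, then require a character that is not a
  # space and does not start a comment.
  grep -Eq '^master_secret:[[:space:]]*[^[:space:]#]' "$config"
}

# Usage:
#   sudo bash -c "$(declare -f has_master_secret); has_master_secret" || echo "master_secret missing"
```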

Terminal window
sudo systemctl start simpledeploy

The reconciler reads apps_dir and starts pulling images. Apps with no persistent data come up immediately; stateful apps need their volumes restored before they are useful.

For each stateful app, restore the latest volume/database backup according to the strategy used:

  • Postgres backup: see Restoring app backups for the pg_restore flow.
  • Volume backup: extract the tarball into the app’s named volume, for example:
Terminal window
# Volume restore example
docker run --rm -v myapp_data:/restore -v "$(pwd)":/backup alpine \
  tar xzf /backup/myapp-2026-04-15.tar.gz -C /restore
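A truncated download is easier to catch before it touches the volume. A sketch that tests the archive first and then runs the same `docker run` extraction; the function names are hypothetical, and it assumes the app's containers are stopped while the volume is being rewritten:

```shell
#!/usr/bin/env bash
# Succeed only if ARCHIVE is a readable gzip-compressed tarball.
archive_ok() {
  tar -tzf "$1" >/dev/null 2>&1
}

# restore_volume VOLUME ARCHIVE: verify the archive, then extract it into the named volume.
restore_volume() {
  local volume=$1 archive=$2 dir
  archive_ok "$archive" || { echo "unreadable archive: $archive" >&2; return 1; }
  # Mount the archive's directory read-only so the restore cannot modify the backup.
  dir=$(cd "$(dirname "$archive")" && pwd) || return 1
  docker run --rm -v "$volume:/restore" -v "$dir:/backup:ro" alpine \
    tar xzf "/backup/$(basename "$archive")" -C /restore
}

# Usage:
#   restore_volume myapp_data ./myapp-2026-04-15.tar.gz
```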
Finally, verify the recovery:

  • Dashboard loads
  • Each app is reachable on its public domain
  • Sample data is present in each app
  • A new backup runs cleanly (closing the loop)
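The reachability item in that checklist can be scripted. A minimal sketch, assuming each app serves HTTPS at its root path and that any 2xx or 3xx answer counts as reachable; the function names are hypothetical and the domains are placeholders. Sample data and the fresh backup still need a human or app-specific checks:

```shell
#!/usr/bin/env bash
# Probe one domain over HTTPS; -f makes curl fail on HTTP 4xx/5xx responses.
probe() {
  curl -fsS -o /dev/null --max-time 10 "https://$1/"
}

# Check every domain, report each result, and fail if any is unreachable.
verify_domains() {
  local domain failed=0
  for domain in "$@"; do
    if probe "$domain"; then
      echo "OK   $domain"
    else
      echo "FAIL $domain"
      failed=1
    fi
  done
  return "$failed"
}

# Usage:
#   verify_domains app.example.com api.example.com
```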

Scenario 3: Bad deploy

A deploy went out and the app either fails to start or misbehaves. Other apps and SimpleDeploy itself are fine.

In the dashboard: app detail page -> Versions tab -> select the previous deploy -> Rollback. SimpleDeploy redeploys the prior compose.yaml and waits for its containers to become healthy.

Terminal window
simpledeploy versions list myapp
simpledeploy rollback myapp --version <hash>

If the rollback itself fails (rare), edit the compose file directly:

Terminal window
sudo vim /etc/simpledeploy/apps/myapp/compose.yaml
# Restore the last-known-good content from your git repo or backup
sudo systemctl reload simpledeploy

The reconciler watches apps_dir and reapplies on change.

If the bad deploy also corrupted data (for example, a migration gone wrong), follow the app data restore step from Scenario 2 for just that app.

Run a full disaster recovery drill once per quarter:

  1. Spin up a throwaway VPS.
  2. Restore the latest DB and one app from backups.
  3. Time the whole process. If it took longer than your RTO, fix the gap.
  4. Document deviations and update this runbook.
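Step 3 is easier to do honestly if the timing is recorded rather than remembered. A sketch with a hypothetical helper name; it takes epoch seconds (from `date +%s`) at the start and end of the drill, plus your RTO in minutes:

```shell
#!/usr/bin/env bash
# drill_report START END RTO_MIN: compare the drill's duration with the RTO target.
drill_report() {
  local start=$1 end=$2 rto_min=$3 elapsed_min
  # Round up to whole minutes so a partial minute is never hidden.
  elapsed_min=$(( (end - start + 59) / 60 ))
  if (( elapsed_min <= rto_min )); then
    echo "PASS: ${elapsed_min} min (RTO ${rto_min} min)"
  else
    echo "FAIL: ${elapsed_min} min exceeds RTO ${rto_min} min"
    return 1
  fi
}

# Usage:
#   start=$(date +%s)
#   ...run the restore...
#   drill_report "$start" "$(date +%s)" 60
```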

A backup you have never restored is theoretical.