Disaster recovery
When things go badly wrong, how fast you recover depends on what you set up beforehand. This page covers three failure modes: a crashed SimpleDeploy process, a lost host, and a bad deploy that breaks a single app.
SimpleDeploy also mirrors every user-editable setting (users, alert rules, backup configs, registries, webhooks) to YAML sidecar files on disk. A wiped database can be recovered from those files without a prior backup. See Config sidecars and sidecar-based recovery for the full procedure.
RTO and RPO targets
- RPO (recovery point objective) = how much data you can afford to lose. This is driven by backup frequency. Hourly backups = ~1h RPO.
- RTO (recovery time objective) = how long you can afford to be down. This is driven by restore time: download backup + provision host + restore = your floor.
| Setup | Typical RPO | Typical RTO |
|---|---|---|
| Daily DB backup, no app backup | 24h | hours to days |
| Hourly DB + daily app volume backup | 1-24h | 1-2h |
| 15-min DB + hourly volume backup, hot standby | 15-60m | minutes |
Most small ops can hit 1h RPO / 1h RTO with hourly backups to S3 and a documented restore runbook.
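As a rough worked example of the arithmetic above, the sketch below adds up the three restore phases to get an RTO floor and compares it against a target. The function names and the sample timings are illustrative assumptions, not measurements from a real SimpleDeploy host.

```shell
#!/bin/sh
# Hypothetical helpers for back-of-envelope RTO estimates (all values in minutes).

# RTO floor = download backup + provision host + restore.
rto_floor_minutes() {
  echo $(( $1 + $2 + $3 ))
}

# Succeeds when a measured or estimated value fits within the target.
# Usage: meets_target <actual_minutes> <target_minutes>
meets_target() {
  [ "$1" -le "$2" ]
}

# Assumed timings: 10 min download, 20 min provisioning, 15 min restore.
floor="$(rto_floor_minutes 10 20 15)"
if meets_target "$floor" 60; then
  echo "RTO floor of ${floor} min fits a 60 min target"
else
  echo "RTO floor of ${floor} min exceeds a 60 min target"
fi
```

Replacing the assumed timings with numbers from an actual restore drill gives a more honest floor than estimates do.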
Scenario 1: SimpleDeploy crashes
The process died but the host is fine. Apps may keep running (Docker is independent) but the dashboard, proxy, and metrics collection are down.
Diagnose

    sudo systemctl status simpledeploy
    journalctl -u simpledeploy -n 200 --no-pager
    df -h /var/lib/simpledeploy   # disk full?
    free -m                       # OOM?

Common causes:
- Disk full (DB cannot write WAL). Free space, restart.
- Out of memory (kernel OOM killed it). Check `dmesg | grep -i kill`. Add swap or upgrade RAM.
- Corrupt DB after power loss. SQLite is WAL-mode and survives most crashes; if it does not, restore from the latest DB backup.
- Bad config after edit. Run `simpledeploy validate --config /etc/simpledeploy/config.yaml`.
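The disk-full check from the list above can be made less of an eyeball exercise. This is a minimal sketch: the helper names are hypothetical and the 95 percent threshold is an assumption to tune for your host.

```shell
#!/bin/sh
# Triage sketch for the disk-full cause. Helper names and threshold are assumptions.

# Print the used-space percentage (digits only) for the filesystem holding a path.
disk_used_pct() {
  df -P "$1" | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

# Succeed when usage is at or above the threshold (default 95 percent).
# Usage: disk_nearly_full <used_pct> [threshold_pct]
disk_nearly_full() {
  [ "$1" -ge "${2:-95}" ]
}

# Usage on the host (not run automatically here):
#   pct="$(disk_used_pct /var/lib/simpledeploy)"
#   disk_nearly_full "$pct" && echo "disk ${pct}% full: free space, then restart"
```

`df -P` is used because the POSIX output format keeps each filesystem on a single line, which makes the `awk` field lookup predictable.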
Restart

    sudo systemctl restart simpledeploy
    journalctl -u simpledeploy -f

If it dies again immediately, check the last error in the journal. Do not loop on restart; fix the root cause.
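One way to follow the "restart once, then investigate" advice is to check that the service is still active after a short settle period. The sketch below assumes a hypothetical `stays_up` helper; the 30-second settle time in the usage comment is an arbitrary choice.

```shell
#!/bin/sh
# Sketch: confirm a service stays up after a single restart instead of looping.
# stays_up is a hypothetical helper, not part of SimpleDeploy.

# Usage: stays_up <settle_seconds> <check command...>
# Succeeds only if the check passes immediately and again after the settle period.
stays_up() {
  settle="$1"
  shift
  "$@" || return 1
  sleep "$settle"
  "$@"
}

# Usage on the host:
#   sudo systemctl restart simpledeploy
#   stays_up 30 systemctl is-active --quiet simpledeploy \
#     || journalctl -u simpledeploy -n 50 --no-pager
```

If the check fails, the journal tail is printed once and the script stops, leaving the root-cause analysis to you.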
Scenario 2: Whole VPS lost
Hardware failure, accidental termination, region outage. You need to rebuild on a new host.
Step 1: Provision the replacement
Same OS, same arch as before. Restore your usual hardening (firewall, SSH keys, user accounts).
Step 2: Install SimpleDeploy
Same version as the lost host:

    # apt
    sudo apt install simpledeploy=1.2.0

    # Or download binary (sudo is needed to write to /usr/local/bin)
    sudo curl -L https://github.com/vazra/simpledeploy/releases/download/v1.2.0/simpledeploy-linux-amd64 \
      -o /usr/local/bin/simpledeploy && sudo chmod +x /usr/local/bin/simpledeploy

Do not start the service yet.
Step 3: Restore DNS
Update A/AAAA records to point at the new host. Do this early so DNS has time to propagate.
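To see whether the change has propagated, you can compare resolver output with the new host's address. This is a sketch only: `resolves_to` is a hypothetical helper, and the domain and IP are placeholders (203.0.113.10 is a documentation-only address).

```shell
#!/bin/sh
# Sketch: check whether resolver output contains the new host's address.
# resolves_to is a hypothetical helper; domain and IP below are placeholders.

# Reads one address per line on stdin; succeeds if any whole line equals the expected IP.
# Usage: <resolver output> | resolves_to <expected_ip>
resolves_to() {
  grep -Fxq "$1"
}

# Usage (requires dig, typically from the dnsutils or bind-utils package):
#   dig +short A app.example.com | resolves_to 203.0.113.10 && echo "DNS updated"
```

Matching whole lines with `-x` avoids a false positive when the expected address is a prefix of another one, such as 203.0.113.1 versus 203.0.113.10.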
Step 4: Restore the system DB
Download the latest backup from your off-host target (S3, SFTP). Place it at the configured data_dir:

    sudo mkdir -p /var/lib/simpledeploy
    sudo cp simpledeploy-2026-04-15.db /var/lib/simpledeploy/simpledeploy.db
    sudo chown -R simpledeploy:simpledeploy /var/lib/simpledeploy
    sudo chmod 0600 /var/lib/simpledeploy/simpledeploy.db

Step 5: Restore config
Recreate /etc/simpledeploy/config.yaml with the same master_secret as before. This is non-negotiable. Without it, encrypted registry credentials and JWT signing keys cannot be recovered. Keep master_secret in a password manager separate from the host.
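Before starting the service, it can be worth confirming that the file copied in Step 4 is at least recognizably a SQLite database rather than a truncated download or an error page. The sketch below checks only the standard SQLite file header; `looks_like_sqlite` is a hypothetical helper and does not prove the database is intact.

```shell
#!/bin/sh
# Sketch: cheap sanity check on a restored database file.
# looks_like_sqlite is a hypothetical helper. It inspects only the file header.

# Every SQLite 3 database file begins with the string "SQLite format 3".
# Usage: looks_like_sqlite <path>
looks_like_sqlite() {
  [ "$(head -c 15 "$1" 2>/dev/null)" = "SQLite format 3" ]
}

# Usage on the new host (run as a user that can read the file):
#   looks_like_sqlite /var/lib/simpledeploy/simpledeploy.db \
#     || echo "restored file is not a SQLite database; re-download the backup"
```

If the `sqlite3` command-line tool happens to be installed, running `PRAGMA integrity_check;` against the file is a deeper check and prints `ok` for a healthy database.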
Step 6: Start the service

    sudo systemctl start simpledeploy

The reconciler reads the apps_dir and starts pulling images. Apps with no persistent data come up immediately. Stateful apps need their volumes restored before they will be useful.
Step 7: Restore app data volumes
For each stateful app, restore the latest volume/database backup according to the strategy used:
- Postgres backup: see Restoring app backups for the `pg_restore` flow.
- Volume backup: extract the tarball into the app’s named volume directory.

    # Volume restore example
    docker run --rm -v myapp_data:/restore -v $(pwd):/backup alpine \
      tar xzf /backup/myapp-2026-04-15.tar.gz -C /restore

Step 8: Verify
- Dashboard loads
- Each app reachable on its public domain
- Sample data present in each app
- New backup runs cleanly (close the loop)
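The reachability item in the checklist above lends itself to a small script. This is a sketch under assumptions: the helper names are hypothetical, the domains are placeholders, and the 10-second timeout is arbitrary. It checks only that each domain answers over HTTPS without an error status, not that the data is correct.

```shell
#!/bin/sh
# Sketch: probe each app's public domain after a restore.
# Helper names, domains, and the timeout are assumptions.

# Succeeds if the URL answers with a non-error HTTP status.
probe() {
  curl -fsS -o /dev/null --max-time 10 "$1"
}

# Usage: verify_domains <domain...>
# Prints one line per domain and returns non-zero if any probe failed.
verify_domains() {
  failed=0
  for domain in "$@"; do
    if probe "https://${domain}"; then
      echo "OK   ${domain}"
    else
      echo "FAIL ${domain}"
      failed=1
    fi
  done
  return "$failed"
}

# Usage from your workstation:
#   verify_domains app1.example.com app2.example.com
```

Keeping `probe` as a separate function makes it easy to swap in a different check, for example a request to an app-specific health endpoint.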
Scenario 3: Bad deploy bricked an app
A deploy went out, the app does not start or behaves badly. Other apps and SimpleDeploy itself are fine.
Quick rollback via UI
App detail page -> Versions tab -> select the previous deploy -> Rollback. SimpleDeploy redeploys the prior compose.yaml and waits for containers to be healthy.
Quick rollback via CLI

    simpledeploy versions list myapp
    simpledeploy rollback myapp --version <hash>

Manual override
If the rollback itself fails (rare), edit the compose file directly:

    sudo vim /etc/simpledeploy/apps/myapp/compose.yaml
    # Restore the last-known-good content from your git repo or backup
    sudo systemctl reload simpledeploy

The reconciler watches apps_dir and reapplies on change.
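Before editing a compose file by hand, it may help to keep a timestamped copy so the broken version can still be inspected later. The sketch below uses a hypothetical `backup_file` helper and writes the copy outside apps_dir, on the assumption (not confirmed by this page) that stray files inside a watched directory are best avoided. The backup directory in the usage comment is a placeholder.

```shell
#!/bin/sh
# Sketch: keep a timestamped copy of a file before a manual edit.
# backup_file is a hypothetical helper; the destination directory is a placeholder.

# Usage: backup_file <file> <backup_dir>
# Prints the path of the copy on success.
backup_file() {
  src="$1"
  dest_dir="$2"
  stamp="$(date +%Y%m%d-%H%M%S)"
  target="${dest_dir}/$(basename "$src").${stamp}"
  mkdir -p "$dest_dir" && cp -p "$src" "$target" && echo "$target"
}

# Usage from a root shell on the host:
#   backup_file /etc/simpledeploy/apps/myapp/compose.yaml /root/compose-backups
```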
Restore app data
If the bad deploy also corrupted data (a migration gone wrong), follow the app data restore steps from Scenario 2 for just that app.
Drill schedule
Run a full disaster recovery drill once per quarter:
- Spin up a throwaway VPS.
- Restore the latest DB and one app from backups.
- Time the whole process. If it took longer than your RTO, fix the gap.
- Document deviations and update this runbook.
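Timing the drill is easier if the restore steps live in a script that a wrapper can measure. This is a minimal sketch: `within_rto` is a hypothetical helper, and `./restore-drill.sh` in the usage comment stands in for whatever script holds your own restore steps.

```shell
#!/bin/sh
# Sketch: time a drill command and compare it to an RTO budget in seconds.
# within_rto is a hypothetical helper; the drill script name is a placeholder.

# Usage: within_rto <rto_seconds> <command...>
# Succeeds only if the command succeeds and finishes within the budget.
within_rto() {
  budget="$1"
  shift
  start="$(date +%s)"
  "$@"
  status=$?
  elapsed=$(( $(date +%s) - start ))
  echo "elapsed=${elapsed}s budget=${budget}s exit=${status}"
  [ "$status" -eq 0 ] && [ "$elapsed" -le "$budget" ]
}

# Usage, with a 1h RTO:
#   within_rto 3600 ./restore-drill.sh || echo "drill missed the RTO: fix the gap"
```

The printed elapsed time is worth recording in the runbook after each drill so regressions show up over time.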
A backup you have never restored is theoretical.