# Upgrade Guide
dockmesh follows semantic versioning. Minor and patch releases are drop-in upgrades. Major versions may require manual steps — always check the changelog before a major upgrade.
## The basic flow

- Read the changelog for your target version
- Back up the data directory
- Stop the dockmesh server
- Replace the binary
- Start the dockmesh server
- Verify agents reconnect
- Verify all stacks are healthy
Typical total downtime: 30-60 seconds for the UI. Your containers keep running the whole time — the dockmesh server being down doesn’t affect Docker.
## Step 1 — Backup

```sh
# Stop-the-world snapshot of the database + CA
systemctl stop dockmesh
tar czf /opt/dockmesh-backup-$(date +%Y%m%d).tar.gz /opt/dockmesh/data/
```

Copy the tarball somewhere off the host (S3, another server, your workstation).
If the upgrade fails, restore by extracting the tarball and using the previous binary.
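That restore depends on the tarball being intact. A minimal readability check, using the path pattern from the backup step above, is cheap insurance:

```sh
# List the archive's contents without extracting; a corrupt or truncated
# tarball fails here rather than during an emergency restore
backup=$(ls -t /opt/dockmesh-backup-*.tar.gz | head -1)
if tar tzf "$backup" > /dev/null; then
  echo "backup OK: $(tar tzf "$backup" | wc -l) entries in $backup"
else
  echo "backup unreadable; take a new one before upgrading" >&2
fi
```

`tar tzf` reads the whole archive, so it also catches a snapshot that was truncated by a full disk.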
## Step 2 — Download new binary

```sh
# Check current version
/usr/local/bin/dockmesh --version

# Download new binary to a temp path
curl -fsSL https://get.dockmesh.dev/v1.2.0/linux-amd64 -o /tmp/dockmesh-new

# Verify checksum
sha256sum /tmp/dockmesh-new
# Compare to the published checksum in the release notes

# Make executable
chmod +x /tmp/dockmesh-new
```
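To make the comparison mechanical instead of eyeball-based, a small helper can do it; the expected value comes from the release notes, so the call at the bottom is left commented with a placeholder:

```sh
verify_checksum() {
  # $1 = file to check, $2 = expected sha256 (from the release notes)
  actual=$(sha256sum "$1" | awk '{print $1}')
  if [ "$actual" = "$2" ]; then
    echo "checksum OK"
  else
    echo "checksum MISMATCH: refusing to install" >&2
    return 1
  fi
}

# verify_checksum /tmp/dockmesh-new "<sha256 from the release notes>"
```

A nonzero return stops a `&&`-chained install cold, which is exactly what you want on a mismatch.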
## Step 3 — Swap

```sh
# Move old binary aside
mv /usr/local/bin/dockmesh /usr/local/bin/dockmesh.old

# Install new
mv /tmp/dockmesh-new /usr/local/bin/dockmesh

# Start
systemctl start dockmesh

# Verify logs
journalctl -u dockmesh -f
```

The server should log:

```
dockmesh v1.2.0 starting
Database migration from 15 -> 17 complete
HTTP listening on :8080
Agent listener on :8443 (mTLS)
```

If migrations fail, the server exits. Roll back:

```sh
systemctl stop dockmesh
mv /usr/local/bin/dockmesh.old /usr/local/bin/dockmesh
# Restore DB if migrations partially ran
tar xzf /opt/dockmesh-backup-*.tar.gz -C /
systemctl start dockmesh
```
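After a start (or a rollback), the UI takes a few seconds to come back. A small polling helper beats staring at logs; port :8080 matches the startup log above, but the path to probe is an assumption, so point it at any cheap endpoint such as the login page:

```sh
# Poll a URL until it answers or we give up (default: 30 one-second tries)
wait_for_http() {
  url=$1
  tries=${2:-30}
  i=0
  while [ "$i" -lt "$tries" ]; do
    if curl -fsS "$url" > /dev/null 2>&1; then
      echo "up after ${i}s"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "still down after ${tries}s" >&2
  return 1
}

# wait_for_http http://localhost:8080/
```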
## Step 4 — Agent auto-upgrade

When an agent reconnects after the server starts, it reports its version. If the server’s new version ships an updated agent protocol, the server instructs the agent to download and hot-swap its binary.
Agent hot-swap:
- Server sends an `upgrade` message with the new agent URL + checksum
- Agent downloads, verifies, and writes the new binary atomically
- Agent re-executes itself with the new binary (open connections survive via the systemd socket fd)
- Server confirms new version in the Hosts page
Typical agent upgrade: < 10 seconds, no stack restart.
Agents on hosts that can’t reach the download URL (air-gapped) show a “pending upgrade” banner. You upgrade them manually:
```sh
# On the air-gapped host
curl -fsSL https://get.dockmesh.dev/v1.2.0/agent-linux-amd64 -o /usr/local/bin/dockmesh-agent.new
chmod +x /usr/local/bin/dockmesh-agent.new
mv /usr/local/bin/dockmesh-agent.new /usr/local/bin/dockmesh-agent
systemctl restart dockmesh-agent
```
## Step 5 — Verify

Quick checks:
- Dashboard loads, shows expected host count
- All hosts are Online (Hosts page, status column)
- No unexpected alerts fired during the upgrade window
- Spot-check one stack — open it, deploy something minor, verify it works
- Check the audit log for migration events (a new version often adds new audit categories)
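Several of these checks can be scripted. The sketch below reuses the unit name, binary path, and journalctl usage from earlier in this guide; the grep terms are just a heuristic for "anything alarming":

```sh
post_upgrade_check() {
  # Confirm the installed binary version and the service state
  /usr/local/bin/dockmesh --version
  systemctl is-active dockmesh

  # Surface anything alarming from the upgrade window
  journalctl -u dockmesh --since "-15 min" | grep -iE "error|fail|migrat" || true
}

# post_upgrade_check
```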
## Rolling upgrades on the agent side

If you have many agents and want to upgrade them in waves instead of all at once, configure the fleet-wide policy on the Agents page (top panel):
- Manual (default) — agents stay on their current version until an admin clicks Upgrade on each row. Safest; this is what existing installs get on first boot after this feature shipped.
- Auto — every drifted agent is sent a `FrameReqAgentUpgrade` on the next evaluator tick. Good for homelab / small fleets where the blast radius is small.
- Staged — the controller upgrades a configurable percentage of drifted agents per tick (default 10%), so you can watch the first wave before the rest follow.
The evaluator runs every 60 seconds. When you change the mode or bump the server version, hit Run now on the policy panel to trigger an immediate evaluation instead of waiting for the next tick.
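To get a feel for how long Staged mode takes to drain a fleet, here is a small simulation. It assumes the percentage applies to the agents still drifted at each tick, rounded up; the rounding detail is an assumption, not confirmed behavior:

```sh
drifted=100   # drifted agents at the start
pct=10        # Staged percentage per tick
tick=0
while [ "$drifted" -gt 0 ]; do
  wave=$(( (drifted * pct + 99) / 100 ))   # ceil of pct% of what's left
  drifted=$((drifted - wave))
  tick=$((tick + 1))
done
echo "fleet drained after $tick ticks ($tick minutes at a 60s tick)"
```

The tail is long: once only a handful of agents remain, each tick upgrades just one, so small fleets may prefer Auto.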
Every drifted agent also gets a dedicated Upgrade button on its row in the Agents page — useful regardless of mode when a single host fell behind after a network blip.
## What “drifted” means

The evaluator compares each connected agent’s `Hello.Version` (reported on every WS connect) to the `pkg/version.Version` compiled into the server binary. String equality — no semver comparison. That keeps the logic simple: after a server upgrade, every agent is “drifted” until it reports the new version in its next hello, and normal evaluation takes over.
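Concretely, the drift check is nothing more exotic than a string comparison, sketched here with illustrative version values:

```sh
# Plain string inequality: any difference counts as drift, including a
# server that is *older* than the agent; there is no semver ordering
server="v1.2.0"   # pkg/version.Version baked into the server
agent="v1.1.9"    # Hello.Version reported on WS connect
if [ "$agent" != "$server" ]; then
  echo "drifted"
else
  echo "in sync"
fi
```

With the values above this prints `drifted`; only an exact match prints `in sync`.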
## Audit trail

Every dispatched upgrade is logged with action `stack.deploy` and an `action=agent-upgrade` detail field, including the target agent id, the server’s version, and which binary was requested. Policy changes log as `agent.upgrade_policy`. Manual runs log as `agent.upgrade_run`.
## Downgrading

Downgrades are not supported across database migrations. The server refuses to start if the DB schema is newer than the binary expects.
If you must downgrade:
- Stop the server
- Restore the pre-upgrade backup
- Install the older binary
- Start
This loses any changes made after the upgrade.
## Major-version upgrades

For v1.x → v2.0-style jumps:
- Read the migration notes section of the release post
- Test on a staging dockmesh first (point a throwaway instance at a subset of hosts)
- Plan for 15-30 min downtime
- Have the rollback procedure pre-tested
## See also

- Changelog — what changed in each version
- Disaster Recovery — if the upgrade catastrophically fails
- Hardening — verify hardening steps still apply after upgrade