
Hardening Guide

This is a checklist for running dockmesh in production or any multi-user environment. Work through it once at setup, then revisit quarterly.

The dockmesh binary should be owned by root and writable only by root:

chown root:root /usr/local/bin/dockmesh
chmod 755 /usr/local/bin/dockmesh

The data directory (DOCKMESH_DB_PATH parent) should be root-owned with 700 permissions:

chown -R root:root /opt/dockmesh
chmod 700 /opt/dockmesh
chmod 600 /opt/dockmesh/data/*.db

The SQLite database contains the CA private key, session tokens, and encrypted secrets. Do not let it be world-readable.
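
To confirm that the ownership and modes above actually took effect, a small check along these lines may help. This is only a sketch: it assumes GNU stat (as found on most Linux hosts), and check_perms is a hypothetical helper, not part of dockmesh.

```shell
# Warn when a path's owner UID or octal mode differs from what is expected.
# Usage: check_perms <path> <expected-uid> <expected-octal-mode>
check_perms() {
  local path=$1 want_uid=$2 want_mode=$3
  local uid mode
  uid=$(stat -c '%u' "$path") || return 2
  mode=$(stat -c '%a' "$path") || return 2
  if [ "$uid" != "$want_uid" ] || [ "$mode" != "$want_mode" ]; then
    echo "WARN: $path has uid=$uid mode=$mode, expected uid=$want_uid mode=$want_mode"
    return 1
  fi
  echo "OK: $path"
}

# Example calls for the paths used in this guide (root is UID 0):
#   check_perms /usr/local/bin/dockmesh 0 755
#   check_perms /opt/dockmesh 0 700
#   check_perms /opt/dockmesh/data/dockmesh.db 0 600
```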

Add these directives to /etc/systemd/system/dockmesh.service:

[Service]
NoNewPrivileges=true
ProtectSystem=strict
ProtectHome=true
PrivateTmp=true
PrivateDevices=true
ProtectKernelTunables=true
ProtectKernelModules=true
ProtectControlGroups=true
ReadWritePaths=/opt/dockmesh /var/run/docker.sock
RestrictNamespaces=true
RestrictRealtime=true
LockPersonality=true
MemoryDenyWriteExecute=true
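
After editing the unit, it can be useful to check that none of the directives above were left out. The sketch below greps a unit file for each directive name; missing_hardening is a hypothetical helper, and it checks only that a directive is present, not its value.

```shell
# Print each hardening directive from this guide that is absent from a unit file.
# Usage: missing_hardening <unit-file>
missing_hardening() {
  local unit=$1 directive
  for directive in \
    NoNewPrivileges ProtectSystem ProtectHome PrivateTmp PrivateDevices \
    ProtectKernelTunables ProtectKernelModules ProtectControlGroups \
    RestrictNamespaces RestrictRealtime LockPersonality MemoryDenyWriteExecute
  do
    # Match "Directive=" at the start of a line, allowing leading whitespace.
    grep -Eq "^[[:space:]]*${directive}=" "$unit" || echo "$directive"
  done
}

# Example:
#   missing_hardening /etc/systemd/system/dockmesh.service
```

Changes to the unit file take effect after systemctl daemon-reload and a restart of the service. systemd-analyze security dockmesh.service then reports an exposure score for the running configuration.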

Replace the default admin/admin credentials at first login by setting a strong password. Better: enable SSO and delete the local admin account after verifying that SSO works.

Enforce 2FA for all admin accounts (Settings → Authentication → 2FA policy → Required for admins).

Enable the embedded Caddy reverse proxy (see Reverse Proxy) and bind dockmesh to 127.0.0.1:8080. Caddy terminates TLS on 443.

If you use an external load balancer (Cloudflare, AWS ALB, nginx), set DOCKMESH_HTTP_ADDR=127.0.0.1:8080 so dockmesh never listens on a public IP directly.
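
One way to verify the loopback binding is to inspect the listening sockets. The sketch below filters the output of ss -ltnH (iproute2) for a given port and prints any listener that is not bound to loopback. public_listeners is a hypothetical helper, and port 8080 is the value used in this guide.

```shell
# Read `ss -ltnH` output on stdin and print local addresses that listen on
# the given port without being bound to loopback. Exits 1 if any are found.
public_listeners() {
  awk -v port="$1" '
    {
      addr = $4                          # local address:port column
      n = split(addr, parts, ":")
      if (parts[n] != port) next         # listener on a different port
      host = substr(addr, 1, length(addr) - length(parts[n]) - 1)
      if (host != "127.0.0.1" && host != "[::1]") { print addr; bad = 1 }
    }
    END { exit bad }
  '
}

# Example, on the server host:
#   ss -ltnH | public_listeners 8080
```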

The agent protocol is mTLS by default. Do not disable mTLS. Rotate the CA if you suspect compromise:

# Backup first
cp /opt/dockmesh/data/dockmesh.db /opt/dockmesh/data/dockmesh.db.bak
# Rotate
dockmesh ca rotate --reissue-all-agents

All agents re-enroll on next connect. Old certs are revoked.
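
The backup step can be made a little safer by timestamping the copy, keeping it owner-only, and confirming it matches the source before rotating. The following is a sketch under those assumptions; backup_file is a hypothetical helper. Copying a SQLite file while the service is writing to it can produce an inconsistent copy, so stopping dockmesh for the duration of the copy is the conservative choice.

```shell
# Copy a file to a timestamped sibling with owner-only permissions and
# confirm the copy is byte-identical. Prints the backup path on success.
# Usage: backup_file <path>
backup_file() {
  local src=$1 dest
  dest="$src.$(date +%Y%m%d-%H%M%S).bak"
  ( umask 077 && cp "$src" "$dest" ) || return 1
  cmp -s "$src" "$dest" || { echo "backup mismatch: $dest" >&2; return 1; }
  echo "$dest"
}

# Example, using the path and command from this guide:
#   backup_file /opt/dockmesh/data/dockmesh.db && dockmesh ca rotate --reissue-all-agents
```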

On the server host:

Port   Direction   Source                                             Purpose
443    in          Public (if UI is internet-facing) or VPN subnet    HTTPS UI
8443   in          Agent IPs only                                     Agent mTLS
80     in          Public                                             ACME challenge (only if using auto-TLS)
22     in          Admin IPs only                                     SSH

Block everything else. Example ufw:

ufw default deny incoming
ufw allow from <VPN-subnet> to any port 443
ufw allow from <agent-subnet> to any port 8443
ufw allow from <admin-ip> to any port 22
# Only if using auto-TLS (ACME HTTP challenge on port 80):
# ufw allow 80/tcp
ufw enable

Agents make outbound-only connections. No inbound ports needed. If you have an inbound firewall rule for them, remove it.

If stacks are Git-backed, don’t commit plaintext passwords. Options:

  • SOPS with age encryption — files are encrypted in Git, decrypted at deploy time
  • Docker native secrets with an external store (Vault, AWS Secrets Manager)
  • dockmesh env var secrets — encrypted at rest in the dockmesh DB, never in Git
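
Whichever option is used, a rough check before committing can catch credentials that slipped in as plaintext. The sketch below flags lines where a key that looks like a credential is followed by a literal value. The key names and the pattern are assumptions, and scan_plaintext_secrets is a hypothetical helper. It skips ${VAR} references and SOPS-encrypted values (which start with ENC[), and it will miss secrets stored under other key names.

```shell
# Print file:line for lines that look like plaintext credentials, for example
# PASSWORD=hunter2 or api_key: "abc". Returns 1 when anything is found.
# Usage: scan_plaintext_secrets <file>...
scan_plaintext_secrets() {
  local hits
  hits=$(grep -EinH '(password|passwd|secret|token|api_?key)[a-z_]*[[:space:]]*[:=][[:space:]]*"?[^[:space:]$"]' "$@" \
    | grep -v 'ENC\[' || true)
  if [ -n "$hits" ]; then
    # Report only file and line number so the secret itself is not echoed.
    echo "$hits" | cut -d: -f1,2
    return 1
  fi
}

# Example, from the root of a stack repository:
#   scan_plaintext_secrets compose.yaml .env
```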

Rotate these on a schedule:

Secret              Rotation
Agent mTLS certs    Auto, every 30 days
API tokens          Every 90 days (force via UI)
Session tokens      Expire after 15 min (JWT)
SSO client secret   Every 12 months
SMTP password       On staff change

Use age-encrypted backups (see the Backup docs). Store the passphrase in a password manager that is separate from the dockmesh instance.

A backup you haven’t restored isn’t a backup. Schedule a quarterly test restore to a scratch host.

At least one backup target should be on a different host (and ideally with a different provider). Local backup + S3/B2 is the common pattern.
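
For the quarterly test restore, one simple way to confirm the restored data matches what was backed up is a checksum manifest. The sketch below assumes GNU coreutils and findutils; write_manifest and verify_restore are hypothetical helpers and do not describe how dockmesh's own backup feature works. The manifest should be stored outside the directory it describes.

```shell
# At backup time: record a SHA-256 checksum for every file under a directory.
# Usage: write_manifest <dir> <manifest-file>
write_manifest() {
  ( cd "$1" && find . -type f -print0 | sort -z | xargs -0 -r sha256sum ) > "$2"
}

# After a test restore: check the restored tree against that manifest.
# Fails if a listed file is missing or its contents differ.
# Usage: verify_restore <restored-dir> <manifest-file>
verify_restore() {
  local manifest
  manifest="$(cd "$(dirname "$2")" && pwd)/$(basename "$2")"
  ( cd "$1" && sha256sum --quiet -c "$manifest" )
}

# Example:
#   write_manifest /opt/dockmesh/data /root/dockmesh-data.sha256
#   verify_restore /mnt/scratch/dockmesh/data /root/dockmesh-data.sha256
```

Files that exist only in the restored tree are not reported by this check.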

Enrollment tokens are powerful. Treat them like root SSH keys:

  • One-time use (dockmesh enforces this)
  • Transmit over encrypted channel (SSH, 1Password shared vault, Signal)
  • Never commit to Git or pipeline config
  • Rotate the server’s enrollment-signing key annually

dockmesh doesn’t replace Docker security best practices:

  • Run containers as non-root where possible (USER in Dockerfile)
  • Drop capabilities (cap_drop: [ALL], add only what’s needed)
  • Use read-only root filesystem (read_only: true + explicit write volumes)
  • Set resource limits (cpus, mem_limit) on every container
  • Prevent privilege escalation with no-new-privileges (security_opt: [no-new-privileges:true] in compose)
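
Put together, the settings in the list above might look like the compose file written by the sketch below. The service name, image, UID/GID, added capability, and limits are placeholders to adapt, and write_hardened_compose is a hypothetical helper used only to keep the example self-contained.

```shell
# Write an example compose file that applies the settings from the list above.
# Usage: write_hardened_compose <output-file>
write_hardened_compose() {
  cat > "$1" <<'EOF'
services:
  app:
    image: example/app:1.0        # placeholder image
    user: "1000:1000"             # run as a non-root UID:GID
    cap_drop: [ALL]
    cap_add: [NET_BIND_SERVICE]   # add back only what the app needs
    read_only: true               # read-only root filesystem
    tmpfs:
      - /tmp                      # explicit writable path
    cpus: "0.50"
    mem_limit: 256m
    security_opt:
      - no-new-privileges:true
EOF
}

# Example:
#   write_hardened_compose compose.hardened.yaml
#   docker compose -f compose.hardened.yaml config   # validate the file
```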

The image scanner in dockmesh catches known CVEs but not runtime misconfigurations. Use a separate runtime scanner (Falco, Tracee) for those.

  • Enable the Audit Log webhook to ship events to your SIEM
  • Alert on: failed SSO attempts, mTLS handshake failures, audit chain breaks
  • Review the audit log weekly for unexpected admin actions

Review quarterly:

  • Who has Admin? Why?
  • Which API tokens exist? Who owns each? Still needed?
  • Expired or unused role assignments
  • Hosts that haven’t connected in 30 days (stale agents)
  • Stacks running on EOL’d base images