Troubleshooting
Start here when something’s not working. If this page doesn’t cover your case, check the FAQ or open a GitHub Discussion.
Agent won’t connect
Symptom
Host shows Offline or Connecting… indefinitely.
Checklist
- Agent service running?
    systemctl status dockmesh-agent
    journalctl -u dockmesh-agent -n 100
- Can the agent resolve the server DNS?
    # On the agent host
    getent hosts dockmesh.example.com
- Can it reach the server port?
    openssl s_client -connect dockmesh.example.com:8443 -servername dockmesh.example.com
- Is the certificate valid? If you rotated the CA, the agent must re-enrol. Get a fresh enrolment token from Agents → host → Rotate enrolment token, then re-run the install one-liner on the agent host (it overwrites the existing cert):
    curl -fsSL "https://<server>/install/agent.sh?token=<new-token>" | sudo bash
- Clock skew? mTLS is sensitive to clock drift. Both sides should run NTP:
    timedatectl status
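To put a number on clock skew rather than only confirming that NTP is active, the sketch below compares two Unix timestamps. The `server_epoch` helper is an assumption: it supposes the server answers a plain HTTPS request with a standard `Date` header and that GNU `date` is available. If the port insists on a client certificate, read the server's clock another way.

```shell
# Absolute difference between two Unix timestamps, in seconds.
skew_seconds() {
  d=$(( $1 - $2 ))
  echo "${d#-}"   # strip a leading minus sign
}

# Hypothetical helper: read the server's clock from its HTTP Date header.
# Assumes curl, awk and GNU date; $1 is host:port.
server_epoch() {
  hdr=$(curl -sI "https://$1" | tr -d '\r' \
    | awk 'tolower($1) == "date:" { $1 = ""; print }')
  date -d "$hdr" +%s
}
```

On the agent host, `skew_seconds "$(date +%s)" "$(server_epoch dockmesh.example.com:8443)"` would then print the drift in seconds.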
Common root causes
- Firewall rule change blocking outbound 8443
- TLS cert expired on server (uncommon, auto-renews if ACME)
- Agent revoked on the server (Agents page shows a red revoked badge)
- Network split between server and agent subnets
Stack deploy fails
Symptom
Deploy logs show an error and the stack status goes to error.
Check the log output first
The streaming deploy log has the real cause. Common ones:
pull access denied — image is private and the registry credentials aren’t configured. See Images → Registry auth.
port is already allocated — another container is using the host port. Find it: Containers → filter by port. Either stop the existing container or change the port in the new stack.
driver failed programming external connectivity — usually means the host ran out of available ports in the ephemeral range, or iptables is misconfigured. Restart the Docker daemon on that host.
network <name> declared as external, but could not be found — the external network isn’t there. Create it first (Networks → New network) or remove external: true.
no space left on device — the host disk is full. Usually /var/lib/docker — prune images/volumes via dockmesh or clean up host logs.
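If you save the deploy log to a file, the messages above can be triaged with a small pattern match. This is a minimal sketch: the category labels it prints are hypothetical names chosen for illustration, not dockmesh output.

```shell
# Map a deploy log line to one of the causes listed above.
# "port is already allocated" is tested before the generic driver error,
# because Docker sometimes prints both phrases on the same line.
classify_deploy_error() {
  case "$1" in
    *"pull access denied"*) echo "registry-auth" ;;
    *"port is already allocated"*) echo "port-conflict" ;;
    *"driver failed programming external connectivity"*) echo "docker-networking" ;;
    *"declared as external, but could not be found"*) echo "missing-external-network" ;;
    *"no space left on device"*) echo "disk-full" ;;
    *) echo "unknown" ;;
  esac
}
```

For a port conflict, `docker ps --filter publish=<port>` on the host lists the container holding the port. For a full disk, `df -h /var/lib/docker` shows how much space is left.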
SSO login fails
Symptom
Clicking the SSO button sends you to the IdP, you log in, come back, and see “Authentication failed” or get bounced to the login page.
Checklist
- Redirect URI matches exactly? The URI in your IdP must match <your-dockmesh-url>/api/v1/auth/oidc/<slug>/callback character-for-character, where <slug> is the provider’s slug from the Authentication page. http vs https, trailing slash, port, slug: all must match. The exact URI is shown at the top of the provider form in the UI; copy it from there.
- Clock skew? OIDC tokens have short expiry (usually 60s). If server and IdP clocks differ by more than that, tokens are rejected.
- Group claim present? If you use group mappings, the ID token must include the groups claim. Some IdPs require enabling “groups scope” explicitly.
- Logs on the dockmesh server:
    journalctl -u dockmesh | grep -i oidc
  Look for specific errors like invalid token signature, missing claim, discovery failed.
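Because the redirect URI comparison is exact, a quick string check can catch a stray slash or scheme mismatch before you retry the login. The sketch below rebuilds the URI in the documented shape and compares it with whatever is registered at the IdP. The function names are illustrative, and the URI shown in the provider form remains the authoritative value.

```shell
# Build the callback URI in the documented shape from a base URL and a slug.
expected_callback() {
  base=${1%/}   # drop one trailing slash so the join has no double slash
  echo "$base/api/v1/auth/oidc/$2/callback"
}

# Succeeds only when the URI registered at the IdP equals the expected one.
# Arguments: base URL, provider slug, URI as configured in the IdP.
check_redirect_uri() {
  want=$(expected_callback "$1" "$2")
  if [ "$3" = "$want" ]; then
    echo "match"
  else
    echo "mismatch: expected $want, IdP has $3"
    return 1
  fi
}
```

For example, `check_redirect_uri https://dockmesh.example.com keycloak "$uri_from_idp"` returns non-zero when the registered value uses `http` or carries a trailing slash (`keycloak` stands in for your own slug).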
Slow UI
Symptom
Pages take seconds to load.
Diagnose
- Server load?
    top      # check dockmesh CPU/mem
    iostat   # check disk wait
- Database size?
    ls -lh /var/lib/dockmesh/data/dockmesh.db        # Linux default
    ls -lh /usr/local/var/dockmesh/data/dockmesh.db  # macOS default
  If it’s > 1 GB, consider enabling audit log retention (see Audit Log).
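The database size check can also be scripted, for instance from a cron job. The following is a minimal sketch. The 1 GB threshold comes from the text and is interpreted here as 1024^3 bytes, which is an assumption.

```shell
# Print "large" when the file is bigger than the limit (in bytes), else "ok".
db_size_check() {
  size=$(wc -c < "$1" | tr -d ' ')
  if [ "$size" -gt "$2" ]; then
    echo "large ($size bytes)"
  else
    echo "ok ($size bytes)"
  fi
}
```

On a default Linux install this would be called as `db_size_check /var/lib/dockmesh/data/dockmesh.db 1073741824`.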
Common fixes
- Vacuum the SQLite DB if fragmentation is high:
    systemctl stop dockmesh
    sqlite3 /var/lib/dockmesh/data/dockmesh.db "VACUUM;"
    systemctl start dockmesh
- Reduce stats retention in Settings if disk I/O is the bottleneck.
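One rough way to judge whether a VACUUM is worth the downtime is the share of pages sitting on SQLite's freelist, which `PRAGMA freelist_count` and `PRAGMA page_count` report. The sketch below is an approximation: free pages are what VACUUM reclaims, but they are not a full measure of fragmentation, and any cut-off you pick (say 20%) is your own judgment rather than a dockmesh recommendation.

```shell
# Percentage of pages on the freelist (integer, rounded down).
# Arguments: freelist page count, total page count.
free_page_percent() {
  freelist=$1
  pages=$2
  [ "$pages" -gt 0 ] || { echo 0; return; }
  echo $(( freelist * 100 / pages ))
}

# Read both counters from a SQLite file; requires the sqlite3 CLI.
db_free_page_percent() {
  free_page_percent \
    "$(sqlite3 "$1" 'PRAGMA freelist_count;')" \
    "$(sqlite3 "$1" 'PRAGMA page_count;')"
}
```

Running `db_free_page_percent /var/lib/dockmesh/data/dockmesh.db` before stopping the service gives a number to base the decision on.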
Backup fails
Symptom
Backup job shows failed.
- Job log — click the failed run, read the error
- Target still reachable? — test in Backups → Targets → [target] → Test connection
- Disk space on target — SFTP/NAS with a full disk silently fails
- Encryption passphrase known? — restore tests require it; rotating it orphans old backups
Common errors
- dial tcp ... i/o timeout — target host is unreachable (firewall? DNS?)
- permission denied — credentials have read but not write access on target
- pre-backup hook exited 1 — the hook script failed (check the hook command/image)
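When the target is a NAS share mounted on the host, write access and free space can be checked from the shell before re-running the job. This sketch assumes a locally mounted directory. The probe file name is made up for illustration, and SFTP targets would need the equivalent checks over the SFTP connection.

```shell
# Try to create and remove a probe file; fails if the directory is not writable.
target_writable() {
  probe="$1/.write-probe.$$"
  if ( : > "$probe" ) 2>/dev/null; then
    rm -f "$probe"
    return 0
  fi
  return 1
}

# Free space in KiB on the filesystem holding the directory (POSIX df format).
target_free_kib() {
  df -Pk "$1" | awk 'NR == 2 { print $4 }'
}
```

A call such as `target_writable /mnt/backups && target_free_kib /mnt/backups` (with `/mnt/backups` standing in for your mount point) covers both the permission denied and the full disk cases.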
Stack migration fails
Symptom
Migration aborts partway, stack is back on source host.
Diagnose
- Pre-flight — did any check fail? Volume size mismatch is common.
- Network — bandwidth between source and destination; migrations of 100+ GB volumes can take hours on slow links
- Destination disk full mid-transfer — pre-flight checks free space, but if something else fills it up mid-transfer, migration aborts
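To see why large volumes take hours on slow links, a back-of-the-envelope estimate helps. The sketch below assumes decimal units (1 GB = 8000 Mbit) and ignores protocol overhead and compression, so it is only a rough lower bound.

```shell
# Estimated transfer time in seconds.
# Arguments: volume size in GB, link speed in Mbit/s.
transfer_seconds() {
  echo $(( $1 * 8000 / $2 ))
}
```

`transfer_seconds 100 100` prints 8000, so a 100 GB volume over a 100 Mbit/s link needs a little over two hours at best. At 1000 Mbit/s the same volume takes about 800 seconds.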
Automatic rollback should leave you in the starting state. If it doesn’t, clean up manually against the on-disk stack tree (one directory per stack, no host subdir):
    # On the source host — restart the stack from its compose file
    docker compose -f /var/lib/dockmesh/stacks/<stack>/compose.yaml up -d
    # On the destination host — tear down whatever the migration left behind
    docker compose -f /var/lib/dockmesh/stacks/<stack>/compose.yaml down
Alerts not firing
- Rule enabled? — Alerts → Rules → row’s enabled toggle
- Rule muted? — same row, check whether muted_until is in the future (mutes are per-rule, not global)
- Channel working? — Alerts → Channels → row → Send test
- Cooldown still active? — a recent fire on the same rule suppresses re-notify for cooldown_seconds
- Container filter actually matches? — the rule’s container_filter glob is run against container names; double-check that paperless-* etc. matches what docker ps shows on the affected host
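The container filter check can be rehearsed on the host with a shell glob. This sketch uses the shell's own pattern matching, which is an assumption: dockmesh's glob rules may differ in edge cases, so treat the result as a sanity check only.

```shell
# Return success when the name ($2) matches the shell glob pattern ($1).
glob_matches() {
  case "$2" in
    $1) return 0 ;;
    *) return 1 ;;
  esac
}

# Print every name read from stdin that matches the pattern ($1).
filter_names() {
  while IFS= read -r name; do
    if glob_matches "$1" "$name"; then
      echo "$name"
    fi
  done
}
```

On the affected host, `docker ps --format '{{.Names}}' | filter_names 'paperless-*'` lists the containers the rule would be expected to cover. Empty output suggests the filter is the problem.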
Logs aren’t streaming
Symptom
You open Container → Logs and nothing shows up, or the stream stops after a few seconds.
- Click Reconnect — WebSocket may have dropped
- Check agent version on the host (old agents had a streaming bug fixed in 1.0.0-beta.3)
- If behind a corporate proxy, WebSocket might be stripped — contact your network admin
Can’t log in
Symptom
The login page rejects your credentials, or returns:
account temporarily locked — try again in N minutes
Five failed login attempts in a row trigger a 15-minute lockout per user (default — configurable via auth.lockout_max_attempts and auth.lockout_duration_minutes). This usually comes from:
- Browser autofill replaying a stale saved password for the same URL
- Copy-paste from a password manager that got the wrong entry
- An actual forgotten password
- Someone on your network probing with wrong credentials (rare for homelab, more relevant on public-internet deploys)
- Wait 15 minutes — the lockout is time-based, no admin action needed. The login error tells you how long is left.
- If you know the password but the lockout is annoying:
    sudo dockmesh admin unlock --user admin
  Clears the lockout without touching the password.
- If you forgot the password:
    sudo dockmesh admin reset-password --user admin --password 'NewSecure#2026'
  This rewrites the password hash only — it does not clear an active lockout. If the account is also locked, run sudo dockmesh admin unlock --user admin afterwards (or wait the lockout out).
- If the login page rejects you silently (not a lockout error):
  - Delete the saved password for the dockmesh URL in your browser’s password manager, then type the password by hand
  - Try an Incognito/Private window — rules out autofill + cookie issues
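Since a password reset does not clear an active lockout, the two commands can be chained so that neither step is forgotten. This is a small wrapper sketch around the two `dockmesh admin` commands shown above. The `run` helper and the `DRY_RUN` variable are hypothetical additions for previewing the commands and are not part of dockmesh.

```shell
# Execute a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Reset the password, then clear any active lockout for the same user.
# Arguments: user name, new password.
reset_and_unlock() {
  user=$1
  password=$2
  run sudo dockmesh admin reset-password --user "$user" --password "$password" &&
    run sudo dockmesh admin unlock --user "$user"
}
```

Note that the dry run prints the new password to the terminal, so use a placeholder value when previewing.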
Prevention
- Set a strong, memorable password you type rather than auto-fill
- Tune the threshold up (defaults: 5 attempts / 15 min lockout) under Authentication → Password policy if you find it too strict — the settings are auth.lockout_max_attempts and auth.lockout_duration_minutes
Getting help
If none of the above fixes your issue:
- GitHub Discussions — searchable, other users can help, answers benefit everyone
- GitHub Issues — for bugs (include dockmesh version, OS, minimal reproduction)
- Security issues only: security@dockmesh.dev
Always include:
- dockmesh version (dockmesh --version)
- OS + Docker version
- Relevant log snippets (journalctl or in-UI logs)
- Steps to reproduce
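The version details in this list can be gathered in one go. The sketch below prints one line per tool and reports a missing tool instead of failing. The output layout is illustrative, and log snippets still need to be added separately (for example from `journalctl -u dockmesh`).

```shell
# Print "<label>: <first line of the command's output>", or note that the
# command is not installed. Arguments: label, then the command and its args.
report_version() {
  label=$1
  shift
  if command -v "$1" >/dev/null 2>&1; then
    echo "$label: $("$@" 2>&1 | head -n 1)"
  else
    echo "$label: $1 not found"
  fi
}

# Collect the version details requested above.
collect_diagnostics() {
  report_version "dockmesh" dockmesh --version
  report_version "docker" docker --version
  echo "os: $(uname -sr)"
}
```

Paste the output of `collect_diagnostics` into the issue or discussion together with the steps to reproduce.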
See also
- FAQ — common conceptual questions
- Hardening — preventive measures
- Upgrade Guide — if the issue started after an upgrade