Dead-Man Switch Independence (P0-7)

What exists today

| Component | PID | Role |
|---|---|---|
| DeadManSwitch in daemon | Same as qgtm-daemon | 300s heartbeat timeout; trips in-process |
| qgtm_live/watchdog.py | Invoked from daemon task | Flatten on trip |
| qgtm-watchdog.service | Host systemd (.venv) | Polls telemetry API; flatten via /oms/flatten-all |
| GitHub Actions daemon-watchdog.yml | GH runner on droplet | External telemetry poll + Telegram (deduped) |

P0-7 — standalone watchdog (deployed)

# /etc/systemd/system/qgtm-watchdog.service
ExecStart=/opt/trading/.venv/bin/python -m qgtm_live.watchdog --standalone

Installed automatically by Deploy API (self-hosted) after each green deploy. The service runs on the host and is independent of the daemon Docker container: if the daemon stops heartbeating, the host process still polls http://127.0.0.1:8000/api/v1/daemon/telemetry and POSTs a flatten request when the heartbeat is stale or telemetry is offline.
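The poll-and-flatten behaviour described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the contents of qgtm_live/watchdog.py: the `last_heartbeat_ts` field, the `X-API-Key` header, the full flatten URL, and the reuse of the 300s threshold are all hypothetical, and only the telemetry URL is taken from the text.

```python
import json
import time
import urllib.request

TELEMETRY_URL = "http://127.0.0.1:8000/api/v1/daemon/telemetry"  # from the text above
FLATTEN_URL = "http://127.0.0.1:8000/api/v1/oms/flatten-all"  # assumed full path
STALE_AFTER_S = 300.0  # assumed to match the in-daemon heartbeat timeout
POLL_INTERVAL_S = 10.0  # roughly the poll interval listed in the backstop stack


def find_problems(telemetry, now, stale_after_s=STALE_AFTER_S):
    """Return reasons to flatten; an empty list means the daemon looks healthy."""
    if telemetry is None:
        return ["telemetry offline"]
    last = telemetry.get("last_heartbeat_ts")  # hypothetical field name
    if last is None:
        return ["no heartbeat recorded"]
    age = now - float(last)
    if age > stale_after_s:
        return [f"heartbeat stale ({age:.0f}s)"]
    return []


def fetch_telemetry(api_key, timeout_s=5.0):
    """GET telemetry; any network or JSON failure is reported as None."""
    req = urllib.request.Request(TELEMETRY_URL, headers={"X-API-Key": api_key})
    try:
        with urllib.request.urlopen(req, timeout=timeout_s) as resp:
            return json.load(resp)
    except (OSError, ValueError):
        return None


def post_flatten(api_key, timeout_s=5.0):
    """POST the flatten request and return the HTTP status code."""
    req = urllib.request.Request(
        FLATTEN_URL,
        data=b"{}",
        method="POST",
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return resp.status


def run(api_key):
    """Poll forever; flatten once per incident and re-arm after recovery."""
    tripped = False
    while True:
        problems = find_problems(fetch_telemetry(api_key), time.time())
        if problems and not tripped:
            try:
                post_flatten(api_key)
                tripped = True
            except OSError:
                pass  # flatten failed; try again on the next poll
        elif not problems:
            tripped = False
        time.sleep(POLL_INTERVAL_S)
```

Keeping the classification in a pure function (`find_problems`) means the trip logic can be tested without touching the network or sending a real flatten.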

Operator enable (manual / new droplet)

cd /opt/trading
sudo install -m 644 infra/systemd/qgtm-watchdog.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable --now qgtm-watchdog
sudo systemctl status qgtm-watchdog
journalctl -u qgtm-watchdog -f

Requires QGTM_API_KEY in /opt/trading/.env (same key the API accepts for telemetry + flatten).
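A quick pre-flight check for this requirement might look like the sketch below. It assumes the .env file uses plain KEY=value lines (optionally prefixed with `export` and optionally quoted); `read_env_value` and `has_api_key` are hypothetical helpers, not part of qgtm_live.

```python
from pathlib import Path


def read_env_value(text, key):
    """Return the value for KEY=value in dotenv-style text, or None if absent."""
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("export "):
            line = line[len("export "):].lstrip()
        name, sep, value = line.partition("=")
        if sep and name.strip() == key:
            return value.strip().strip("'\"")  # drop surrounding quotes
    return None


def has_api_key(env_path="/opt/trading/.env"):
    """True if the file exists and defines a non-empty QGTM_API_KEY."""
    path = Path(env_path)
    if not path.is_file():
        return False
    return bool(read_env_value(path.read_text(), "QGTM_API_KEY"))


if __name__ == "__main__":
    print("QGTM_API_KEY present:", has_api_key())
```

The check reports only whether the key is present and never prints its value, so the output is safe to paste into a ticket or chat.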

Local test

python -m pytest tests/test_standalone_watchdog.py -q
# Dry-run classification only — do not flatten prod without intent
python -c "from qgtm_live.watchdog import classify_telemetry_problems; ..."

Backstop stack (defense in depth)

  1. In-daemon DeadManSwitch + watchdog_loop (fastest path)
  2. qgtm-watchdog.service on droplet (co-located, ~10s poll)
  3. GH Actions daemon-watchdog every 15m (off-process alert)
  4. Deploy deep-health gate (blocks bad deploys)
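To compare how quickly the polling layers can react, the arithmetic below estimates a worst-case detection delay per layer. It assumes both external pollers use the same 300s staleness threshold as the in-daemon switch (the text states that value only for the in-daemon path) and ignores request time and any GitHub Actions scheduling delay.

```python
def worst_case_detection_s(stale_after_s, poll_interval_s):
    """Upper bound on the time from the last heartbeat to detection.

    The heartbeat must first age past the threshold; in the worst case a poll
    ran just before that moment, so one more full interval passes.
    """
    return stale_after_s + poll_interval_s


# Threshold of 300s is assumed for both layers; intervals come from the list above.
layers = {
    "qgtm-watchdog.service (~10s poll)": worst_case_detection_s(300, 10),
    "GH Actions daemon-watchdog (15m)": worst_case_detection_s(300, 15 * 60),
}

for name, seconds in layers.items():
    print(f"{name}: up to {seconds}s ({seconds / 60:.1f} min)")
```

Under these assumptions the co-located service bounds detection at about five minutes, while the scheduled workflow can lag by about twenty, which fits its role as an alerting layer.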