Dead-Man Switch Independence (P0-7)
What exists today
| Component | Process (PID) | Role |
|---|---|---|
| DeadManSwitch in daemon | Same as qgtm-daemon | 300s heartbeat timeout; trips in-process |
| qgtm_live/watchdog.py | Invoked from daemon task | Flatten on trip |
| qgtm-watchdog.service | Host systemd (.venv) | Polls telemetry API; flatten via /oms/flatten-all |
| GitHub Actions daemon-watchdog.yml | GH runner on droplet | External telemetry poll + Telegram (deduped) |
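The in-daemon row is a re-armable timer. The sketch below illustrates the pattern and is hypothetical: DeadManSwitchSketch, on_trip and the injectable clock are illustrative names, not the daemon's actual DeadManSwitch API. It assumes a 300 s timeout and a single trip.

Because a switch like this runs inside the process it guards, it cannot fire if that process exits or its loop stalls. That is the gap the host and GitHub layers cover.

```python
import time
from typing import Callable, Optional


class DeadManSwitchSketch:
    """Hypothetical heartbeat dead-man switch; not the daemon's real DeadManSwitch."""

    def __init__(self, timeout_s: float = 300.0,
                 on_trip: Optional[Callable[[], None]] = None,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_s = timeout_s
        self.on_trip = on_trip if on_trip is not None else (lambda: None)
        self.clock = clock  # injectable so tests can use a fake clock
        self.last_beat = clock()
        self.tripped = False

    def heartbeat(self) -> None:
        """Called by each healthy iteration of the daemon loop to re-arm the switch."""
        self.last_beat = self.clock()

    def check(self) -> bool:
        """Trip (once) if the last heartbeat is older than timeout_s; return tripped state."""
        if not self.tripped and self.clock() - self.last_beat > self.timeout_s:
            self.tripped = True
            self.on_trip()  # e.g. flatten all positions
        return self.tripped


if __name__ == "__main__":
    fake_now = [0.0]
    dms = DeadManSwitchSketch(timeout_s=300.0, clock=lambda: fake_now[0],
                              on_trip=lambda: print("tripped: flatten"))
    fake_now[0] = 250.0
    dms.heartbeat()        # re-armed at t=250
    fake_now[0] = 500.0
    print(dms.check())     # 250 s since last beat: still armed
    fake_now[0] = 551.0
    print(dms.check())     # 301 s since last beat: trips
```

The fake clock makes the timeout testable without sleeping. A real loop would call check() on a timer alongside the heartbeat.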
P0-7 — standalone watchdog (deployed)
```ini
# /etc/systemd/system/qgtm-watchdog.service (excerpt)
[Service]
ExecStart=/opt/trading/.venv/bin/python -m qgtm_live.watchdog --standalone
```
Installed automatically by Deploy API (self-hosted) after each green deploy. The unit runs on the host, independent of the daemon's Docker container. If the daemon stops heartbeating, the host process keeps polling http://127.0.0.1:8000/api/v1/daemon/telemetry. It then POSTs a flatten request (/oms/flatten-all) when the heartbeat is stale or telemetry is offline.
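The standalone decision can be sketched as a pure classifier plus one poll step. This is a minimal sketch, not the contents of qgtm_live/watchdog.py. The following are all assumptions:

- the names classify and poll_once
- the telemetry field last_heartbeat_ts
- the 300 s staleness threshold
- treating an unreachable endpoint (None) as "offline"

fetch and flatten are injected callables. They stand in for the authenticated telemetry GET and the flatten-all POST.

```python
from typing import Any, Callable, Dict, List, Optional

# Assumed threshold; mirrors the in-daemon 300 s timeout.
STALE_AFTER_S = 300.0

Telemetry = Optional[Dict[str, Any]]


def classify(telemetry: Telemetry, now: float,
             stale_after_s: float = STALE_AFTER_S) -> List[str]:
    """Hypothetical classifier: returns a list of problems, empty if healthy.

    None means the telemetry endpoint could not be reached (treated as offline).
    The last_heartbeat_ts field (epoch seconds) is an assumed name.
    """
    if telemetry is None:
        return ["telemetry_offline"]
    ts = telemetry.get("last_heartbeat_ts")
    if ts is None:
        return ["heartbeat_missing"]
    if now - float(ts) > stale_after_s:
        return ["heartbeat_stale"]
    return []


def poll_once(fetch: Callable[[], Telemetry],
              flatten: Callable[[List[str]], None],
              now: float) -> List[str]:
    """One watchdog tick: fetch telemetry, classify it, flatten on any problem."""
    problems = classify(fetch(), now)
    if problems:
        flatten(problems)
    return problems


if __name__ == "__main__":
    import time
    # Dry run with stubbed I/O: a 400 s old heartbeat is stale, so "flatten" only prints.
    t = time.time()
    poll_once(lambda: {"last_heartbeat_ts": t - 400},
              lambda p: print("would flatten:", p), t)
```

In the service, a tick like poll_once would repeat roughly every 10 s. Keeping the HTTP calls injectable lets the classification be exercised without touching prod.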
Operator enable (manual setup or new droplet)
```bash
cd /opt/trading
sudo install -m 644 infra/systemd/qgtm-watchdog.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable --now qgtm-watchdog
sudo systemctl status qgtm-watchdog
journalctl -u qgtm-watchdog -f
```
The service requires QGTM_API_KEY in /opt/trading/.env. This is the same key the API accepts for the telemetry and flatten endpoints.
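On a fresh droplet, a missing or empty key leaves the watchdog with telemetry and flatten calls the API will reject. A small preflight run before systemctl enable can catch that.

This is a hypothetical helper: parse_env_file and has_api_key are illustrative names. It assumes a simple KEY=VALUE .env format. It only checks that QGTM_API_KEY is present and non-empty, not that the API accepts it.

```python
from pathlib import Path
from typing import Dict


def parse_env_file(path: Path) -> Dict[str, str]:
    """Hypothetical minimal KEY=VALUE parser; skips blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip surrounding whitespace and one layer of matching quotes.
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def has_api_key(path: Path = Path("/opt/trading/.env")) -> bool:
    """True only if the file exists and QGTM_API_KEY has a non-empty value."""
    if not path.is_file():
        return False
    return bool(parse_env_file(path).get("QGTM_API_KEY"))
```

Run it as a user that can read /opt/trading/.env; an unreadable file raises PermissionError. Abort the enable steps if has_api_key() returns False.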
Local test
```bash
python -m pytest tests/test_standalone_watchdog.py -q
# Dry-run classification only; do not flatten prod without intent
python -c "from qgtm_live.watchdog import classify_telemetry_problems; ..."
```
Backstop stack (defense in depth)
- In-daemon DeadManSwitch + watchdog_loop (fastest path)
- qgtm-watchdog.service on the droplet (co-located, ~10s poll)
- GH Actions daemon-watchdog every 15m (off-process alert)
- Deploy deep-health gate (blocks bad deploys)
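The layers differ mainly in how quickly each can notice a dead daemon. For a polling check, worst-case detection is roughly the staleness threshold plus one poll interval, because the heartbeat may go stale just after a poll.

The sketch below works that out. Only the ~10 s and 15 m intervals come from the list above. The 300 s threshold for the host and GitHub layers and the 1 s in-daemon check interval are assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Layer:
    name: str
    stale_after_s: float    # heartbeat age that counts as "dead"
    poll_interval_s: float  # how often the layer checks


def worst_case_detection_s(layer: Layer) -> float:
    # The heartbeat can cross the threshold just after a poll, so detection
    # may take one full extra interval beyond the staleness threshold.
    return layer.stale_after_s + layer.poll_interval_s


LAYERS: List[Layer] = [
    Layer("in-daemon DeadManSwitch", 300.0, 1.0),       # check interval assumed
    Layer("qgtm-watchdog.service", 300.0, 10.0),        # threshold assumed
    Layer("GH Actions daemon-watchdog", 300.0, 900.0),  # threshold assumed
]

if __name__ == "__main__":
    for layer in LAYERS:
        print(f"{layer.name}: <= {worst_case_detection_s(layer):.0f}s")
```

Run as a script, it prints 301 s, 310 s and 1200 s respectively. The in-daemon figure holds only while the daemon process itself is alive, which is why the slower external layers still matter.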