Incident Response Runbook

Severity Levels

Level | Description                      | Response Time     | Examples
SEV-1 | Active money loss or data breach | Immediate         | Kill switch triggered, unauthorized access
SEV-2 | Degraded trading capability      | < 30 min          | Broker API down, data feed stale
SEV-3 | Non-critical service issue       | < 4 hours         | Web UI down, signal delay, bot offline
SEV-4 | Minor issue, no impact           | Next business day | Cosmetic bug, slow dashboard
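
For alerting or paging tooling, the response times in the table can be encoded as data. The following is a minimal sketch, not part of the existing tooling: the names `Severity`, `RESPONSE_WINDOW`, and `respond_by` are hypothetical, and mapping "Next business day" to `None` is an assumption made because that window has no fixed duration.

```python
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class Severity(Enum):
    SEV1 = 1
    SEV2 = 2
    SEV3 = 3
    SEV4 = 4


# Response windows copied from the severity table.
# SEV-1 is "Immediate" (zero delay). SEV-4 is "Next business day",
# which has no fixed duration, so it is represented as None (assumption).
RESPONSE_WINDOW = {
    Severity.SEV1: timedelta(0),
    Severity.SEV2: timedelta(minutes=30),
    Severity.SEV3: timedelta(hours=4),
    Severity.SEV4: None,
}


def respond_by(severity: Severity, detected_at: datetime) -> Optional[datetime]:
    """Return the latest acceptable response time, or None for SEV-4."""
    window = RESPONSE_WINDOW[severity]
    # Compare against None explicitly: timedelta(0) is falsy but valid.
    return None if window is None else detected_at + window
```

A caller would pass the detection timestamp, for example `respond_by(Severity.SEV2, detected_at)`, and page whoever is on call if the returned deadline passes without acknowledgement.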

SEV-1: Active Money Loss

  1. Trigger kill switch — see kill_switch.md
  2. Alert all team members
  3. Verify positions are flat
  4. Investigate root cause
  5. Do NOT re-enable trading until the root cause is identified and fixed
  6. Write postmortem within 24 hours
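
Steps 3 and 5 above can be expressed as simple checks. This is a sketch under stated assumptions: positions are assumed to arrive as a symbol-to-quantity mapping from whatever broker client is in use, and `positions_are_flat` and `may_reenable_trading` are hypothetical helper names. Requiring a flat book before re-enabling is also an assumption; the runbook itself only requires that the root cause be identified and fixed.

```python
from typing import Mapping


def positions_are_flat(positions: Mapping[str, float], tolerance: float = 0.0) -> bool:
    """Step 3: True when every open quantity is within `tolerance` of zero."""
    return all(abs(qty) <= tolerance for qty in positions.values())


def may_reenable_trading(
    positions: Mapping[str, float],
    root_cause_identified: bool,
    fix_deployed: bool,
) -> bool:
    """Step 5: trading stays off until the root cause is identified and fixed.

    The flat-book condition is an added safety assumption, not a runbook rule.
    """
    return positions_are_flat(positions) and root_cause_identified and fix_deployed
```

Keeping the gate as a pure function makes it easy to log the inputs alongside the decision in the postmortem.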

SEV-2: Degraded Trading

  1. Check broker status: https://status.alpaca.markets
  2. Check data feeds: EIA, FRED, Alpaca websocket
  3. If broker is down → see broker_outage.md
  4. If data feed is stale → switch to backup provider
  5. If strategies are producing anomalous signals → pause signal generation
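
The branching in steps 3 to 5 can be sketched as a small triage function. The five-minute staleness threshold and the names `MAX_FEED_AGE`, `is_stale`, and `sev2_actions` are illustrative assumptions; the runbook does not define when a feed counts as stale.

```python
from datetime import datetime, timedelta
from typing import List

# Assumed threshold for illustration; the runbook does not specify one.
MAX_FEED_AGE = timedelta(minutes=5)


def is_stale(last_update: datetime, now: datetime, max_age: timedelta = MAX_FEED_AGE) -> bool:
    """True when the feed's last update is older than `max_age`."""
    return now - last_update > max_age


def sev2_actions(broker_up: bool, feed_stale: bool, signals_anomalous: bool) -> List[str]:
    """Map the SEV-2 observations to the runbook actions, in runbook order."""
    actions = []
    if not broker_up:
        actions.append("follow broker_outage.md")
    if feed_stale:
        actions.append("switch to backup data provider")
    if signals_anomalous:
        actions.append("pause signal generation")
    return actions
```

The conditions are independent, so more than one action can apply to the same incident.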

Communication Template

[QGTM INCIDENT — SEV-{N}]
What: {brief description}
Impact: {who/what is affected}
Status: {investigating | mitigating | resolved}
Next update: {time}
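
To keep incident messages consistent, the template above can be filled programmatically. This is a minimal sketch: `render_incident` is a hypothetical helper, and rejecting unknown severities and statuses is a design assumption rather than a documented requirement.

```python
VALID_STATUSES = ("investigating", "mitigating", "resolved")

# \u2014 is the dash character used in the template's first line.
TEMPLATE = (
    "[QGTM INCIDENT \u2014 SEV-{n}]\n"
    "What: {what}\n"
    "Impact: {impact}\n"
    "Status: {status}\n"
    "Next update: {next_update}"
)


def render_incident(n: int, what: str, impact: str, status: str, next_update: str) -> str:
    """Fill the communication template, validating severity and status."""
    if n not in (1, 2, 3, 4):
        raise ValueError(f"unknown severity: SEV-{n}")
    if status not in VALID_STATUSES:
        raise ValueError(f"status must be one of {VALID_STATUSES}, got {status!r}")
    return TEMPLATE.format(
        n=n, what=what, impact=impact, status=status, next_update=next_update
    )
```

For example, `render_incident(2, "Broker API down", "Order routing", "investigating", "14:30 UTC")` returns a five-line message ready to post to the incident channel.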