Incident Response Runbook
Severity Levels
| Level | Description | Response Time | Examples |
|-------|-------------|---------------|----------|
| SEV-1 | Active money loss or data breach | Immediate | Kill switch triggered, unauthorized access |
| SEV-2 | Degraded trading capability | < 30 min | Broker API down, data feed stale |
| SEV-3 | Non-critical service issue | < 4 hours | Web UI down, signal delay, bot offline |
| SEV-4 | Minor issue, no impact | Next business day | Cosmetic bug, slow dashboard |
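
The response times above could be encoded so that tooling can compute a response deadline from the detection time. The following is a minimal sketch, not part of any existing QGTM code: the names `Severity`, `RESPONSE_WINDOW`, and `response_deadline` are hypothetical, and "next business day" is assumed to mean the end of the next weekday, ignoring holidays.

```python
from datetime import datetime, time, timedelta
from enum import Enum


class Severity(Enum):
    SEV1 = 1
    SEV2 = 2
    SEV3 = 3
    SEV4 = 4


# Response windows taken from the severity table.
# SEV-4 has no fixed window; it is handled by next_business_day().
RESPONSE_WINDOW = {
    Severity.SEV1: timedelta(0),           # immediate
    Severity.SEV2: timedelta(minutes=30),  # < 30 min
    Severity.SEV3: timedelta(hours=4),     # < 4 hours
}


def next_business_day(ts: datetime) -> datetime:
    """End of the next weekday after ts (assumption: weekends only, no holidays)."""
    day = ts.date() + timedelta(days=1)
    while day.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        day += timedelta(days=1)
    return datetime.combine(day, time(23, 59), tzinfo=ts.tzinfo)


def response_deadline(sev: Severity, detected_at: datetime) -> datetime:
    """Latest time by which someone should have responded to the incident."""
    if sev is Severity.SEV4:
        return next_business_day(detected_at)
    return detected_at + RESPONSE_WINDOW[sev]
```

For example, a SEV-2 detected at 10:00 would have a deadline of 10:30 the same day, while a SEV-4 detected on a Friday would roll over to the following Monday.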
SEV-1: Active Money Loss
- Trigger kill switch — see kill_switch.md
- Alert all team members
- Verify positions are flat
- Investigate root cause
- Do NOT re-enable trading until the root cause is identified and fixed
- Write postmortem within 24 hours
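
Two of the steps above lend themselves to a simple automated check: verifying that positions are flat, and gating re-enablement on the root cause. The sketch below assumes positions are available as a symbol-to-quantity mapping; how that mapping is fetched from the broker is not shown, and the function names are hypothetical.

```python
from typing import Dict, Mapping


def open_positions(positions: Mapping[str, float], tol: float = 1e-9) -> Dict[str, float]:
    """Return the positions that are not flat (absolute quantity above tol)."""
    return {symbol: qty for symbol, qty in positions.items() if abs(qty) > tol}


def may_reenable_trading(root_cause_identified: bool, fix_deployed: bool) -> bool:
    """Trading stays disabled until the root cause is both identified and fixed."""
    return root_cause_identified and fix_deployed


# Example with made-up data: one symbol is still short 3 units.
remaining = open_positions({"AAPL": 0.0, "USO": -3.0})
if remaining:
    print(f"Positions NOT flat: {remaining}")
```

An empty result from `open_positions` means the "verify positions are flat" step passes; anything else should be escalated before moving on to the investigation.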
SEV-2: Degraded Trading
- Check broker status: https://status.alpaca.markets
- Check data feeds: EIA, FRED, Alpaca websocket
- If broker is down → see broker_outage.md
- If data feed is stale → switch to backup provider
- If strategies are producing anomalous signals → pause signal generation
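
The SEV-2 decision steps above can be sketched as a small triage function. This is an illustration under assumptions: the staleness threshold, the `last_seen` timestamps, and the function names are hypothetical, and the broker and anomaly flags are expected to come from checks that are not shown here.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List


def stale_feeds(last_seen: Dict[str, datetime], now: datetime,
                max_age: timedelta) -> List[str]:
    """Names of feeds whose last update is older than max_age."""
    return [name for name, ts in last_seen.items() if now - ts > max_age]


def sev2_actions(broker_up: bool, stale: List[str],
                 signals_anomalous: bool) -> List[str]:
    """Map the observed conditions to the runbook actions, in runbook order."""
    actions = []
    if not broker_up:
        actions.append("follow broker_outage.md")
    for feed in stale:
        actions.append(f"switch {feed} to backup provider")
    if signals_anomalous:
        actions.append("pause signal generation")
    return actions


# Example with made-up timestamps and an assumed 15-minute threshold.
now = datetime(2024, 1, 5, 15, 0, tzinfo=timezone.utc)
last_seen = {"EIA": now - timedelta(hours=2), "FRED": now - timedelta(minutes=1)}
print(sev2_actions(True, stale_feeds(last_seen, now, timedelta(minutes=15)), False))
```

A single threshold is used here for brevity. In practice each feed would likely need its own `max_age`, because sources such as EIA and FRED publish far less often than a websocket stream.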
Communication Template
```
[QGTM INCIDENT — SEV-{N}]
What: {brief description}
Impact: {who/what is affected}
Status: {investigating | mitigating | resolved}
Next update: {time}
```
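
To keep incident messages consistent, the template can be filled in programmatically. The following is a minimal sketch; `format_incident` is a hypothetical helper, and it only validates the severity range and the three status values listed in the template.

```python
VALID_STATUSES = ("investigating", "mitigating", "resolved")


def format_incident(sev: int, what: str, impact: str,
                    status: str, next_update: str) -> str:
    """Render the communication template for one incident update."""
    if sev not in (1, 2, 3, 4):
        raise ValueError(f"unknown severity: {sev}")
    if status not in VALID_STATUSES:
        raise ValueError(f"status must be one of {VALID_STATUSES}")
    return "\n".join([
        f"[QGTM INCIDENT \u2014 SEV-{sev}]",  # \u2014 is the dash used in the template
        f"What: {what}",
        f"Impact: {impact}",
        f"Status: {status}",
        f"Next update: {next_update}",
    ])


# Example with made-up incident details.
print(format_incident(2, "Broker API down", "Order routing",
                      "investigating", "15:30 UTC"))
```

Rejecting unknown statuses early keeps updates limited to the three states the template defines.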