Signals Outage Runbook
Detection
- Signal delivery latency > 5 minutes
- Discord bot offline or not posting
- Subscriber complaints
- Monitoring alert: signal_delivery_failures > 3
Response
- Check signal service health:
curl https://api.qgtm.ai/signals/health - Check Discord bot status: is it online in the server?
- Check Redis: are signals being published to the stream?
- Check Stripe webhooks: are subscription events processing?
Common Causes
| Symptom | Likely Cause | Fix |
|---|---|---|
| Bot offline | Token expired or rate limited | Restart bot, check token |
| Signals not generating | Strategy service crashed | Check logs, restart |
| Signals generating but not delivering | Redis stream consumer lag | Check consumer group, restart |
| Delayed signals | API latency | Check Fly.io metrics, scale up |
Communication
Post in Discord #announcements:
Signal delivery is currently delayed. We are investigating and will update within 30 minutes. Trading signals are still being generated — delivery will catch up once resolved.
Recovery
- Fix root cause
- Replay any missed signals from Redis stream
- Post recovery notice in Discord
- Review SLA impact for institutional tier subscribers