# Chaos Engineering — QGTM
Chaos tests live in `tests/chaos/`. Each test simulates a failure mode and
verifies that the system degrades gracefully.
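
As an illustration of the pattern, a chaos test usually injects a fault into a dependency and then asserts that the system moves into a safe, degraded state rather than crashing. The sketch below is a minimal example under stated assumptions: `FlakyBroker`, `OrderRouter`, and `BrokerDisconnected` are hypothetical names invented for this example and are not taken from the QGTM codebase.

```python
class BrokerDisconnected(Exception):
    """Raised by the fake broker while a disconnect is being simulated."""


class FlakyBroker:
    """Hypothetical broker stub whose connection can be cut on demand."""

    def __init__(self):
        self.connected = True
        self.sent = []

    def send(self, order):
        if not self.connected:
            raise BrokerDisconnected("broker link is down")
        self.sent.append(order)


class OrderRouter:
    """Hypothetical component under test: queues orders while the broker is down."""

    def __init__(self, broker):
        self.broker = broker
        self.pending = []
        self.degraded = False

    def submit(self, order):
        try:
            self.broker.send(order)
        except BrokerDisconnected:
            # Degrade gracefully: keep the order and flag the degraded state.
            self.degraded = True
            self.pending.append(order)
            return False
        return True

    def flush(self):
        """Retry pending orders in order once the broker is reachable again."""
        while self.pending:
            self.broker.send(self.pending[0])
            self.pending.pop(0)
        self.degraded = False


def test_broker_disconnect_degrades_gracefully():
    broker = FlakyBroker()
    router = OrderRouter(broker)

    broker.connected = False  # inject the fault
    assert router.submit("order-1") is False
    assert router.degraded and router.pending == ["order-1"]

    broker.connected = True  # heal the fault
    router.flush()
    assert broker.sent == ["order-1"] and not router.degraded
```

The test checks both halves of graceful degradation: nothing is lost while the fault is active, and normal operation resumes once the fault is removed.
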
## Scenarios
| # | Scenario | Test File | Frequency |
|---|---|---|---|
| 1 | Broker disconnect | test_chaos_scenarios.py | Every CI run |
| 2 | Data feed dropout | test_chaos_scenarios.py | Every CI run |
| 3 | Clock skew | test_chaos_scenarios.py | Every CI run |
| 4 | Redis unavailability | test_chaos_scenarios.py | Every CI run |
| 5 | Audit log tampering | test_chaos_scenarios.py | Every CI run |
| 6 | Capacity breach | test_chaos_scenarios.py | Every CI run |
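
To make one row of the table concrete, here is a rough sketch of what an audit log tampering scenario could look like. This document does not say how tampering is detected, so the example assumes a SHA-256 hash chain, which is one common technique. All function names (`append_entry`, `verify_chain`) and the entry layout are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed starting hash for an empty log


def _digest(prev_hash, payload):
    """Hash the previous entry's hash together with this entry's payload."""
    body = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("ascii") + body).hexdigest()


def append_entry(log, payload):
    """Append a payload, linking it to the previous entry by hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append(
        {"payload": payload, "prev": prev_hash, "hash": _digest(prev_hash, payload)}
    )


def verify_chain(log):
    """Return the index of the first invalid entry, or None if the chain is intact."""
    prev_hash = GENESIS
    for index, entry in enumerate(log):
        expected = _digest(prev_hash, entry["payload"])
        if entry["prev"] != prev_hash or entry["hash"] != expected:
            return index
        prev_hash = entry["hash"]
    return None


def test_audit_log_tampering_is_detected():
    log = []
    for qty in (10, 20, 30):
        append_entry(log, {"event": "fill", "qty": qty})
    assert verify_chain(log) is None

    log[1]["payload"]["qty"] = 999  # inject the fault: edit a past entry
    assert verify_chain(log) == 1
```

Because each hash covers the previous one, editing any past entry invalidates the chain from that entry onward, which is the property the test exercises.
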
## Staging Chaos (Chaos Mesh)
When the system is deployed on Kubernetes, use Chaos Mesh to run infrastructure-level chaos experiments:
```bash
# Install Chaos Mesh
kubectl apply -f infra/k8s/monitoring/chaos-mesh.yml

# Run broker disconnect experiment
kubectl apply -f infra/k8s/chaos/broker-disconnect.yml

# Run data feed dropout
kubectl apply -f infra/k8s/chaos/data-feed-dropout.yml

# Run node kill
kubectl apply -f infra/k8s/chaos/node-kill.yml
```
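
If the experiments are run regularly, a small wrapper can apply a manifest, hold the fault for a fixed window, and then remove the experiment object. The sketch below is only one possible approach. It assumes each manifest defines a Chaos Mesh experiment that ends when its object is deleted, and the `hold_seconds` default is an arbitrary placeholder. The function defaults to a dry run that returns the commands without executing anything. Note that deleting the experiment does not undo destructive faults such as a killed node.

```python
import subprocess
import time

# Manifest paths as listed in this document.
EXPERIMENTS = [
    "infra/k8s/chaos/broker-disconnect.yml",
    "infra/k8s/chaos/data-feed-dropout.yml",
    "infra/k8s/chaos/node-kill.yml",
]


def run_experiment(manifest, hold_seconds=60, dry_run=True):
    """Apply one experiment, hold it, then delete it.

    Returns the kubectl commands involved. With dry_run=True nothing is executed.
    """
    apply_cmd = ["kubectl", "apply", "-f", manifest]
    delete_cmd = ["kubectl", "delete", "-f", manifest]
    if dry_run:
        return [apply_cmd, delete_cmd]

    subprocess.run(apply_cmd, check=True)
    try:
        time.sleep(hold_seconds)  # keep the fault active for the hold window
    finally:
        # Always try to remove the experiment, even if interrupted.
        subprocess.run(delete_cmd, check=True)
    return [apply_cmd, delete_cmd]


if __name__ == "__main__":
    for path in EXPERIMENTS:
        for command in run_experiment(path):
            print(" ".join(command))
```

Running the script as written only prints the planned commands. Pass `dry_run=False` against a staging cluster to execute them.
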
## DR Drill Schedule
| Drill | Cadence | Last Run | Next Run |
|---|---|---|---|
| Kill-switch fire drill | Monthly | -- | TBD |
| Reconciliation drill | Monthly | -- | TBD |
| Watchdog drill | Monthly | -- | TBD |
| Failover drill | Quarterly | -- | TBD |
| Full DR | Quarterly | -- | TBD |
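
Once drills start running, the "Next Run" column can be derived from "Last Run" and the cadence. The helper below is a hypothetical sketch of that calculation. The cadence names match the table, but the function name `next_run` and the dates in the usage example are purely illustrative, since no drill dates are recorded yet.

```python
import calendar
from datetime import date

# Months to add for each cadence used in the table above.
CADENCE_MONTHS = {"Monthly": 1, "Quarterly": 3}


def next_run(last_run, cadence):
    """Return the next drill date, clamping to the last day of shorter months."""
    months = CADENCE_MONTHS[cadence]
    month_index = last_run.month - 1 + months
    year = last_run.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(last_run.day, last_day))


if __name__ == "__main__":
    # Illustrative dates only; they are not actual drill records.
    print(next_run(date(2024, 1, 31), "Monthly"))     # 2024-02-29
    print(next_run(date(2024, 11, 15), "Quarterly"))  # 2025-02-15
```

Clamping to the end of the month avoids invalid dates such as 31 February when the previous drill ran late in a longer month.
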