Chaos Engineering — QGTM

Chaos tests live in tests/chaos/. They simulate failure modes and verify the system degrades gracefully.

Scenarios

#  Scenario              Test File                Frequency
1  Broker disconnect     test_chaos_scenarios.py  Every CI run
2  Data feed dropout     test_chaos_scenarios.py  Every CI run
3  Clock skew            test_chaos_scenarios.py  Every CI run
4  Redis unavailability  test_chaos_scenarios.py  Every CI run
5  Audit log tampering   test_chaos_scenarios.py  Every CI run
6  Capacity breach       test_chaos_scenarios.py  Every CI run
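
The contents of test_chaos_scenarios.py are not shown here. As a minimal sketch of what a "Redis unavailability" scenario might look like, the example below injects a fake cache client that always fails and checks that a read still succeeds through a fallback. FakeRedisDown, QuoteService, and the `degraded` flag are hypothetical names invented for illustration, not part of the QGTM codebase.

```python
"""Sketch of a chaos test: the cache backend becomes unavailable."""


class FakeRedisDown:
    """Stand-in for a Redis client whose server is unreachable (hypothetical)."""

    def get(self, key):
        raise ConnectionError("redis unavailable")


class QuoteService:
    """Hypothetical service that reads through a cache to a slower source."""

    def __init__(self, cache, source):
        self.cache = cache
        self.source = source
        self.degraded = False

    def get_quote(self, symbol):
        try:
            cached = self.cache.get(symbol)
            if cached is not None:
                return cached
        except ConnectionError:
            # Degrade gracefully: record the condition and fall back to the source.
            self.degraded = True
        return self.source[symbol]


def test_redis_unavailable_falls_back_to_source():
    service = QuoteService(cache=FakeRedisDown(), source={"ABC": 101.5})
    # The read must still succeed, and the service must report degraded mode.
    assert service.get_quote("ABC") == 101.5
    assert service.degraded is True
```

The same pattern (replace a dependency with a failing double, then assert on both the result and the degraded-state signal) would apply to the other scenarios in the table, with the failure injected at the broker, data feed, or clock instead.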

Staging Chaos (Chaos Mesh)

When the system is deployed on Kubernetes (K8s), use Chaos Mesh to run infrastructure-level chaos experiments:

# Install Chaos Mesh
kubectl apply -f infra/k8s/monitoring/chaos-mesh.yml

# Run broker disconnect experiment
kubectl apply -f infra/k8s/chaos/broker-disconnect.yml

# Run data feed dropout
kubectl apply -f infra/k8s/chaos/data-feed-dropout.yml

# Run node kill
kubectl apply -f infra/k8s/chaos/node-kill.yml
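
The experiment manifests referenced above are not reproduced in this page. As a rough illustration of what a data feed dropout experiment could contain, the sketch below builds a Chaos Mesh NetworkChaos resource with 100% packet loss and prints it as JSON, which `kubectl apply -f` also accepts. The namespace, label selector, resource name, and duration are assumptions chosen for the example, not the contents of data-feed-dropout.yml.

```python
import json


def build_feed_dropout_experiment(namespace="staging", app_label="data-feed", duration="60s"):
    """Return a NetworkChaos manifest that drops all packets for matching pods.

    All argument defaults are hypothetical placeholders.
    """
    return {
        "apiVersion": "chaos-mesh.org/v1alpha1",
        "kind": "NetworkChaos",
        "metadata": {"name": "data-feed-dropout-sketch", "namespace": namespace},
        "spec": {
            "action": "loss",  # inject packet loss
            "mode": "all",     # apply to every pod the selector matches
            "selector": {
                "namespaces": [namespace],
                "labelSelectors": {"app": app_label},
            },
            "loss": {"loss": "100", "correlation": "0"},
            "duration": duration,
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_feed_dropout_experiment(), indent=2))
```

Setting a `duration` lets Chaos Mesh end the fault on its own, so the experiment does not depend on someone remembering to delete the resource.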

DR Drill Schedule

Drill                   Cadence    Last Run  Next Run
Kill-switch fire drill  Monthly    --        TBD
Reconciliation drill    Monthly    --        TBD
Watchdog drill          Monthly    --        TBD
Failover drill          Quarterly  --        TBD
Full DR                 Quarterly  --        TBD

Running Chaos Tests

pytest tests/chaos/ -v