Owner: QGTM AI
Last reviewed: 2026-04-12
Next review: 2026-07-12 (quarterly)
Classification: INTERNAL — share with auditors under NDA
1. Access Control
| Resource | Who | How Provisioned | MFA Required |
|---|---|---|---|
| GitHub repo (QGTMAI/trading) | @QGTMAI (owner) | GitHub RBAC | Yes |
| Production server (DigitalOcean) | @QGTMAI | SSH key + Tailscale | Yes |
| Alpaca broker (paper) | @QGTMAI | API key in Vault | N/A |
| Alpaca broker (live) | @QGTMAI | API key in Vault, IP-restricted | N/A |
| Cloudflare (qgtmai.com) | @QGTMAI | SSO | Yes |
| Discord / Telegram bots | @QGTMAI | Bot tokens in Vault | N/A |
| .env / secrets | Never committed | .gitignore enforced, pre-commit hook | N/A |
Principle of least privilege: No service account has broader permissions than needed. Broker keys are scoped to trading only (no withdrawal authority).
2. Change Management
Branch Protection (main)
- Direct push: blocked
- Required reviewers: 1 (general), 2 (risk-critical paths)
- Status checks required: lint, mypy, test, coverage >= 70%, sbom
- Force push: blocked
- Merge method: squash-and-merge only
CODEOWNERS (enforced paths)
See CODEOWNERS. Two-person review paths:
| Path | Reason |
|---|---|
| qgtm_risk/ | Kill switch, position limits, circuit breakers |
| qgtm_execution/ | Order routing, broker integration |
| qgtm_live/ | Live daemon, watchdog, reconciliation |
| qgtm_backtest/ | Changes affect all reported metrics |
| infra/ | Deployment, secrets, infrastructure |
| .github/ | CI/CD pipeline definitions |
PR Process
- Branch from main, prefix: feat/, fix/, refactor/, docs/
- Write failing test first (if applicable)
- Open PR with description, link to issue if any
- CI must pass: lint + mypy strict + full test suite
- CODEOWNERS auto-assigned for review
- Squash merge after approval(s)
- Delete source branch
3. Release Process
Tag (vX.Y.Z)
--> CI: lint + mypy + 1700+ tests
--> SBOM generated (CycloneDX)
--> Container image built + signed (cosign)
--> Canary deploy (1 replica, 10% traffic)
--> Smoke tests pass
--> Full rollout
| Step | Tool | Artifact |
|---|---|---|
| Tagging | git tag -s vX.Y.Z | Signed tag |
| SBOM | syft + CycloneDX | sbom.json in release assets |
| Image signing | cosign | Signature in OCI registry |
| Canary | K8s Deployment (1 replica) | Prometheus alert on error rate |
| Full rollout | K8s rolling update | Zero-downtime, readiness probes |
Rollback
kubectl rollout undo deployment/qgtm-api
Automatic rollback triggers: error rate > 5% or p99 latency > 200ms during canary window.
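The rollback triggers above can be expressed as a small predicate. The following is a minimal sketch, not the project's actual rollback controller: the `CanaryMetrics` type and `should_rollback` function are hypothetical names, and it assumes the metrics have already been aggregated over the canary window (for example from Prometheus).

```python
from dataclasses import dataclass

ERROR_RATE_LIMIT = 0.05        # 5% error rate, from the rollback policy
P99_LATENCY_LIMIT_MS = 200.0   # p99 latency ceiling in milliseconds


@dataclass(frozen=True)
class CanaryMetrics:
    """Hypothetical snapshot of metrics aggregated over the canary window."""
    error_rate: float        # fraction of failed requests, 0.0 to 1.0
    p99_latency_ms: float    # 99th-percentile latency in milliseconds


def should_rollback(metrics: CanaryMetrics) -> bool:
    """Return True when either documented trigger is strictly exceeded."""
    return (
        metrics.error_rate > ERROR_RATE_LIMIT
        or metrics.p99_latency_ms > P99_LATENCY_LIMIT_MS
    )
```

Both comparisons are strict, matching the ">" in the policy, so a canary sitting exactly at 5% errors or 200 ms would not trigger a rollback under this reading.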
4. Two-Person Rule
These actions require approval from two authorized individuals:
| Action | Method | Evidence |
|---|---|---|
| Merge to risk-critical paths | GitHub CODEOWNERS (2 reviewers) | PR approval log |
| Production deploy | Tag + CI + manual approval gate | GitHub Actions log |
| Broker API key rotation | Owner + documented witness | Vault audit log |
| Kill switch override | Owner + manual confirmation | Audit log entry |
| Position limit increase | Owner + risk review documented | Git commit + PR |
| Database migration (production) | Owner + reviewed migration script | PR + deploy log |
5. Model Risk Management (SR 11-7)
Model Inventory
See model_inventory.md for the full register of 18 models.
| Category | Count | Examples |
|---|---|---|
| Signal generation | 10 | TSMOM, XSMOM, ML ensemble, regime detector |
| Portfolio allocation | 3 | HRP, risk parity, regime-adaptive |
| Risk estimation | 3 | Factor model, EVT tail, options Greeks |
| Execution | 2 | VWAP, TWAP |
Validation Cadence
| Activity | Frequency | Owner |
|---|---|---|
| Backtest re-run (walk-forward) | Monthly | @QGTMAI |
| PBO (Probability of Backtest Overfitting) | Quarterly | @QGTMAI |
| Deflated Sharpe ratio check | Quarterly | @QGTMAI |
| Feature importance drift | Monthly | Automated (CI) |
| Out-of-sample performance vs. backtest | Weekly (live) | Watchdog daemon |
Model Change Protocol
- Document hypothesis and expected impact
- Run full walk-forward backtest on historical data
- Compare PBO, deflated Sharpe, max drawdown vs. baseline
- PR with backtest results attached
- Two-person review for strategy changes
- Canary period: paper trading for 2 weeks minimum
6. Incident Response
Kill Switch Tiers
| Tier | Trigger | Action | Recovery |
|---|---|---|---|
| T1 (Soft) | Single strategy drawdown > threshold | Halt that strategy, flatten positions | Auto-resume after cooldown |
| T2 (Hard) | Portfolio drawdown > daily limit | Halt all strategies, flatten all | Manual re-enable required |
| T3 (Emergency) | Broker connectivity loss / data corruption | Halt daemon, cancel all open orders | Manual restart after investigation |
| T4 (Total) | Security breach / unauthorized access | Kill all processes, rotate all keys | Postmortem required before restart |
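The "Assess tier" step can be sketched as a function that checks the most severe condition first, so that overlapping triggers always resolve to the highest tier. This is an illustrative sketch only: the `Tier` enum, the boolean flag names, and `assess_tier` are hypothetical and do not describe the actual interface in qgtm_risk/.

```python
from enum import IntEnum
from typing import Optional


class Tier(IntEnum):
    T1_SOFT = 1
    T2_HARD = 2
    T3_EMERGENCY = 3
    T4_TOTAL = 4


# Whether a tier may resume without a human, per the Recovery column.
AUTO_RESUME = {
    Tier.T1_SOFT: True,
    Tier.T2_HARD: False,
    Tier.T3_EMERGENCY: False,
    Tier.T4_TOTAL: False,
}


def assess_tier(
    *,
    security_breach: bool = False,
    broker_or_data_failure: bool = False,
    portfolio_drawdown_breach: bool = False,
    strategy_drawdown_breach: bool = False,
) -> Optional[Tier]:
    """Map observed conditions to a tier, most severe first; None if no trigger."""
    if security_breach:
        return Tier.T4_TOTAL
    if broker_or_data_failure:
        return Tier.T3_EMERGENCY
    if portfolio_drawdown_breach:
        return Tier.T2_HARD
    if strategy_drawdown_breach:
        return Tier.T1_SOFT
    return None
```

Ordering the checks from T4 down means a strategy drawdown that coincides with a broker outage is handled as T3, not T1.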
Escalation Path
Alert fires (Prometheus/watchdog)
--> PagerDuty / Discord alert
--> @QGTMAI acknowledges (< 15 min SLA)
--> Assess tier
--> Execute response per tier table
--> Postmortem within 48 hours
Postmortem Process
- Timeline: Reconstruct events from audit log (correlation IDs)
- Root cause: 5-Whys analysis
- Impact: P&L impact, duration, affected strategies
- Remediation: Concrete action items with owners and deadlines
- Prevention: Systemic fixes, not just patches
- Document: Stored in docs/postmortems/YYYY-MM-DD-title.md
7. Audit Trail
Merkle-Chained Log
All trading decisions are recorded in an append-only, Merkle-chained audit log (qgtm_core/audit_log.py).
| Property | Implementation |
|---|---|
| Integrity | SHA-256 hash chain; each entry includes previous hash |
| Correlation IDs | Every request gets a UUID propagated through all subsystems |
| Tamper detection | Hash chain verification on startup and hourly |
| Fields per entry | timestamp, correlation_id, event_type, strategy, symbol, signal, action, quantity, price, rationale |
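To illustrate the integrity and tamper-detection properties, here is a minimal sketch of a linear SHA-256 hash chain. It is not the contents of qgtm_core/audit_log.py: the function names, the all-zero genesis hash, and the canonical-JSON serialization are assumptions made for the example.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # assumed placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, payload: dict) -> str:
    """SHA-256 over the previous hash plus a canonical JSON form of the payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_entry(chain: list, payload: dict) -> dict:
    """Append a record that commits to the hash of the record before it."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    record = {
        "prev_hash": prev_hash,
        "payload": payload,
        "hash": entry_hash(prev_hash, payload),
    }
    chain.append(record)
    return record


def verify_chain(chain: list) -> bool:
    """Recompute every hash in order; any edited or reordered entry fails."""
    prev_hash = GENESIS_HASH
    for record in chain:
        if record["prev_hash"] != prev_hash:
            return False
        if record["hash"] != entry_hash(prev_hash, record["payload"]):
            return False
        prev_hash = record["hash"]
    return True
```

Because each hash covers the previous one, changing any historical payload invalidates that entry and every entry after it, which is what the startup and hourly verification would detect.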
Retention Policy
| Data Type | Retention | Storage |
|---|---|---|
| Audit log entries | 7 years | PostgreSQL + monthly Parquet export to S3 |
| Trade records | 7 years | PostgreSQL |
| Backtest results | 3 years | Local Parquet files |
| Application logs | 90 days | Structured JSON (stdout), rotated |
| Metrics (Prometheus) | 30 days | Prometheus TSDB |
Querying
# Find all actions for a specific correlation ID
grep "corr_id=abc-123" audit.log
# Verify hash chain integrity
python -m qgtm_core.audit_log --verify
8. Quarterly Review
Cadence: First week of January, April, July, October.
Standing Agenda
- Architecture review -- Are there new single points of failure?
- Model performance -- PBO, deflated Sharpe, OOS vs backtest drift
- Risk parameter review -- Position limits, drawdown thresholds, correlation assumptions
- Access audit -- Review all service accounts, API keys, SSH keys; revoke unused
- Dependency audit -- Review SBOM for CVEs; update pinned versions
- Incident review -- All postmortems since last quarter; systemic patterns
- Backlog triage -- Prioritize open issues, tech debt, gate progress
- Regulatory check -- Wash-sale compliance, PDT status, position limit changes
Deliverable
docs/quarterly/YYYY-QN-review.md committed to repo with findings and action items.
9. Key-Person Risk
Current State
| Domain | Primary | Backup | Cross-Training Status |
|---|---|---|---|
| Strategy development | @QGTMAI | -- | Document all strategies in docs/strategies/ |
| Infrastructure / DevOps | @QGTMAI | -- | Runbooks in docs/runbooks/ |
| Broker integration | @QGTMAI | -- | docs/runbooks/broker-setup.md |
| Risk management | @QGTMAI | -- | Model inventory + parameter docs |
| Frontend (Next.js) | @QGTMAI | -- | qgtm_web/README.md |
Mitigation Plan
- Documentation-first: Every system has a runbook; every strategy has a design doc
- Automated operations: CI/CD handles build, test, deploy; watchdog handles live monitoring
- Kill switch independence: T2+ kill switch can be triggered by watchdog without human intervention
- Secrets recovery: All secrets backed up in encrypted vault with documented recovery procedure
- Succession plan: If scaling team, onboarding checklist in docs/onboarding.md
10. Regulatory Compliance
Wash-Sale Rule (IRC 1091)
| Control | Implementation |
|---|---|
| 30-day window tracking | qgtm_risk/wash_sale.py checks before order submission |
| Substantially identical security detection | Symbol + correlated ETF lookup |
| Cost basis adjustment | Logged for tax reporting |
| Override | Requires manual flag + rationale in audit log |
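A pre-submission wash-sale check along these lines might look like the sketch below. It is not the code in qgtm_risk/wash_sale.py: `ClosedLot`, `would_trigger_wash_sale`, and the `identical` lookup table are hypothetical. It also covers only one direction of the rule, a repurchase within 30 days after a loss sale; purchases made in the 30 days before a loss sale would need a separate check at sell time.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Iterable, Mapping, Optional, Set

WASH_SALE_WINDOW = timedelta(days=30)


@dataclass(frozen=True)
class ClosedLot:
    """Hypothetical record of a closed position."""
    symbol: str
    sold_on: date
    realized_pnl: float  # negative means the lot was closed at a loss


def would_trigger_wash_sale(
    symbol: str,
    buy_date: date,
    closed_lots: Iterable[ClosedLot],
    identical: Optional[Mapping[str, Set[str]]] = None,
) -> bool:
    """True if buying `symbol` on `buy_date` follows a loss sale within 30 days.

    `identical` is an assumed lookup from a symbol to the symbols treated as
    substantially identical to it (for example, correlated ETFs).
    """
    related = {symbol} | set((identical or {}).get(symbol, ()))
    for lot in closed_lots:
        if lot.symbol not in related or lot.realized_pnl >= 0:
            continue  # unrelated symbol, or not a loss
        if timedelta(0) <= buy_date - lot.sold_on <= WASH_SALE_WINDOW:
            return True
    return False
```

A caller would typically block or flag the order when this returns True, and route any override through the manual flag and audit-log rationale described in the table.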
Pattern Day Trader (PDT) Monitoring
| Control | Implementation |
|---|---|
| Round-trip counter | Rolling 5-business-day window |
| Threshold alert | Warning at 3 round trips; block at 4 |
| Account equity check | Verify > $25,000 before allowing 4th round trip |
| Override | Disabled in production; available in paper mode only |
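The PDT controls can be sketched as a counter over a rolling window plus a decision function. This is one possible reading, not the production logic: the function names are hypothetical, business days are approximated as Monday to Friday with no holiday calendar, `today` is assumed to be a business day, and the thresholds are applied to the count including the round trip about to be opened.

```python
from datetime import date, timedelta
from typing import Iterable

PDT_EQUITY_MINIMUM = 25_000.0


def window_start(today: date, business_days: int = 5) -> date:
    """First day of a window covering `today` and the prior business days.

    Weekends are skipped; market holidays are ignored in this sketch.
    """
    current = today
    remaining = business_days - 1
    while remaining > 0:
        current -= timedelta(days=1)
        if current.weekday() < 5:  # Monday=0 ... Friday=4
            remaining -= 1
    return current


def pdt_decision(round_trip_dates: Iterable[date], today: date, equity: float) -> str:
    """Return "allow", "warn", or "block" for one additional round trip today."""
    start = window_start(today)
    existing = sum(1 for d in round_trip_dates if start <= d <= today)
    prospective = existing + 1
    if prospective >= 4:
        # The 4th round trip is only permitted above the equity minimum.
        return "allow" if equity > PDT_EQUITY_MINIMUM else "block"
    if prospective == 3:
        return "warn"
    return "allow"
```

If the intended policy is to warn once three round trips already exist in the window, the comparison would move from `prospective` to `existing`.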
Position Limits
| Control | Implementation |
|---|---|
| Per-symbol max | Configurable in qgtm_core/config.py; enforced pre-trade |
| Portfolio concentration | No single position > 20% of NAV (default) |
| Sector concentration | No sector > 40% of NAV (default) |
| Leverage cap | Max 1.0x (no leverage by default); configurable per regime |
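A pre-trade check against the default limits could be sketched as follows. The function `check_limits` and its inputs are hypothetical and are not the interface of qgtm_core/config.py; the sketch assumes `positions` holds post-trade market values per symbol and measures leverage as gross exposure divided by NAV.

```python
from collections import defaultdict
from typing import Dict, List


def check_limits(
    positions: Dict[str, float],   # symbol -> post-trade market value
    sectors: Dict[str, str],       # symbol -> sector name
    nav: float,
    max_position: float = 0.20,    # single position cap as a fraction of NAV
    max_sector: float = 0.40,      # sector cap as a fraction of NAV
    max_leverage: float = 1.0,     # gross exposure / NAV
) -> List[str]:
    """Return a list of violated limits; an empty list means the trade passes."""
    violations: List[str] = []

    sector_exposure: Dict[str, float] = defaultdict(float)
    for symbol, value in positions.items():
        exposure = abs(value)
        if exposure / nav > max_position:
            violations.append(f"position:{symbol}")
        sector_exposure[sectors.get(symbol, "UNKNOWN")] += exposure

    for sector, exposure in sector_exposure.items():
        if exposure / nav > max_sector:
            violations.append(f"sector:{sector}")

    gross = sum(abs(v) for v in positions.values())
    if gross / nav > max_leverage:
        violations.append("leverage")

    return violations
```

Returning every violation, instead of stopping at the first, gives the audit log a complete rationale when an order is rejected.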
Reporting
- Tax lots: Tracked per trade for Schedule D / Form 8949
- Wash-sale adjustments: Flagged and logged for tax preparer
- 1099-B reconciliation: Quarterly reconciliation against broker statements
Document History
| Date | Change | Author |
|---|---|---|
| 2026-04-12 | Initial creation | @QGTMAI |