
Governance — QGTM Trading Platform

  • Owner: QGTM AI
  • Last reviewed: 2026-04-12
  • Next review: 2026-07-12 (quarterly)
  • Classification: INTERNAL; share with auditors under NDA


1. Access Control

| Resource | Who | How Provisioned | MFA Required |
|---|---|---|---|
| GitHub repo (QGTMAI/trading) | @QGTMAI (owner) | GitHub RBAC | Yes |
| Production server (DigitalOcean) | @QGTMAI | SSH key + Tailscale | Yes |
| Alpaca broker (paper) | @QGTMAI | API key in Vault | N/A |
| Alpaca broker (live) | @QGTMAI | API key in Vault, IP-restricted | N/A |
| Cloudflare (qgtmai.com) | @QGTMAI | SSO | Yes |
| Discord / Telegram bots | @QGTMAI | Bot tokens in Vault | N/A |
| .env / secrets | Never committed | .gitignore enforced, pre-commit hook | N/A |

Principle of least privilege: No service account has broader permissions than needed. Broker keys are scoped to trading only (no withdrawal authority).
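
The pre-commit hook that keeps .env files out of the repo is not reproduced here. As a minimal sketch, assuming the hook receives staged file paths as arguments (the default for file-based hooks in the pre-commit framework), a name-based guard could look like this. The pattern list, the allow-list and the `forbidden_paths` / `main` helpers are hypothetical.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# Hypothetical rules: file names that must never be committed.
FORBIDDEN_PATTERNS = (".env", ".env.*", "*.pem", "*.key")
# Hypothetical allow-list for templates that hold no real secrets.
ALLOWED_NAMES = {".env.example"}


def forbidden_paths(staged_paths):
    """Return the staged paths whose file name matches a forbidden pattern."""
    blocked = []
    for path in staged_paths:
        name = PurePosixPath(path).name
        if name in ALLOWED_NAMES:
            continue
        if any(fnmatch(name, pattern) for pattern in FORBIDDEN_PATTERNS):
            blocked.append(path)
    return blocked


def main(argv):
    """Exit code for a pre-commit hook: non-zero blocks the commit."""
    blocked = forbidden_paths(argv)
    for path in blocked:
        print(f"refusing to commit secret-bearing file: {path}")
    return 1 if blocked else 0
```

A hook entry point would call `sys.exit(main(sys.argv[1:]))`. Matching on file names is a coarse first line of defence; content scanning for key material is a separate concern.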


2. Change Management

Branch Protection (main)

  • Direct push: blocked
  • Required reviewers: 1 (general), 2 (risk-critical paths)
  • Status checks required: lint, mypy, test, coverage >= 70%, sbom
  • Force push: blocked
  • Merge method: squash-and-merge only

CODEOWNERS (enforced paths)

See the CODEOWNERS file. The following paths require two-person review:

| Path | Reason |
|---|---|
| qgtm_risk/ | Kill switch, position limits, circuit breakers |
| qgtm_execution/ | Order routing, broker integration |
| qgtm_live/ | Live daemon, watchdog, reconciliation |
| qgtm_backtest/ | Changes affect all reported metrics |
| infra/ | Deployment, secrets, infrastructure |
| .github/ | CI/CD pipeline definitions |
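
The table amounts to a simple rule: a PR needs two reviewers if it touches any of these top-level paths, otherwise one. A minimal sketch of that rule, assuming it is evaluated in CI against the list of changed files; `required_reviewers` and `RISK_CRITICAL_DIRS` are hypothetical names, not part of the repo.

```python
from pathlib import PurePosixPath

# Top-level directories from the two-person review table.
RISK_CRITICAL_DIRS = {
    "qgtm_risk", "qgtm_execution", "qgtm_live",
    "qgtm_backtest", "infra", ".github",
}


def required_reviewers(changed_files):
    """Return 2 if any changed file sits under a risk-critical path, else 1."""
    for path in changed_files:
        parts = PurePosixPath(path).parts
        if parts and parts[0] in RISK_CRITICAL_DIRS:
            return 2
    return 1
```

Comparing the first path component, rather than using a string prefix, avoids false matches such as a hypothetical `qgtm_riskfree/` directory.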

PR Process

  1. Branch from main using one of the prefixes feat/, fix/, refactor/, docs/
  2. Write failing test first (if applicable)
  3. Open PR with description, link to issue if any
  4. CI must pass all required status checks: lint, mypy strict, full test suite, coverage >= 70%, SBOM
  5. CODEOWNERS auto-assigned for review
  6. Squash merge after approval(s)
  7. Delete source branch
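
Step 1's naming convention is easy to check automatically. A minimal sketch, assuming a regex check run in CI or a local hook; the set of characters allowed after the slash and the `is_valid_branch` helper are assumptions.

```python
import re

# Prefixes from step 1; the characters allowed after the slash are an assumption.
BRANCH_PATTERN = re.compile(r"(feat|fix|refactor|docs)/[\w./-]+")


def is_valid_branch(name):
    """True if the branch uses an approved prefix followed by a non-empty name."""
    return BRANCH_PATTERN.fullmatch(name) is not None
```

Using `fullmatch` anchors the pattern at both ends, so names like `feature/x` or a bare `fix/` are rejected.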

3. Release Process

```text
Tag (vX.Y.Z)
  --> CI: lint + mypy + 1700+ tests
    --> SBOM generated (CycloneDX)
      --> Container image built + signed (cosign)
        --> Canary deploy (1 replica, 10% traffic)
          --> Smoke tests pass
            --> Full rollout
```

| Step | Tool | Artifact |
|---|---|---|
| Tagging | git tag -s vX.Y.Z | Signed tag |
| SBOM | syft + CycloneDX | sbom.json in release assets |
| Image signing | cosign | Signature in OCI registry |
| Canary | K8s Deployment (1 replica) | Prometheus alert on error rate |
| Full rollout | K8s rolling update | Zero-downtime, readiness probes |

Rollback

```bash
kubectl rollout undo deployment/qgtm-api
```

Automatic rollback triggers, evaluated during the canary window: error rate > 5% or p99 latency > 200 ms.
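
In production these triggers are Prometheus alerts; the sketch below only restates the threshold logic in code so the two conditions are unambiguous. It assumes raw request counts and per-request latencies from the canary window and uses a nearest-rank p99; `should_rollback` and `p99` are hypothetical helpers.

```python
ERROR_RATE_LIMIT = 0.05        # roll back if more than 5% of canary requests fail
P99_LATENCY_LIMIT_MS = 200.0   # roll back if p99 latency exceeds 200 ms


def p99(latencies_ms):
    """Nearest-rank 99th percentile, using integer arithmetic for the rank."""
    ordered = sorted(latencies_ms)
    rank = -(-len(ordered) * 99 // 100)  # ceil(0.99 * n) without float rounding
    return ordered[rank - 1]


def should_rollback(total_requests, failed_requests, latencies_ms):
    """Apply both canary rollback triggers to metrics from the canary window."""
    if total_requests == 0:
        return False  # no canary traffic yet, nothing to judge
    if failed_requests / total_requests > ERROR_RATE_LIMIT:
        return True
    return bool(latencies_ms) and p99(latencies_ms) > P99_LATENCY_LIMIT_MS
```

Both limits are strict inequalities, matching the ">" in the trigger definition: exactly 5% errors does not roll back.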


4. Two-Person Rule

These actions require approval from two authorized individuals:

| Action | Method | Evidence |
|---|---|---|
| Merge to risk-critical paths | GitHub CODEOWNERS (2 reviewers) | PR approval log |
| Production deploy | Tag + CI + manual approval gate | GitHub Actions log |
| Broker API key rotation | Owner + documented witness | Vault audit log |
| Kill switch override | Owner + manual confirmation | Audit log entry |
| Position limit increase | Owner + documented risk review | Git commit + PR |
| Database migration (production) | Owner + reviewed migration script | PR + deploy log |
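
The essential property of the two-person rule is that approvals come from two *distinct* authorized people; the same person approving twice does not count. A minimal sketch of that check, assuming approvals are recorded as (action, approver) pairs; the `Approval` record, the action names and the `authorized` set are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Approval:
    action: str    # e.g. "broker_key_rotation" (hypothetical action name)
    approver: str  # identity of the person who approved


def two_person_satisfied(action, approvals, authorized):
    """True if at least two distinct authorized people approved this action."""
    approvers = {
        a.approver
        for a in approvals
        if a.action == action and a.approver in authorized
    }
    return len(approvers) >= 2
```

Collecting approvers into a set is what enforces distinctness; approvals for other actions and from unauthorized identities are ignored.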

5. Model Risk Management (SR 11-7)

Model Inventory

See model_inventory.md for the full register of 18 models.

| Category | Count | Examples |
|---|---|---|
| Signal generation | 10 | TSMOM, XSMOM, ML ensemble, regime detector |
| Portfolio allocation | 3 | HRP, risk parity, regime-adaptive |
| Risk estimation | 3 | Factor model, EVT tail, options Greeks |
| Execution | 2 | VWAP, TWAP |

Validation Cadence

| Activity | Frequency | Owner |
|---|---|---|
| Backtest re-run (walk-forward) | Monthly | @QGTMAI |
| PBO (Probability of Backtest Overfitting) | Quarterly | @QGTMAI |
| Deflated Sharpe ratio check | Quarterly | @QGTMAI |
| Feature importance drift | Monthly | Automated (CI) |
| Out-of-sample performance vs. backtest | Weekly (live) | Watchdog daemon |
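
The deflated Sharpe ratio check is listed without a formula. The sketch below assumes the standard formulation of Bailey and López de Prado (2014): the observed Sharpe ratio is compared against SR_0, the expected maximum Sharpe ratio among N independent trials with zero true skill, with a correction for non-normal returns. Function names are hypothetical; `sr` and `sr_variance` are per-period (non-annualized) values, and `kurtosis` is raw kurtosis (3 for a normal distribution), not excess kurtosis.

```python
import math
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329


def expected_max_sharpe(sr_variance, n_trials):
    """SR_0: expected maximum Sharpe ratio among n_trials zero-skill trials."""
    if n_trials < 2:
        raise ValueError("n_trials must be at least 2")
    z = NormalDist().inv_cdf
    return math.sqrt(sr_variance) * (
        (1 - EULER_GAMMA) * z(1 - 1 / n_trials)
        + EULER_GAMMA * z(1 - 1 / (n_trials * math.e))
    )


def deflated_sharpe(sr, n_obs, skew, kurtosis, sr_variance, n_trials):
    """Probability that the true Sharpe ratio exceeds SR_0 (the DSR)."""
    sr0 = expected_max_sharpe(sr_variance, n_trials)
    denom = math.sqrt(1 - skew * sr + (kurtosis - 1) / 4 * sr ** 2)
    return NormalDist().cdf((sr - sr0) * math.sqrt(n_obs - 1) / denom)
```

A DSR near 0.5 means the best strategy is no better than what trying N variants would produce by chance. SR_0 grows with the number of trials, so every discarded variant raises the bar.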

Model Change Protocol

  1. Document hypothesis and expected impact
  2. Run full walk-forward backtest on historical data
  3. Compare PBO, deflated Sharpe, max drawdown vs. baseline
  4. PR with backtest results attached
  5. Two-person review for strategy changes
  6. Canary period: paper trading for 2 weeks minimum
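
Step 3 compares three metrics against the baseline. PBO and deflated Sharpe come from the validation tooling; max drawdown is simple enough to sketch directly. The gate below, which requires the candidate to match or beat the baseline on all three metrics, is a hypothetical strictness level. The protocol only says "compare", so a reviewer may accept trade-offs.

```python
def max_drawdown(equity):
    """Largest peak-to-trough decline, as a fraction of the running peak."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


def no_worse_than_baseline(candidate, baseline):
    """Hypothetical gate: candidate must match or beat baseline on all three metrics.

    Each argument is a dict with keys "pbo", "dsr" and "max_dd".
    """
    return (
        candidate["pbo"] <= baseline["pbo"]
        and candidate["dsr"] >= baseline["dsr"]
        and candidate["max_dd"] <= baseline["max_dd"]
    )
```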

6. Incident Response

Kill Switch Tiers

| Tier | Trigger | Action | Recovery |
|---|---|---|---|
| T1 (Soft) | Single-strategy drawdown > threshold | Halt that strategy, flatten positions | Auto-resume after cooldown |
| T2 (Hard) | Portfolio drawdown > daily limit | Halt all strategies, flatten all | Manual re-enable required |
| T3 (Emergency) | Broker connectivity loss / data corruption | Halt daemon, cancel all open orders | Manual restart after investigation |
| T4 (Total) | Security breach / unauthorized access | Kill all processes, rotate all keys | Postmortem required before restart |
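
Several conditions can hold at once, so the watchdog must pick the most severe tier. A minimal sketch of that selection, assuming drawdowns are expressed as positive fractions; the `Tier` enum, `classify` and its parameters are hypothetical, and the threshold values would come from the risk configuration.

```python
from enum import IntEnum


class Tier(IntEnum):
    NONE = 0
    T1_SOFT = 1
    T2_HARD = 2
    T3_EMERGENCY = 3
    T4_TOTAL = 4


def classify(
    strategy_drawdowns,   # strategy name -> current drawdown (fraction)
    strategy_limit,       # per-strategy drawdown threshold (hypothetical)
    portfolio_drawdown,   # portfolio drawdown for the day (fraction)
    daily_limit,          # daily portfolio drawdown limit (hypothetical)
    broker_connected=True,
    data_ok=True,
    security_breach=False,
):
    """Return the most severe tier triggered, plus the strategies to halt under T1.

    T2 and above halt everything, so no per-strategy list is returned for them.
    """
    if security_breach:
        return Tier.T4_TOTAL, []
    if not broker_connected or not data_ok:
        return Tier.T3_EMERGENCY, []
    if portfolio_drawdown > daily_limit:
        return Tier.T2_HARD, []
    breached = sorted(s for s, dd in strategy_drawdowns.items() if dd > strategy_limit)
    if breached:
        return Tier.T1_SOFT, breached
    return Tier.NONE, []
```

Checking from T4 downward means a lower-tier response can never mask a higher-tier condition.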

Escalation Path

```text
Alert fires (Prometheus/watchdog)
  --> PagerDuty / Discord alert
    --> @QGTMAI acknowledges (< 15 min SLA)
      --> Assess tier
        --> Execute response per tier table
          --> Postmortem within 48 hours
```

Postmortem Process

  1. Timeline: Reconstruct events from audit log (correlation IDs)
  2. Root cause: 5-Whys analysis
  3. Impact: P&L impact, duration, affected strategies
  4. Remediation: Concrete action items with owners and deadlines
  5. Prevention: Systemic fixes, not just patches
  6. Document: Stored in docs/postmortems/YYYY-MM-DD-title.md

7. Audit Trail

Merkle-Chained Log

All trading decisions are recorded in an append-only, Merkle-chained audit log (qgtm_core/audit_log.py).

| Property | Implementation |
|---|---|
| Integrity | SHA-256 hash chain; each entry includes the previous entry's hash |
| Correlation IDs | Every request gets a UUID propagated through all subsystems |
| Tamper detection | Hash chain verification on startup and hourly |
| Fields per entry | timestamp, correlation_id, event_type, strategy, symbol, signal, action, quantity, price, rationale |
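
The integrity and tamper-detection rows describe a linear hash chain. The sketch below is not qgtm_core/audit_log.py; it is a minimal, in-memory illustration of the same idea, assuming entries are canonicalized as sorted-key JSON before hashing. `GENESIS_HASH`, `append` and `verify` are hypothetical names.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # hypothetical anchor value for the first entry


def entry_hash(prev_hash, payload):
    """SHA-256 over the previous hash plus a canonical JSON encoding of the entry."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(log, payload):
    """Append an entry that commits to the hash of the entry before it."""
    prev_hash = log[-1]["hash"] if log else GENESIS_HASH
    log.append({"payload": payload, "prev_hash": prev_hash,
                "hash": entry_hash(prev_hash, payload)})


def verify(log):
    """Recompute the chain from the start.

    Editing, reordering or removing an interior entry breaks the chain;
    truncating entries from the tail is not detected by the chain alone.
    """
    prev_hash = GENESIS_HASH
    for entry in log:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != entry_hash(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True
```

Canonical JSON (sorted keys, fixed separators) matters: without it, two semantically identical entries could hash differently and verification would fail spuriously.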

Retention Policy

| Data Type | Retention | Storage |
|---|---|---|
| Audit log entries | 7 years | PostgreSQL + monthly Parquet export to S3 |
| Trade records | 7 years | PostgreSQL |
| Backtest results | 3 years | Local Parquet files |
| Application logs | 90 days | Structured JSON (stdout), rotated |
| Metrics (Prometheus) | 30 days | Prometheus TSDB |

Querying

```bash
# Find all actions for a specific correlation ID
grep "corr_id=abc-123" audit.log

# Verify hash chain integrity
python -m qgtm_core.audit_log --verify
```

8. Quarterly Review

Cadence: First week of January, April, July, October.

Standing Agenda

  1. Architecture review: Are there new single points of failure?
  2. Model performance: PBO, deflated Sharpe, OOS vs. backtest drift
  3. Risk parameter review: Position limits, drawdown thresholds, correlation assumptions
  4. Access audit: Review all service accounts, API keys, SSH keys; revoke unused
  5. Dependency audit: Review SBOM for CVEs; update pinned versions
  6. Incident review: All postmortems since last quarter; systemic patterns
  7. Backlog triage: Prioritize open issues, tech debt, gate progress
  8. Regulatory check: Wash-sale compliance, PDT status, position limit changes

Deliverable

A review file, docs/quarterly/YYYY-QN-review.md, committed to the repo with findings and action items.


9. Key-Person Risk

Current State

| Domain | Primary | Backup | Cross-Training Status |
|---|---|---|---|
| Strategy development | @QGTMAI | None | Document all strategies in docs/strategies/ |
| Infrastructure / DevOps | @QGTMAI | None | Runbooks in docs/runbooks/ |
| Broker integration | @QGTMAI | None | docs/runbooks/broker-setup.md |
| Risk management | @QGTMAI | None | Model inventory + parameter docs |
| Frontend (Next.js) | @QGTMAI | None | qgtm_web/README.md |

Mitigation Plan

  1. Documentation-first: Every system has a runbook; every strategy has a design doc
  2. Automated operations: CI/CD handles build, test, deploy; watchdog handles live monitoring
  3. Kill switch independence: T2+ kill switch can be triggered by watchdog without human intervention
  4. Secrets recovery: All secrets backed up in encrypted vault with documented recovery procedure
  5. Succession plan: If the team grows, onboard new members with the checklist in docs/onboarding.md

10. Regulatory Compliance

Wash-Sale Rule (IRC 1091)

| Control | Implementation |
|---|---|
| 30-day window tracking (30 days before and after a loss sale) | qgtm_risk/wash_sale.py checks before order submission |
| Substantially identical security detection | Symbol + correlated ETF lookup |
| Cost basis adjustment | Logged for tax reporting |
| Override | Requires manual flag + rationale in audit log |
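
The window check at the core of the wash-sale control is short. This sketch is not qgtm_risk/wash_sale.py; it assumes the caller has already resolved which purchases count as "substantially identical" (same symbol plus any mapped ETFs) and excluded the lot being sold. `is_wash_sale` is a hypothetical helper.

```python
from datetime import date, timedelta

WASH_SALE_WINDOW = timedelta(days=30)  # 30 days before or after the sale


def is_wash_sale(sale_date: date, realized_pnl, replacement_purchase_dates):
    """True if a loss sale has a replacement purchase inside the wash-sale window.

    replacement_purchase_dates should cover the same symbol and any securities
    already judged substantially identical, excluding the lot being sold.
    """
    if realized_pnl >= 0:
        return False  # the rule only disallows losses
    return any(abs(d - sale_date) <= WASH_SALE_WINDOW for d in replacement_purchase_dates)
```

Taking the absolute difference covers both sides of the sale with one comparison, giving the 61-day span that includes the sale date itself.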

Pattern Day Trader (PDT) Monitoring

| Control | Implementation |
|---|---|
| Round-trip counter | Rolling 5-business-day window |
| Threshold alert | Warning at 3 round trips; block at 4 |
| Account equity check | Verify equity of at least $25,000 before allowing a 4th round trip |
| Override | Disabled in production; available in paper mode only |
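
The four PDT controls combine into a single pre-trade decision. A minimal sketch, assuming "warning at 3" and "block at 4" refer to the round trip that the proposed trade would create, and that business days are weekdays (exchange holidays are ignored here). `pdt_decision`, `last_business_days` and the return strings are hypothetical.

```python
from datetime import date, timedelta

PDT_EQUITY_MINIMUM = 25_000.0
WARN_AT = 3   # warn when the proposed trade would be the 3rd round trip
BLOCK_AT = 4  # block the 4th round trip unless equity meets the minimum


def last_business_days(end, n=5):
    """The n most recent weekdays ending at `end` (exchange holidays ignored)."""
    days = set()
    current = end
    while len(days) < n:
        if current.weekday() < 5:  # Monday=0 ... Friday=4
            days.add(current)
        current -= timedelta(days=1)
    return days


def pdt_decision(today, round_trip_dates, equity, paper_mode=False, override=False):
    """Return "allow", "warn" or "block" for a trade that would add a round trip."""
    window = last_business_days(today)
    proposed = sum(1 for d in round_trip_dates if d in window) + 1
    if proposed >= BLOCK_AT and equity < PDT_EQUITY_MINIMUM:
        # Overrides are honored in paper mode only.
        return "allow" if (paper_mode and override) else "block"
    if proposed >= WARN_AT:
        return "warn"
    return "allow"
```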

Position Limits

| Control | Implementation |
|---|---|
| Per-symbol max | Configurable in qgtm_core/config.py; enforced pre-trade |
| Portfolio concentration | No single position > 20% of NAV (default) |
| Sector concentration | No sector > 40% of NAV (default) |
| Leverage cap | Max 1.0x (no leverage by default); configurable per regime |
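
A pre-trade check applies these limits to the portfolio *as it would be* after the fill. A minimal sketch using the default values from the table, assuming signed market values per symbol, a symbol-to-sector map and gross exposure over NAV as the leverage measure. The `Limits` dataclass and `pre_trade_violations` are hypothetical, and the per-symbol maximum from qgtm_core/config.py is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    max_position_pct: float = 0.20  # no single position > 20% of NAV
    max_sector_pct: float = 0.40    # no sector > 40% of NAV
    max_leverage: float = 1.0       # gross exposure / NAV


def pre_trade_violations(positions, sectors, symbol, order_value, nav, limits=Limits()):
    """Return the limits the portfolio would breach if the order were filled.

    positions: symbol -> signed market value; sectors: symbol -> sector name;
    order_value: signed notional of the proposed order.
    """
    after = dict(positions)
    after[symbol] = after.get(symbol, 0.0) + order_value

    violations = []
    sector_exposure = {}
    for sym, value in after.items():
        if abs(value) / nav > limits.max_position_pct:
            violations.append(f"position:{sym}")
        sector = sectors.get(sym, "UNKNOWN")
        sector_exposure[sector] = sector_exposure.get(sector, 0.0) + abs(value)

    for sector, exposure in sector_exposure.items():
        if exposure / nav > limits.max_sector_pct:
            violations.append(f"sector:{sector}")

    gross = sum(abs(v) for v in after.values())
    if gross / nav > limits.max_leverage:
        violations.append("leverage")
    return violations
```

An empty list means the order may proceed. Returning every breach, rather than stopping at the first, gives the audit log a complete rationale for a rejection.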

Reporting

  • Tax lots: Tracked per trade for Schedule D / Form 8949
  • Wash-sale adjustments: Flagged and logged for tax preparer
  • 1099-B reconciliation: Quarterly reconciliation against broker statements

Document History

| Date | Change | Author |
|---|---|---|
| 2026-04-12 | Initial creation | @QGTMAI |