DESIGN_PRINCIPLES.md
15 invariants for the QGTM trading platform. Each principle is short, concrete enough to reject a change against, and CI-enforceable where noted. A change that violates a principle must not merge unless it follows the Exception Process below.
Money
P1. Decimal everywhere, float nowhere.
All monetary values use Decimal. The float type is forbidden for prices, quantities, PnL, and equity. API responses serialize Decimal as JSON strings.
CI: lint for float( adjacent to monetary field names in qgtm_api/, qgtm_execution/, qgtm_risk/.
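A minimal standard-library sketch of why float is banned, and of one way to serialize Decimal as JSON strings. The encoder class, the `to_json` helper, and the sample fill are hypothetical illustrations, not the platform's API.

```python
import json
from decimal import Decimal

# Binary floats cannot represent most decimal fractions exactly.
assert 0.1 + 0.2 != 0.3
# Decimal built from strings keeps exact base-10 values.
# (Decimal(0.1) would inherit the float's binary error, so construct from str.)
assert Decimal("0.1") + Decimal("0.2") == Decimal("0.3")


class DecimalAsStringEncoder(json.JSONEncoder):
    """Emit Decimal as a JSON string so no client ever parses it as a float."""

    def default(self, o):
        if isinstance(o, Decimal):
            return str(o)
        return super().default(o)


def to_json(payload: dict) -> str:
    return json.dumps(payload, cls=DecimalAsStringEncoder)


fill = {"symbol": "ES", "price": Decimal("5123.25"), "qty": Decimal("2")}
notional = fill["price"] * fill["qty"]  # exact: Decimal("10246.50")
```

Serializing as strings rather than JSON numbers matters because many JSON parsers (JavaScript's included) decode every number as a binary float.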
P2. One type, one definition.
Money, Symbol, StrategyId, UTCDatetime, and AsOf[T] are defined in qgtm_core/types.py and nowhere else. No downstream module may redefine or alias these types.
CI: grep for NewType("Money" or class Money outside qgtm_core/types.py.
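The grep above covers only Money. A sketch of a broader AST-based check that covers all five names and also catches renamed imports and simple aliases. `find_redefinitions` is hypothetical and not the project's lint; CI would run it on every file except qgtm_core/types.py.

```python
import ast

# The core type names that only qgtm_core/types.py may define (from P2).
CORE_TYPES = {"Money", "Symbol", "StrategyId", "UTCDatetime", "AsOf"}


def find_redefinitions(source: str, filename: str = "<src>") -> list[str]:
    """Flag class definitions, assignments, and renamed imports of core type names."""
    problems = []
    for node in ast.walk(ast.parse(source, filename)):
        if isinstance(node, ast.ClassDef) and node.name in CORE_TYPES:
            problems.append(f"{filename}:{node.lineno} redefines {node.name}")
        elif isinstance(node, (ast.Assign, ast.AnnAssign)):
            targets = node.targets if isinstance(node, ast.Assign) else [node.target]
            for target in targets:
                if not isinstance(target, ast.Name):
                    continue
                if target.id in CORE_TYPES:  # e.g. Money = NewType("Money", Decimal)
                    problems.append(f"{filename}:{node.lineno} redefines {target.id}")
                elif isinstance(node.value, ast.Name) and node.value.id in CORE_TYPES:
                    # e.g. Cash = Money
                    problems.append(
                        f"{filename}:{node.lineno} aliases {node.value.id} as {target.id}")
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:  # e.g. from qgtm_core.types import Money as Cash
                if alias.name in CORE_TYPES and alias.asname not in (None, alias.name):
                    problems.append(
                        f"{filename}:{node.lineno} aliases {alias.name} as {alias.asname}")
    return problems
```

Plain imports such as `from qgtm_core.types import Money` pass; only redefinition and renaming are flagged.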
Data
P3. Every join is point-in-time.
All data joins in feature construction must go through pit_join(). Direct join() or join_asof() outside qgtm_data/pit.py is a defect.
CI: lint for .join( and .join_asof( outside pit.py in qgtm_data/ and qgtm_features/.
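The real pit_join() lives in qgtm_data/pit.py, and its signature is not shown here. A pure-Python sketch of the backward as-of semantics it is meant to enforce follows; the parameter names are assumptions, and the sketch omits per-symbol grouping.

```python
from bisect import bisect_right
from datetime import datetime


def pit_join(left, right, *, left_time, right_time, value):
    """Backward as-of join: each left row gets the newest right value known at its time.

    A right row whose right_time is later than the left row's left_time is never
    used, so no future information can leak into a feature.
    """
    right = sorted(right, key=lambda r: r[right_time])
    times = [r[right_time] for r in right]
    joined = []
    for row in left:
        i = bisect_right(times, row[left_time])  # number of right rows with time <= t
        joined.append({**row, value: right[i - 1][value] if i else None})
    return joined


bars = [{"ts": datetime(2024, 1, 5)}, {"ts": datetime(2024, 1, 10)}]
macro = [{"known_at": datetime(2024, 1, 8), "x": 2}, {"known_at": datetime(2024, 1, 3), "x": 1}]
features = pit_join(bars, macro, left_time="ts", right_time="known_at", value="x")
# The Jan 5 bar sees x=1; x=2 only becomes visible from Jan 8 onward.
```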
P4. Knowledge time, not observation time.
When joining external data (FRED, COT, EIA), the join key is the date the data became publicly available, not the date it describes. Publication lag constants live in qgtm_core/constants.py.
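A sketch of turning an observation date into the knowledge date that P4 uses as the join key. The `PUBLICATION_LAG` table and both helpers are hypothetical stand-ins for the constants in qgtm_core/constants.py. The only lag shown reflects the usual CFTC schedule, in which COT reports describe Tuesday positions and are normally released that Friday. Day granularity is a simplification: COT comes out in the afternoon, so an intraday system needs a timestamp.

```python
from datetime import date, timedelta

# Illustrative lag only; the real constants live in qgtm_core/constants.py.
# CFTC COT reports describe Tuesday positions and are normally released that Friday.
PUBLICATION_LAG = {
    "COT": timedelta(days=3),
}


def knowledge_date(source: str, observation_date: date) -> date:
    """Date a value describing `observation_date` became public: the P4 join key."""
    return observation_date + PUBLICATION_LAG[source]


def visible_on(source: str, observation_date: date, trade_date: date) -> bool:
    """True if a strategy trading on `trade_date` may use this observation."""
    return knowledge_date(source, observation_date) <= trade_date
```

Joining on `observation_date` would let a Wednesday backtest see Tuesday's COT positions two days before anyone could have.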
P5. Data ingestion must not block trading.
Data fetchers run in a separate failure domain from the trading loop. A failed FRED call must never delay order submission. Timeout: 30s per provider, then skip.
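A sketch of the timeout-then-skip half of P5 using `asyncio.wait_for`. It does not show the separate failure domain (fetchers running in their own process and handing results to the trading loop). `fetch_all` and the provider callables are hypothetical.

```python
import asyncio
from typing import Awaitable, Callable

PROVIDER_TIMEOUT_S = 30.0  # P5: 30s per provider, then skip


async def fetch_all(
    providers: dict[str, Callable[[], Awaitable[object]]],
    timeout_s: float = PROVIDER_TIMEOUT_S,
) -> tuple[dict[str, object], list[str]]:
    """Run all providers concurrently; a slow or failing one is skipped, never awaited forever."""

    async def guarded(name, fetch):
        try:
            return name, await asyncio.wait_for(fetch(), timeout_s), None
        except Exception as exc:  # includes the timeout raised by wait_for
            return name, None, exc

    results = await asyncio.gather(*(guarded(n, f) for n, f in providers.items()))
    ok = {name: value for name, value, err in results if err is None}
    skipped = [name for name, _, err in results if err is not None]
    return ok, skipped
```

Because each provider is wrapped individually, one hung FRED request costs at most `timeout_s` and never cancels the others.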
Risk
P6. Kill-switch state survives restart.
Kill-switch tier is persisted to Redis. Process restart restores the last known tier. A FLATTEN that happened 10 minutes ago is still a FLATTEN after reboot.
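A sketch of restoring the tier on startup. It assumes a client with redis-py style `get(key)`/`set(key, value)`. The key name, the `PersistentTier` class, the `NORMAL` default, and the in-memory `FakeRedis` are hypothetical. What should happen if Redis is unreachable at startup is not decided here; the exception simply propagates.

```python
KEY = "qgtm:killswitch:tier"  # hypothetical key name


class PersistentTier:
    """Keeps the kill-switch tier in Redis so a restart resumes where the last process stopped."""

    def __init__(self, client, key: str = KEY, default: str = "NORMAL"):
        self._client = client  # anything with redis-py style get(key) / set(key, value)
        self._key = key
        raw = client.get(key)
        # redis-py returns bytes unless decode_responses=True; accept both.
        # A missing key means no tier was ever recorded, so fall back to the default.
        self.tier = (raw.decode() if isinstance(raw, bytes) else raw) or default

    def set(self, tier: str) -> None:
        self._client.set(self._key, tier)  # persist first, then update memory
        self.tier = tier


class FakeRedis(dict):
    """In-memory stand-in for a Redis client; dict already provides get()."""

    def set(self, key, value):
        self[key] = value
```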
P7. Kill-switch never de-escalates automatically.
Escalation is automatic. De-escalation requires a named human approver via reset_kill_switch(approver=...). The approver's identity is recorded in the audit log.
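A sketch of the one-way ratchet: the automatic path can only raise the tier, and the only way down requires a named approver, who is recorded. The tier ladder below is hypothetical (only FLATTEN is named in this document), and this `reset_kill_switch` is a method whose shape may differ from the real function.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Tier(IntEnum):
    # Hypothetical ladder; only FLATTEN is named in this document.
    NORMAL = 0
    REDUCE = 1
    HALT = 2
    FLATTEN = 3


@dataclass
class KillSwitchState:
    tier: Tier = Tier.NORMAL
    audit: list = field(default_factory=list)

    def escalate(self, tier: Tier, reason: str) -> None:
        """Automatic path: requests to move down are silently ignored."""
        if tier > self.tier:
            self.audit.append(("escalate", self.tier.name, tier.name, reason))
            self.tier = tier

    def reset_kill_switch(self, *, approver: str, to: Tier = Tier.NORMAL) -> None:
        """Manual path: the only way down, and it needs a named human approver."""
        if not approver.strip():
            raise PermissionError("de-escalation requires a named approver")
        if to >= self.tier:
            raise ValueError("reset must lower the tier")
        self.audit.append(("reset", self.tier.name, to.name, approver))
        self.tier = to
```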
P8. The watchdog is an independent witness.
The dead-man's switch runs in a separate process (or container) from the trading daemon. It shares no memory, no event loop, and no failure mode with the daemon. If the daemon dies, the watchdog lives and flattens.
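A sketch of the watchdog's check, assuming the daemon publishes a heartbeat timestamp that the watchdog can read, for example from a Redis key. `read_heartbeat` and `flatten` are hypothetical callables that the watchdog process supplies itself, and the 10-second threshold is illustrative.

```python
import time


class Watchdog:
    """Dead-man's switch check, meant to run in its own process (P8)."""

    def __init__(self, read_heartbeat, flatten, max_silence_s=10.0, clock=time.time):
        self.read_heartbeat = read_heartbeat  # daemon's last beat (epoch seconds) or None
        self.flatten = flatten                # the watchdog's own flatten path
        self.max_silence_s = max_silence_s    # hypothetical threshold
        self.clock = clock                    # wall clock; across hosts this assumes synced clocks
        self.fired = False

    def check(self) -> bool:
        """One tick. Returns True only on the tick that triggers the flatten."""
        last = self.read_heartbeat()
        # No heartbeat at all is treated the same as a stale one (fail safe).
        dead = last is None or self.clock() - last > self.max_silence_s
        if dead and not self.fired:
            self.flatten()
            self.fired = True  # in this sketch, fires at most once per instance
            return True
        return False

    def run(self, interval_s=1.0):
        while True:
            self.check()
            time.sleep(interval_s)
```

A wall clock is used rather than `time.monotonic()` because monotonic readings are not comparable between processes.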
Orders
P9. Every order is written before it is sent.
The OMS writes a WAL entry to Redis before calling broker.submit_order(). On crash, the WAL is replayed and reconciled with broker state. No order exists only in memory.
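A sketch of write-ahead submission and crash replay, with a dict standing in for the Redis-backed WAL. The `client_order_id` keyword, `has_order()`, the status names, and `FakeBroker` are all hypothetical. On replay the sketch deliberately does not resend orders that never reached the broker, because whether a stale order should still go out is a policy decision.

```python
import uuid
from dataclasses import dataclass


@dataclass
class WalEntry:
    client_order_id: str
    order: dict
    status: str = "PENDING"  # PENDING -> SUBMITTED, or NEVER_SENT after replay


class OMS:
    def __init__(self, wal: dict, broker):
        self.wal = wal        # stand-in for the Redis-backed WAL
        self.broker = broker

    def submit(self, order: dict) -> str:
        coid = uuid.uuid4().hex
        self.wal[coid] = WalEntry(coid, order)                  # 1. write ahead
        self.broker.submit_order(order, client_order_id=coid)  # 2. then send
        self.wal[coid].status = "SUBMITTED"
        return coid

    def replay(self) -> list:
        """After a crash, settle every PENDING entry against broker state.

        Returns the IDs that never reached the broker; they are not resent here.
        """
        never_sent = []
        for coid, entry in self.wal.items():
            if entry.status != "PENDING":
                continue
            if self.broker.has_order(coid):
                entry.status = "SUBMITTED"  # sent, but the crash beat the status update
            else:
                entry.status = "NEVER_SENT"
                never_sent.append(coid)
        return never_sent


class FakeBroker:
    """In-memory broker for the sketch; has_order() is a hypothetical lookup."""

    def __init__(self):
        self.orders = {}

    def submit_order(self, order, client_order_id):
        self.orders[client_order_id] = order

    def has_order(self, client_order_id):
        return client_order_id in self.orders
```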
P10. Broker is source of truth for positions.
The OMS reconciles with broker state every cycle. When local state and broker state disagree, broker wins. Discrepancies are logged, audited, and alerted.
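A sketch of one reconciliation pass over net positions keyed by symbol. `reconcile` and the logger name are hypothetical, and the audit and alert hooks are reduced to a log line.

```python
import logging
from decimal import Decimal

log = logging.getLogger("qgtm.oms.reconcile")  # hypothetical logger name


def reconcile(
    local: dict[str, Decimal], broker: dict[str, Decimal]
) -> tuple[dict[str, Decimal], list[tuple]]:
    """Return (new local positions, discrepancies). The broker wins every disagreement."""
    discrepancies = []
    for symbol in sorted(local.keys() | broker.keys()):
        ours = local.get(symbol, Decimal(0))
        theirs = broker.get(symbol, Decimal(0))
        if ours != theirs:
            discrepancies.append((symbol, ours, theirs))
            # Audit record and alert would be emitted here as well.
            log.warning("position mismatch %s: local=%s broker=%s", symbol, ours, theirs)
    # The broker snapshot replaces local state wholesale; flat symbols are dropped.
    new_local = {s: q for s, q in broker.items() if q != 0}
    return new_local, discrepancies
```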
Observability
P11. Two IDs on every event.
Every log entry, audit record, OTel span, and Redis Stream event carries both trace_id (W3C 32-hex) and correlation_id (UUID4 hex). An event without both IDs is a bug.
CI: test that audit entries created within bind_correlation() have non-empty correlation_id.
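A sketch of how both IDs can travel with every event, using `contextvars` so the binding follows async tasks. This `bind_correlation` is a hypothetical re-implementation (the real signature may differ), and `make_event` is hypothetical. The trace_id is passed explicitly here; in practice it would come from the current OTel span. The format check follows W3C Trace Context, where a trace-id is 32 lowercase hex digits and must not be all zeros.

```python
import contextvars
import re
import uuid
from contextlib import contextmanager
from typing import Optional

_correlation_id = contextvars.ContextVar("correlation_id", default="")
_TRACE_ID = re.compile(r"[0-9a-f]{32}")  # W3C trace-id: 32 lowercase hex digits


@contextmanager
def bind_correlation(correlation_id: Optional[str] = None):
    """Bind a correlation_id (UUID4 hex) to the current context for the block's duration."""
    token = _correlation_id.set(correlation_id or uuid.uuid4().hex)
    try:
        yield _correlation_id.get()
    finally:
        _correlation_id.reset(token)


def make_event(trace_id: str, **fields) -> dict:
    """Stamp an event with both IDs, refusing to build one that would violate P11."""
    if not _TRACE_ID.fullmatch(trace_id) or trace_id == "0" * 32:
        raise ValueError("trace_id must be 32 lowercase hex digits and not all zeros")
    correlation_id = _correlation_id.get()
    if not correlation_id:
        raise ValueError("no correlation_id bound; call inside bind_correlation()")
    return {"trace_id": trace_id, "correlation_id": correlation_id, **fields}
```

Failing loudly at construction time turns "an event without both IDs is a bug" into an error at the call site instead of a gap found later in the logs.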
P12. Audit chain is immutable and verifiable.
The audit log is append-only with Merkle chain integrity (SHA-256). verify_chain() must return True at all times. A broken chain is a severity-1 incident.
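A sketch of a verifiable chain. It assumes the linear form, a hash chain in which each entry stores the SHA-256 of its predecessor, rather than a full Merkle tree. `AuditLog` is a hypothetical in-memory stand-in for the real store.

```python
import hashlib
import json

GENESIS = "0" * 64  # prev-hash of the first entry


def _digest(prev_hash: str, record: dict) -> str:
    # Canonical JSON so the same record always hashes to the same value.
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


class AuditLog:
    def __init__(self):
        self._entries = []  # append-only: no update or delete methods

    def append(self, record: dict) -> str:
        prev = self._entries[-1]["hash"] if self._entries else GENESIS
        h = _digest(prev, record)
        self._entries.append({"record": record, "prev": prev, "hash": h})
        return h

    def verify_chain(self) -> bool:
        prev = GENESIS
        for e in self._entries:
            if e["prev"] != prev or _digest(prev, e["record"]) != e["hash"]:
                return False
            prev = e["hash"]
        return True
```

Editing any past record changes its hash and breaks every link after it. A hash chain alone cannot stop someone who rewrites the entire log and recomputes every hash, so the latest hash is usually also anchored somewhere outside the log.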
Architecture
P13. Dependencies flow one way.
qgtm_core has zero internal imports. Each package depends only on packages closer to core. No cycles. No upward imports.
CI: import-linter or equivalent checks the dependency DAG on every PR.
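import-linter can express this as a layers contract. As an illustration of the rule itself, here is a stand-alone sketch that flags imports pointing away from core. The layer order is hypothetical, and a single linear order is a simplification: the real DAG may have sibling packages.

```python
import ast

# Hypothetical layer order, innermost first; the real order is whatever the repo defines.
LAYERS = ["qgtm_core", "qgtm_data", "qgtm_features", "qgtm_risk", "qgtm_execution", "qgtm_api"]
RANK = {pkg: i for i, pkg in enumerate(LAYERS)}


def imported_packages(source: str) -> set[str]:
    """Top-level package names imported by a module (relative imports are skipped)."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(a.name.split(".")[0] for a in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            found.add(node.module.split(".")[0])
    return found


def upward_imports(package: str, source: str) -> list[str]:
    """Internal imports from `package` that do not point strictly closer to core."""
    return sorted(
        p for p in imported_packages(source)
        if p in RANK and p != package and RANK[p] >= RANK[package]
    )
```

Since qgtm_core has rank 0, any import of another qgtm package from qgtm_core is flagged, which matches "zero internal imports". A strict ordering also rules out cycles.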
P14. One process, one job.
Each deployment unit (container, systemd service) does exactly one thing: trade, watch, serve API, ingest data, or write audit. No process does two jobs.
Governance
P15. Every principle is tested quarterly.
Once per quarter, open a GitHub issue from the quarterly review template. Walk through each principle. For each: verify it holds, document any exceptions, and decide if the principle should change. If a principle no longer serves the system, delete it.
How to Use This Document
- Before writing code: Check that your change does not violate any principle.
- During code review: Reject PRs that violate a principle unless the PR also updates this document with a justified exception.
- During quarterly review: Walk through all 15 principles. Update, add, or remove as needed.
- When onboarding: Read this document on Day 1. These are the rules of the system.
Exception Process
If a principle must be violated:
- Document the violation in the PR description
- Explain why the principle cannot be followed
- Get approval from the module owner
- Add a `# PRINCIPLE_EXCEPTION: P{N} -- {reason}` comment at the violation site
- Track the exception in the quarterly review
No silent violations. No permanent exceptions without quarterly re-approval.
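Because the marker has a fixed format, exceptions can be collected mechanically for the quarterly review. A sketch of a scanner CI could run follows; `scan_exceptions` is hypothetical, and the principle count is hard-coded, so it would need updating whenever this document adds or removes a principle.

```python
import re

# Marker format from the list above: # PRINCIPLE_EXCEPTION: P{N} -- {reason}
MARKER = re.compile(r"#\s*PRINCIPLE_EXCEPTION:\s*P(\d+)\s+--\s+(\S.*)$")
NUM_PRINCIPLES = 15  # keep in sync with this document


def scan_exceptions(path: str, text: str) -> tuple[list[tuple], list[str]]:
    """Collect well-formed exception markers and report malformed or unknown ones."""
    found, errors = [], []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if "PRINCIPLE_EXCEPTION" not in line:
            continue
        m = MARKER.search(line)
        if not m:
            errors.append(f"{path}:{lineno}: malformed PRINCIPLE_EXCEPTION marker")
        elif not 1 <= int(m.group(1)) <= NUM_PRINCIPLES:
            errors.append(f"{path}:{lineno}: unknown principle P{m.group(1)}")
        else:
            found.append((path, lineno, f"P{m.group(1)}", m.group(2).strip()))
    return found, errors
```

Failing CI on `errors` keeps markers greppable, and the `found` list becomes the checklist for the quarterly review.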