ADR-002: Core Engine Stack Selection
Status
Accepted
Context
We need to select core libraries for backtesting, data storage, portfolio optimization, and ML research. The choice must balance:
- Correctness (no lookahead bias, proper event ordering)
- Performance (handle full universe with tick data)
- Extensibility (custom strategies, features, risk models)
- Commodity ETF suitability (term structure, roll modeling, fundamental data)
Decision
- Backtest/Research: vectorbt (vectorized fast prototyping) + custom event-driven harness for production strategies
- Data Storage: ArcticDB (tick/bar/feature panels), DuckDB (ad-hoc research queries), Polars (in-memory transforms)
- Portfolio/Risk: riskfolio-lib (HRP, risk parity, CVaR optimization), PyPortfolioOpt (mean-variance baseline)
- ML Research: scikit-learn + LightGBM/XGBoost (tree models), PyTorch (deep learning), custom purged CV from mlfinlab patterns
- Feature Store: ArcticDB-backed with Polars transforms
- UI Inspiration: OpenBB Terminal aesthetics, TradingView charting patterns
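The "custom purged CV" item can be illustrated with a minimal sketch. It is not the mlfinlab implementation. It assumes samples are time-ordered and that the label of sample i is built from observations i through i + label_horizon. The function name and the parameters `label_horizon` and `embargo` are hypothetical, chosen for illustration only.

```python
from typing import Iterator, List, Tuple


def purged_kfold_indices(
    n_samples: int,
    n_splits: int = 5,
    label_horizon: int = 1,
    embargo: int = 0,
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for time-ordered samples.

    Assumption: the label of sample i uses observations i..i+label_horizon,
    so a training sample leaks information if that window touches the span
    covered by the test block's own label windows.
    """
    if n_splits < 2 or n_splits > n_samples:
        raise ValueError("n_splits must be between 2 and n_samples")

    # Contiguous folds; the first (n_samples % n_splits) folds get one extra sample.
    fold_sizes = [
        n_samples // n_splits + (1 if k < n_samples % n_splits else 0)
        for k in range(n_splits)
    ]

    start = 0
    for size in fold_sizes:
        stop = start + size  # test block is [start, stop)
        test = list(range(start, stop))

        # Purge before the block: keep only samples whose label window
        # [i, i + label_horizon] ends strictly before the test block starts.
        left = [i for i in range(start) if i + label_horizon < start]

        # Purge after the block: test label windows reach stop - 1 + label_horizon,
        # so training resumes at stop + label_horizon, plus an optional embargo.
        right_start = min(stop + label_horizon + embargo, n_samples)
        right = list(range(right_start, n_samples))

        yield left + right, test
        start = stop


if __name__ == "__main__":
    for train_idx, test_idx in purged_kfold_indices(20, n_splits=4, label_horizon=2, embargo=1):
        print("test", test_idx, "train", train_idx)
```

The index lists can be passed to any estimator that accepts precomputed splits, for example through the `cv` argument of scikit-learn's `cross_val_score`, which accepts an iterable of (train, test) index pairs.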
Consequences
- vectorbt gives us fast iteration for research but lacks event-driven fidelity, so production strategies run on the custom harness
- ArcticDB is battle-tested at Man Group scale but has a learning curve
- nautilus_trader is not used as the core engine (too opinionated for an ETF-only system, with Rust complexity overhead); it is vendored for execution patterns only
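To make "event-driven fidelity" concrete, here is a minimal sketch of the kind of ordering guarantee a custom harness could enforce. It is not the project's harness. It assumes bar data as (timestamp, open, close) tuples, and it assumes that an order decided on one bar's close is filled at the next bar's open, so a strategy never trades on a price it could not have seen. All names (`run_backtest`, `buy_above_100`) are hypothetical.

```python
import heapq
from typing import Callable, List, Tuple

Bar = Tuple[int, float, float]   # (timestamp, open, close)
Fill = Tuple[int, int, float]    # (timestamp, quantity, fill price)

ORDER_PRIORITY = 0  # pending orders are filled at the open, before...
CLOSE_PRIORITY = 1  # ...the strategy sees that bar's close


def run_backtest(bars: List[Bar], strategy: Callable[[int, float], int]) -> List[Fill]:
    """Process events in (timestamp, priority, sequence) order and return fills."""
    opens = {ts: open_ for ts, open_, _ in bars}
    next_ts = {bars[i][0]: bars[i + 1][0] for i in range(len(bars) - 1)}

    queue: list = []
    seq = 0  # unique tie-breaker so heap comparison never reaches the payload
    for ts, _, close in bars:
        heapq.heappush(queue, (ts, CLOSE_PRIORITY, seq, "close", close))
        seq += 1

    fills: List[Fill] = []
    while queue:
        ts, _, _, kind, data = heapq.heappop(queue)
        if kind == "close":
            qty = strategy(ts, data)
            # Orders placed on the last bar have no next open and are dropped.
            if qty != 0 and ts in next_ts:
                heapq.heappush(queue, (next_ts[ts], ORDER_PRIORITY, seq, "order", qty))
                seq += 1
        else:
            # Fill at the open of the bar following the decision bar.
            fills.append((ts, data, opens[ts]))
    return fills


def buy_above_100(ts: int, close: float) -> int:
    """Toy strategy: request one unit whenever the close is above 100."""
    return 1 if close > 100.0 else 0


if __name__ == "__main__":
    bars = [(1, 99.0, 101.0), (2, 102.0, 100.0), (3, 98.0, 103.0)]
    print(run_backtest(bars, buy_above_100))
```

The priority field is the design choice that matters here: within a single timestamp, fills for earlier decisions are processed before the strategy observes the new close, which is the ordering a purely vectorized backtest does not make explicit.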
Alternatives Considered
- nautilus_trader as core: Best-in-class event engine, but the Rust/Cython complexity overhead is not justified for an ETF-only system (no HFT latency needs)
- QuantConnect Lean: C# ecosystem, poor Python interop for custom ML
- qlib (Microsoft): Excellent ML pipeline but weak on execution and commodity-specific features