ADR-002: Core Engine Stack Selection

Status

Accepted

Context

We need to select core libraries for backtesting, data storage, portfolio optimization, and ML research. The choice must balance:

  • Correctness (no lookahead bias, proper event ordering)
  • Performance (handle full universe with tick data)
  • Extensibility (custom strategies, features, risk models)
  • Commodity ETF suitability (term structure, roll modeling, fundamental data)

Decision

  • Backtest/Research: vectorbt (vectorized fast prototyping) + custom event-driven harness for production strategies
  • Data Storage: ArcticDB (tick/bar/feature panels), DuckDB (ad-hoc research queries), Polars (in-memory transforms)
  • Portfolio/Risk: riskfolio-lib (HRP, risk parity, CVaR optimization), PyPortfolioOpt (mean-variance baseline)
  • ML Research: scikit-learn + LightGBM/XGBoost (tree models), PyTorch (deep learning), custom purged CV from mlfinlab patterns
  • Feature Store: ArcticDB-backed with Polars transforms
  • UI Inspiration: OpenBB Terminal aesthetics, TradingView charting patterns
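
To make the "custom purged CV" item under ML Research concrete, here is a minimal sketch of a purged K-fold splitter using only the standard library. It assumes each sample's label is computed from a fixed forward window of `horizon` bars, which is a simplification; the function name `purged_kfold_splits` and its parameters are hypothetical, not part of mlfinlab or of any decided API.

```python
from typing import Iterator, List, Tuple


def purged_kfold_splits(
    n_samples: int, n_splits: int, horizon: int, embargo: int = 0
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for contiguous, purged K-fold CV.

    Assumption: the label of sample i uses data from bars i..i+horizon.
    A training sample is dropped (purged) if its label window overlaps
    the span covered by the test labels, and `embargo` extra samples
    are dropped after that span.
    """
    if not 2 <= n_splits <= n_samples:
        raise ValueError("n_splits must be between 2 and n_samples")
    fold_size, remainder = divmod(n_samples, n_splits)
    start = 0
    for k in range(n_splits):
        # Spread the remainder over the first folds so sizes differ by at most 1.
        stop = start + fold_size + (1 if k < remainder else 0)
        test = list(range(start, stop))
        last_test_label_end = stop - 1 + horizon
        train = [
            j
            for j in range(n_samples)
            # Before the test block: the label window must end before it starts.
            if j + horizon < start
            # After the test block: skip the test label span plus the embargo.
            or j > last_test_label_end + embargo
        ]
        yield train, test
        start = stop
```

For example, with 20 samples, 4 folds, `horizon=2` and `embargo=1`, the second fold tests on indices 5 to 9 and trains on 0 to 2 and 13 to 19; the samples in between are removed because their label windows touch the test period.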

Consequences

  • vectorbt gives us fast iteration for research but lacks event-driven fidelity, so production strategies run on the custom harness instead
  • ArcticDB is battle-tested at Man Group scale but has a learning curve
  • nautilus_trader is not used as the core engine (too opinionated for an ETF-only system, and its Rust complexity adds overhead); we vendor it for execution patterns only
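
The event-driven fidelity that the custom harness is meant to provide can be illustrated with a small sketch. This is not the production harness; it only shows one way to enforce event ordering and avoid lookahead: events are processed in timestamp order, the open of a bar is handled before its close, and an order decided on one bar's close fills at the next bar's open. The names `run_backtest`, `OPEN`, and `CLOSE`, and the bar tuple layout, are assumptions made for illustration.

```python
import heapq
from itertools import count
from typing import Callable, List, Tuple

# Intra-bar priority: within one timestamp, the open is processed before the close.
OPEN, CLOSE = 0, 1

Bar = Tuple[int, float, float]   # (day, open price, close price)
Fill = Tuple[int, int, float]    # (day, signed quantity, fill price)


def run_backtest(bars: List[Bar], strategy: Callable[[int, float], int]) -> List[Fill]:
    """Replay bars as ordered events; orders fill at the NEXT bar's open."""
    seq = count()  # unique tie-breaker so prices are never compared in the heap
    heap: List[Tuple[int, int, int, float]] = []
    for day, open_px, close_px in bars:
        heapq.heappush(heap, (day, OPEN, next(seq), open_px))
        heapq.heappush(heap, (day, CLOSE, next(seq), close_px))

    pending_qty = 0
    fills: List[Fill] = []
    while heap:
        day, kind, _, price = heapq.heappop(heap)
        if kind == OPEN:
            # Execute whatever was decided on the previous close.
            if pending_qty != 0:
                fills.append((day, pending_qty, price))
                pending_qty = 0
        else:
            # The strategy only sees the close of the current bar.
            pending_qty = strategy(day, price)
    return fills
```

Because the queue is keyed on `(day, priority, sequence)`, the input bars can arrive in any order and the replay stays deterministic. An order still pending after the last close is never filled, which is the conservative choice for this sketch.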

Alternatives Considered

  • nautilus_trader as core: Best-in-class event engine, but the Rust/Cython complexity overhead is not justified for an ETF-only system with no HFT latency needs
  • QuantConnect Lean: C# ecosystem, poor Python interop for custom ML
  • qlib (Microsoft): Excellent ML pipeline but weak on execution and commodity-specific features

References