Advances in Financial Machine Learning
Authors: Marcos López de Prado | Year: 2018 | Journal: Book (Wiley)
Thesis
Traditional quantitative finance methods (regression, time-series analysis) are poorly suited to the high-dimensional, noisy, non-stationary nature of financial data. AFML presents a systematic framework for applying machine learning to finance while avoiding the catastrophic overfitting that plagues naive ML applications. Key contributions: (1) Triple Barrier Method for labeling -- replacing fixed-horizon returns with dynamic profit-taking / stop-loss / time-expiry barriers. (2) Fractional differentiation -- making price series stationary while preserving memory. (3) Meta-labeling -- using a secondary ML model to size positions based on the primary model's signal. (4) Purged cross-validation -- preventing information leakage in time-series train/test splits. (5) Feature importance via MDA/MDI -- understanding what drives ML predictions.
Key Math
Fractional differentiation of a price series \(\{X_t\}\) to achieve stationarity while retaining memory:
where \(d \in [0, 1]\) is the fractional order of differentiation. For gold prices, \(d \approx 0.35\text{-}0.45\) typically achieves stationarity (ADF test \(p < 0.05\)) while retaining >90% of the information (\(\text{corr}(\tilde{X}_t, X_t) > 0.90\)).
Triple Barrier labeling for observation \(i\):
Upper/lower barriers set at \(\pm \tau \cdot \sigma_{i,t}\) (volatility-scaled), vertical barrier at \(T\) bars.
Purged K-fold cross-validation: For each fold, purge all training observations whose labels overlap in time with any test observation, plus an embargo period of \(h\) bars after each test set.
Data & Method
- The book uses various datasets; the methods are framework-level and asset-agnostic.
- For our gold/silver application: tick data from COMEX (GC, SI), 1-minute to daily bars.
- Feature engineering: fractionally differenced prices, volume clock bars (volume bars, dollar bars, tick bars), microstructural features (VPIN, Kyle's lambda, Amihud illiquidity).
- Models: Random forests (primary), gradient-boosted trees, with meta-labeling via logistic regression.
- Evaluation: Purged walk-forward cross-validation, combinatorial purged cross-validation (CPCV).
Our Replication Verdict
CONFIRMED -- The AFML framework is indispensable for our ML pipeline. Specific results for gold/silver: (1) Fractional differentiation at \(d \approx 0.4\) for gold and \(d \approx 0.35\) for silver passes ADF while preserving predictive information. Standard first-differencing (\(d=1\)) destroys too much signal. (2) Triple barrier labeling dramatically improves prediction vs. fixed-horizon labeling (F1 score improvement of 15-25% in our tests). (3) Purged CV is essential -- standard K-fold overstates accuracy by 5-15% for gold models due to serial correlation in precious metals. (4) Meta-labeling works well: a primary trend model generates direction, a secondary random forest sizes the bet based on confidence features (volatility, volume, COT positioning, time-of-day). (5) Known weakness: The feature importance methods (MDI/MDA) can be unstable for highly correlated features, which precious metals macro features often are (real rates, USD, breakeven inflation are correlated).
Signal Mapping
- ML signal generation framework (SS5.4) -- the entire ML pipeline follows AFML methodology.
- Fractional differentiation applied to all price-based features before model ingestion.
- Triple barrier method with vol-scaled barriers (tau = 2.0 for gold, 2.5 for silver) and 5-day vertical barrier.
- Purged walk-forward with 5-day embargo used for all model evaluation and hyperparameter tuning.
- Meta-labeling model runs on top of the primary trend signal to produce position sizes.
- Dollar bars used as the primary sampling method for intraday models (roughly 1000 bars/day for GC).
References
- López de Prado, M. (2018). Advances in Financial Machine Learning. Wiley. ISBN: 978-1-119-48208-6.
- López de Prado, M. (2020). Machine Learning for Asset Managers. Cambridge University Press.
- Hosking, J.R.M. (1981). "Fractional Differencing." Biometrika, 68(1), 165-176.
- Bailey, D.H. et al. (2014). "Pseudo-Mathematics and Financial Charlatanism." Notices of the AMS, 61(5), 458-471.