Trend / Momentum Strategy Family — Research & Implementation Spec
Author: Quant research (institutional)
Scope: Time-series momentum (managed-futures / trend) + cross-sectional momentum for the QGTM Alpaca ETF book.
Status: RESEARCH + SPEC ONLY. No production code written, no repo edits, no git. Implementation is left to a follow-up engineering task.
Target codebase: /Users/admin/qgtmai/trading (Python 3.12). Strategy contract in qgtm_strategies/base.py; validation in qgtm_backtest/validation.py; features in qgtm_features/store.py.
0. Executive summary & prioritized shortlist
Trend/momentum is the single best diversifier the book can own: ~century-long, multi-asset evidence, near-zero correlation to everything else in QGTM (which is overwhelmingly gold/PM mean-reversion, carry, vol, and positioning), and reliable crisis alpha (positive in 8 of the 10 worst 60/40 drawdowns since 1880; +14% in 2008, ~+25% in 2022 while the S&P fell). The catch is honest and specific: as a standalone, ETF-only, post-2010 sleeve its realistic net Sharpe is ~0.5–0.8, not the headline 1.1+ from the diversified-futures literature. Its job in this book is risk reduction and tail hedging, not a high standalone Sharpe.
Three of the book's existing trend strategies (carver_trend −1.63 Sharpe / −92% return, gold_tsmom −0.66, gold_multi_ma_trend −0.72) are already quarantined (qgtm_core/constants.py::QUARANTINED_STRATEGIES). Section 2.4 diagnoses why they failed — it is mostly (a) single-asset (no diversification multiplier), (b) long-only (no short/crisis leg), and (c) a backtest cost model that charges fees on gross weight every day instead of on turnover. The design below fixes all three.
Prioritized implementation shortlist (best risk-adjusted, lowest-correlation contribution first):
| Rank | Strategy | What it is | Why first | Est. net Sharpe (standalone, ETF, realistic) | Corr. to existing book |
|---|---|---|---|---|---|
| 1 | tsmom_multi (§5.1) | Diversified multi-horizon (1m/3m/12m) vol-targeted time-series momentum, long/short, monthly + buffered | Highest, most robust edge; crisis alpha; lowest correlation; subsumes the 3 quarantined single-asset trend names | 0.55–0.80 | ~0.0–0.2 |
| 2 | trend_portfolio combiner (§6) | Portfolio layer: IDM, portfolio-vol target, correlation netting, turnover buffer | Makes #1 deployable & cost-safe; without it #1 repeats carver_trend's cost death | n/a (wraps #1) | n/a |
| 3 | xsmom_multiasset (§5.3) | Cross-sectional momentum: rank the universe, long winners / short losers, crash-scaled | Real diversifier vs TSMOM when dispersion is high; complements #1 | 0.30–0.55 (thinner; honest) | ~0.1–0.4 |
| 4 | ewmac_trend_v2 (§5.2) | Refined Carver EWMAC (the fixed carver_trend) | ~0.85 correlated to #1 → fold in as a robustness ensemble member, not separate capital | 0.50–0.70 (≈#1) | ~0.0–0.2 |
| 5 | breakout_donchian (§5.4) | Donchian/Turtle channel breakout + ATR stop | Alt entry rule, ~0.75 corr to #1; lowest priority; useful for discrete entry/exit & intraday | 0.40–0.60 | ~0.0–0.2 |
Recommendation: build #1 + #2 first as one coherent sleeve, with #4 and #5 available as optional signal-ensemble members inside #1 (they raise stability, not independent alpha — all price-trend rules are 0.8–0.95 correlated). Add #3 only after #1 is validated, and size it small.
1. Evidence base (grounding in published institutional research)
1.1 Time-series momentum (the flagship effect)
- Moskowitz, Ooi, Pedersen (2012), "Time Series Momentum," Journal of Financial Economics 104(2):228–250. The seminal result: across 58 liquid futures (equity index, currency, commodity, bond), a security's own past 12-month excess return positively predicts its next-month return for every one of the 58 instruments. Persistence runs 1–12 months and partially reverses beyond ~12 months (consistent with under-reaction then delayed over-reaction). The diversified, vol-scaled TSMOM portfolio has an annualized Sharpe ≈ 1.1 (gross), ~2.5× the equity market, with little exposure to standard factors, and it performs best in extreme up/down markets (the "TSMOM smile" / left-tail hedge). Speculators earn TSMOM profits from hedgers. Design implications used below: 12-month is the canonical horizon; sign of past return sets direction; scale each position inversely to ex-ante volatility to equalize risk; the payoff convexity is the crisis-alpha property worth paying for.
- Hurst, Ooi, Pedersen (2017), "A Century of Evidence on Trend-Following Investing," Journal of Portfolio Management (AQR). Out-of-sample extension to 67 markets, 1880–2016. They build an equal-weighted combination of 1-, 3-, and 12-month TSMOM, rebalanced monthly, vol-scaled (rolling 3-year covariance). Findings: positive average return in every market, individual-market average gross Sharpe ≈ 0.4; the aggregate diversified strategy realized net-of-fee Sharpe ≈ 0.77 (1880–2013); positive in every decade since 1880; positive in 8 of the 10 largest 60/40 drawdowns. Even assuming a future Sharpe of only 0.4, a 20% allocation to a 60/40 portfolio raised its Sharpe from 0.38 → 0.46 and cut drawdowns. Design implications: (i) the multi-horizon (1m/3m/12m) blend is the institutional standard, not 12m alone; (ii) per-asset Sharpe is only ~0.4; the portfolio Sharpe comes from diversification across many uncorrelated markets (this is exactly what the quarantined single-asset gold trend strategies lacked); (iii) honest forward expectation is closer to 0.4–0.8 net than to 1.1.
- Baltas & Kosowski (2013/2020), "Demystifying Time-Series Momentum Strategies" & "Improving Time-Series Momentum Strategies" (CME / SSRN). Turnover is dominated by two choices: the volatility estimator and the trading (signal) rule. Using the Yang–Zhang (2000) range-based vol estimator cuts turnover ~17% with no significant performance loss; replacing the binary sign rule with a continuous "TREND" rule cuts another ~24%; a linear-trend-fit (regression-slope t-stat) signal reduces turnover by ~two-thirds and dominates other signals out-of-sample. Traditional TSMOM spends ~10% of gross return on trading costs over 30 years; these refinements cut that by ~35%. Design implications: this is the single most actionable cost guidance for our (cost-sensitive) engine: use a range-based vol estimate (OHLC are available in our bars), a continuous capped signal (not binary sign), and prefer the regression-slope rule when turnover matters.
1.2 Cross-sectional momentum (the complementary effect)
- Asness, Moskowitz, Pedersen (2013), "Value and Momentum Everywhere," Journal of Finance 68(3):929–985. Consistent cross-sectional momentum premia across 8 markets/asset classes, with a strong common factor structure; value and momentum are negatively correlated (avg ≈ −0.60), so a 50/50 value+momentum combination reaches a global Sharpe ≈ 1.42. Design implications: cross-sectional (relative-strength) momentum is a different bet from time-series momentum and from value; it earns its keep when dispersion across assets is high. On a small ETF cross-section it is noisier than on single stocks, so it is sized as a satellite.
- George & Hwang (2004), "The 52-Week High and Momentum Investing," Journal of Finance. Nearness to the 52-week high predicts future returns and dominates past-return momentum; importantly these returns do not reverse in the long run (anchoring / under-reaction story). Motivates the breakout/channel variant (§5.4).
1.3 Risk management of momentum (why vol-scaling is non-negotiable)
- Daniel & Moskowitz (2016), "Momentum Crashes," JFE 122(2):221–247. Momentum returns are negatively skewed with infrequent, severe, persistent crashes, forecastable: they occur in "panic" states (after market declines, when market volatility is high) and are contemporaneous with market rebounds, driven by the short/loser leg snapping back (1932: losers +232% vs winners +32%; Mar–May 2009: losers +163% vs winners +8%). A dynamic strategy that scales by forecast mean/variance ≈ doubles the Sharpe and alpha.
- Barroso & Santa-Clara (2015), "Momentum Has Its Moments," JFE. Scaling momentum by its own realized variance virtually eliminates the crashes and roughly doubles the Sharpe.
- Harvey, Hoyle, Korgaonkar, Rattray, Sargaison, van Hemert (2018), "The Impact of Volatility Targeting," JPM. Across 60+ assets since 1926: vol targeting raises Sharpe for risk assets (US equities 0.40 → 0.48–0.51, via the leverage-effect "momentum" it induces), is Sharpe-neutral for bonds/FX/commodities, but reduces the likelihood of extreme returns and max drawdowns across all asset classes. Vol target used: ~10%. Design implications: (i) always vol-target at both asset and portfolio level; (ii) for the long/short momentum sleeve, add crash control (scale the whole sleeve by its trailing realized vol, and de-risk the short leg in panic states); (iii) vol targeting's biggest documented benefit is tail/drawdown reduction, which is precisely what this book needs.
1.4 Regime / decay honesty (where the edge thins)
- The "CTA Winter" (≈Nov 2010–Mar 2014) and "Trade-War" chop (≈Apr 2015–Jan 2019) were prolonged trend droughts: low vol, range-bound, central-bank-suppressed markets produced few sustained trends. AQR (2018, "It's Not the Process…") attributes this to environment, not a broken effect.
- 2022 was one of the best CTA years since 2000 (SG Trend ≈ +25–27% vs S&P −18%) on long energy / short bonds / long USD — classic crisis alpha.
- Crisis alpha is NOT guaranteed: the COVID crash (Feb–Mar 2020) was too fast and V-shaped for trend to capture; trend was roughly flat-to-negative into the bottom and recovered in H2-2020. 2025 "Liberation Day" produced one of the deepest trend drawdowns since 2000. Design implications: expect multi-year flat/negative stretches; size for survivability; explicitly stress-test 2008 (should win), 2011–2013 (should bleed gently), 2020 (should be flat/negative), 2022 (should win).
2. Codebase integration map
2.1 The strategy contract (qgtm_strategies/base.py)
Every strategy subclasses Strategy and implements:
def generate_signals(self, features: pl.DataFrame, universe: list[str], timestamp: datetime) -> list[Signal]: ...
- features is PIT-safe long-format (one row per (symbol, timestamp)), guaranteed to contain only data known at/before timestamp. Do not look forward.
- Return list[Signal] (qgtm_core/types.py): weight ∈ [-1, 1], confidence ∈ [0, 1], side ∈ {LONG, SHORT}, plus rationale, metadata. Use Side.LONG/SHORT consistent with sign(weight).
- Always finish with return self.validate_signals(signals) (rejects NaN/inf, out-of-range, blank symbol). Use self._empty("reason") on every silent return path so the daemon can surface why a strategy was quiet.
- Declare for attribution/risk: strategy_id, asset_classes, regime_tags, factor_exposures (use momentum=…, plus per-asset-class loadings), capacity (CapacityModel), rebalance_frequency.
- macro_series(features, col) dedupes a macro column to a clean time series.
2.2 Features already wired (qgtm_features/store.py::compute_price_features)
Long-format per symbol; all PIT-safe. Directly usable:
- Returns: ret_{1,5,10,21,63,126,252}d = close/close.shift(h) − 1.
- Annualized realized vol: vol_{21,63,126}d = pct_change().rolling_std(h) * sqrt(252).
- Sign/trend signals: tsmom_signal_{21,63,126,252}d = sign(ret_hd).
- Risk-adjusted momentum (key): vol_adj_mom_{21,63,252}d = ret_{h}d / vol_{min(h,126)}d. This is already a continuous, vol-normalized TSMOM signal, exactly the Baltas–Kosowski "TREND" input; no new feature is needed for a first version.
- Mean-reversion / range: zscore_21d, zscore_63d, bb_position_21d, rsi_14, hl_range_pct, avg_hl_range_21d.
- Skew / carry / positioning: skew_21d, skew_63d, carry_proxy_{21,63}d, backwardation_indicator, roll_yield_est, net_spec_zscore, cot_*.
- GARCH(1,1) vol available via compute_garch_vol / _add_garch_vol (garch_vol_21d).
- Macro (ETF proxies, daily): compute_macro_features_from_etfs builds dxy_mom_21d (UUP), real_rate_proxy (TIP−SHY), vix_proxy_21d (SPY realized vol), inflation_exp_proxy. UUP/TIP/SHY/SPY are already fetched; these power the crash-state gate (§5.3) without new data.
- OHLC present in bars → the Yang–Zhang range-based vol estimator (Baltas–Kosowski) is implementable from existing columns; only the estimator function is new.
Reusable trend machinery already in the repo (qgtm_strategies/pysystemtrade_patterns.py): compute_ewmac, compute_ewmac_forecast (with Carver scalars), combine_forecasts, estimate_forecast_diversification_multiplier (FDM), estimate_idm (IDM), vol_target_position_size. Do not rebuild these — ewmac_trend_v2 (§5.2) and the combiner (§6) should import them.
2.3 Backtest engine & validation realities (read this before estimating net Sharpe)
qgtm_backtest/engine.py::BacktestEngine.run:
- Timing is correct (no look-ahead): a signal generated at close of t−1 earns ret_1d at t. Good.
- ⚠ Cost model charges on gross weight every rebalance, not on turnover. Per day: net_ret += asset_ret * scaled_weight − cost * abs(scaled_weight), with cost = (commission_bps + slippage_bps)/1e4 = 10 bps by default, and run() calls generate_signals every day (it ignores rebalance_frequency). So an always-on full-weight book pays ~10 bps × 252 ≈ 25%/yr even if positions never change. This is the dominant reason carver_trend printed −92%. The prompt notes the backtester is being repaired; net numbers in this spec assume the fix, i.e. cost charged on turnover |w_t − w_{t−1}|. Until that lands, evaluate trend with commission_bps=0, slippage_bps=0 (gross) and a separately-computed turnover×spread cost, or the sleeve will show false-negative net Sharpe.
- Gross is capped: weight_scale = min(1, 2.0/gross_weight); daily return capped at ±10% (BACKTEST_DAILY_RETURN_CAP).
- Live turnover control lives in the daemon, which honors rebalance_frequency and per-strategy notional caps (MACRO_STRATEGY_REBALANCE_MAX_NOTIONAL_PCT = 0.35). So buffering must be implemented inside the strategy (stable weights) and respected by cadence.
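The gap between the two cost conventions can be made concrete with a small sketch. It is illustrative only: the function names are hypothetical, weights are plain dicts rather than the engine's internal objects, and the 10 bps default simply mirrors the commission + slippage default quoted above.

```python
def daily_cost_gross(weights, cost_bps=10.0):
    """Cost charged on gross weight every day (the behaviour described above)."""
    rate = cost_bps / 1e4
    return [rate * sum(abs(w) for w in day.values()) for day in weights]


def daily_cost_turnover(weights, cost_bps=10.0):
    """Cost charged only on traded weight |w_t - w_{t-1}| (the assumed fix)."""
    rate = cost_bps / 1e4
    costs, prev = [], {}
    for day in weights:
        symbols = set(day) | set(prev)
        traded = sum(abs(day.get(s, 0.0) - prev.get(s, 0.0)) for s in symbols)
        costs.append(rate * traded)
        prev = day
    return costs


# A static, fully invested book held unchanged for one trading year.
book = [{"SPY": 0.5, "TLT": 0.5}] * 252
gross_drag = sum(daily_cost_gross(book))        # about 0.252, i.e. ~25%/yr
turnover_drag = sum(daily_cost_turnover(book))  # about 0.001, the entry trade only
```

Under these assumptions the same unchanged book pays roughly 250 times more under gross-weight costing than under turnover costing, which is why net numbers from the unrepaired engine understate any slow-trading sleeve.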
Validation harness (qgtm_backtest/validation.py::run_full_validation) — the gate every new strategy must clear:
- CPCV/PBO (combinatorial_purged_cv, 6 groups, 2 test): pass PBO < 0.5.
- Deflated Sharpe (deflated_sharpe_ratio): pass DSR > 0.95. Honest caveat in §7.
- Multiple-testing (Harvey-Liu-Zhu threshold).
- Walk-forward anchored + rolling (walk_forward, train 504 / test 63 / embargo 5): aggregate OOS Sharpe > 0.
- Regime-stratified (regime_stratified_sharpe): must be positive in ≥2 regimes (single-regime strategies rejected).
- Block bootstrap (monte_carlo_block_bootstrap): 2.5th-pct Sharpe CI lower bound > 0.
- Capacity (Almgren–Chriss): > $1M.
Engine drivers: run_walk_forward (train 504/test 63/step 63/embargo 21) and run_purged_kfold. Cost defaults DEFAULT_COMMISSION_BPS=5, DEFAULT_SLIPPAGE_BPS=5, DEFAULT_VOL_TARGET=0.12.
2.4 Why the three existing trend strategies were quarantined (and how this design fixes it)
From QUARANTINED_STRATEGIES (2020–2024 cohort backtest):
| Quarantined | Result | Root cause | Fix in this spec |
|---|---|---|---|
| carver_trend | Sharpe −1.63, return −92% | Daily, always-on, full-gross EWMAC → cost model charges ~25%/yr on gross; no buffering; long/short on contango-bleeding K-1 ETFs (USO/UNG) where roll decay swamps trend | Turnover-based cost (dependency); buffering + monthly cadence; exclude/penalize structurally-decaying single-commodity ETFs; diversified universe |
| gold_tsmom | Sharpe −0.66 | Single asset (gold only) → realizes only the ~0.4 per-asset Sharpe with none of the diversification lift; long-only (no short/crisis leg); 2020–2024 was choppy for gold trend | Multi-asset universe (IDM lifts Sharpe); long & short; multi-horizon blend |
| gold_multi_ma_trend | Sharpe −0.72 | Same single-asset/long-only problems; 5-MA score is high-turnover and binary-ish | Same fixes; continuous capped signal; buffering |
The core lesson the data already taught this book: per-asset trend ≈ 0.4 Sharpe and dies after costs; the Sharpe is manufactured by diversifying across many low-correlation assets and by NOT overtrading. Every design choice below follows from that.
3. Universe definition (trend_universe)
Diversification across uncorrelated macro blocks is the entire source of the Sharpe lift, so the universe must span asset classes — not just commodities. All are liquid, US-listed, Alpaca-tradable (paper & live), and most are already known to the system (LIQUID_ETFS already lists GLD, SLV, IAU, SPY, QQQ, TLT, IWM, EEM, GDX; UUP/TIP/SHY/SPY already fetched as macro proxies).
Tier 1 — core (≈12, build here first; tightest spreads ~0.5–3 bps):
| Block | Tickers | Notes |
|---|---|---|
| Equity | SPY, QQQ, IWM, EFA, EEM | US large/tech/small + intl dev + EM |
| Rates | TLT, IEF, SHY | 20y+, 7–10y, 1–3y (SHY also = cash proxy for excess returns) |
| Credit | LQD, HYG | IG + HY (HY adds a risk-on/off trend) |
| Metals | GLD, GDX | gold + gold miners (GDX is higher-beta trend) |
Tier 2 — extension (add after Tier-1 validates; wider spreads, watch cost/capacity):
| Block | Tickers | Caveats |
|---|---|---|
| Metals | SLV, GDXJ, SIL | silver/junior miners: higher vol, wider spread |
| Broad cmdty | DBC, DBA | DBA thin (~$0.6M ADV in canonical universe) |
| Energy | USO, UNG | K-1 / contango bleed: only trade with explicit roll-aware caveat; UNG especially |
| FX | UUP, FXE, FXY | UUP liquid; FXE/FXY thinner |
Honest universe caveats:
- Contango decay: USO/UNG (and leveraged UCO/BOIL/UGA) carry structural roll losses; a naive long-trend on them captures the trend but eats roll. Either (a) exclude from Tier-1, (b) use DBC/broad baskets which roll more intelligently, or (c) net the roll via carry_proxy_*. Leveraged/inverse ETFs (LEVERAGED_ETFS, INVERSE_ETFS) are excluded — they decay and break vol-targeting.
- Wrapper netting: PRIMARY_BY_ALIAS already nets GLD/IAU/SGOL and SLV/SIVR — pick one wrapper per underlying (GLD, SLV) to avoid double-counting trend exposure.
- Excess returns: TSMOM is properly defined on excess-of-cash returns. For most ETFs the cash rate is immaterial to the sign, but for the bond leg it matters — subtract the SHY (or BIL) return when computing the rates-block signal, or at minimum note the bias. SHY is already wired.
- One-time wiring: the canonical qgtm_core/universe.py::UNIVERSE is commodity-only. The strategy should declare its own trend_universe list (or extend UNIVERSE); no new data feed is required — Alpaca daily bars already serve all of these symbols and the feature store computes features for any symbol passed in.
4. Shared design components
These are common to all strategies; specify once, reuse.
4.1 Volatility estimation (PIT-safe)
Use an ex-ante estimate (data through t−1 only). Three options, in order of turnover-efficiency:
- Already-wired baseline: vol_63d (annualized 63-day SD). Simplest; fine for v1.
- EWMA blend (Carver standard, recommended): σ_EWMA uses span 32 (≈ 11-day half-life under the usual α = 2/(span+1) convention); blend with a slow anchor to avoid whipsaw: σ_t = 0.7 · σ_EWMA,32(t) + 0.3 · σ_252(t), annualized ×√252. (compute_ewmac_forecast already builds an EWMA vol internally.)
- Yang–Zhang (2000) range-based (turnover-optimal, Baltas–Kosowski −17% turnover): uses O/H/L/C (all present in bars). New helper: σ²_YZ = σ²_overnight + k·σ²_open-close + (1−k)·σ²_RS, where σ²_RS is Rogers–Satchell. Recommended once v1 works, because it directly lowers cost.
Floor σ at a small ε (e.g., 1%) to avoid divide-by-zero (the existing strategies already do this).
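A minimal standard-library sketch of the Yang–Zhang estimator follows, assuming bars arrive as plain lists ordered oldest first. The function name, the list-based interface, and the weighting constant k = 0.34 / (1.34 + (n+1)/(n−1)) from the original Yang–Zhang paper are choices made for illustration; they are not existing helpers in the feature store.

```python
import math
from statistics import variance  # sample variance (n - 1 denominator)


def yang_zhang_vol(opens, highs, lows, closes, periods_per_year=252, floor=0.01):
    """Annualized Yang-Zhang volatility from OHLC bars, oldest bar first.

    The first bar only supplies the previous close, so n = len(closes) - 1
    bars enter the estimate. Pass bars through t-1 to keep it ex-ante.
    """
    n = len(closes) - 1
    if n < 2:
        raise ValueError("need at least 3 bars")
    idx = range(1, n + 1)
    overnight = [math.log(opens[i] / closes[i - 1]) for i in idx]
    open_close = [math.log(closes[i] / opens[i]) for i in idx]
    # Rogers-Satchell term: drift-independent, built from the intraday range.
    rs = [
        math.log(highs[i] / closes[i]) * math.log(highs[i] / opens[i])
        + math.log(lows[i] / closes[i]) * math.log(lows[i] / opens[i])
        for i in idx
    ]
    var_rs = sum(rs) / n
    k = 0.34 / (1.34 + (n + 1) / (n - 1))
    var_yz = variance(overnight) + k * variance(open_close) + (1 - k) * var_rs
    # Floor the annualized vol so downstream sizing never divides by ~zero.
    return max(math.sqrt(var_yz * periods_per_year), floor)
```

A production version would more likely be expressed as rolling column operations in the feature store; the list form here is only meant to pin down the arithmetic.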
4.2 Trend signal rules (per asset, per horizon) — continuous & capped
Let L be the lookback (21/63/252 d ≈ 1/3/12 m). Compute a continuous trend strength (never the raw binary sign, per Baltas–Kosowski):
- Rule A, risk-adjusted return (use existing feature): fᴬ_L = clip( vol_adj_mom_Ld / s_L , −2, +2 ) where vol_adj_mom_Ld = ret_Ld / vol_{min(L,126)}d is already computed. s_L is a per-horizon scalar chosen so the average |f| ≈ 1 (calibrate empirically; ≈ value of vol_adj_mom at which you want full size).
- Rule B, EWMAC (use existing compute_ewmac_forecast): pairs (16,64), (32,128), (64,256) → forecasts in [−20,+20]; divide by 10 to get [−2,+2]. Slow pairs only (drop (2,8)/(4,16); they over-trade and were part of carver_trend's death).
- Rule C, regression-slope t-stat (Baltas–Kosowski turnover-optimal, −⅔ turnover): OLS of log(close) on time over window L; signal = clip( β̂ / se(β̂) / s'_L , −2, +2 ) (a trend t-stat). This is the recommended rule when cost is binding; it is the smoothest.
All three are 0.8–0.95 correlated to each other — blending them is a robustness/stability play, not independent alpha (be honest about this in attribution).
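Rule C is the only one of the three without existing machinery, so a short sketch may help. It assumes a plain list of past closes for the window L; the function names and the `scalar` argument (standing in for s'_L) are hypothetical, and the scalar still has to be calibrated empirically.

```python
import math


def trend_tstat(closes):
    """t-statistic of the OLS slope of log(close) on a time index 0..n-1."""
    n = len(closes)
    if n < 3:
        raise ValueError("need at least 3 closes")
    y = [math.log(c) for c in closes]
    x_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxx = sum((i - x_mean) ** 2 for i in range(n))
    sxy = sum((i - x_mean) * (y[i] - y_mean) for i in range(n))
    beta = sxy / sxx
    alpha = y_mean - beta * x_mean
    sse = sum((y[i] - alpha - beta * i) ** 2 for i in range(n))
    se = math.sqrt(sse / (n - 2) / sxx)
    if se == 0.0:
        # Perfectly linear log-price: treat as a saturated signal.
        return math.copysign(math.inf, beta) if beta != 0 else 0.0
    return beta / se


def rule_c_signal(closes, scalar, cap=2.0):
    """clip(t-stat / scalar, -cap, +cap); `scalar` plays the role of s'_L."""
    return max(-cap, min(cap, trend_tstat(closes) / scalar))
```

Because the t-stat divides the slope by its standard error, a noisy drift of the same size produces a smaller signal than a clean one, which is the source of the smoothness noted above.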
4.3 Multi-horizon combination
Equal-weight the three horizons (Hurst standard):
F_combined = clip( FDM · mean( f_1m, f_3m, f_12m ), −F_cap, +F_cap ), F_cap = 2 (in [−2,+2] units) or 20 (Carver units).
FDM = forecast diversification multiplier from estimate_forecast_diversification_multiplier (≈1.0–1.5 for 3 horizons). Capping prevents a single runaway horizon from dominating.
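As a hedged illustration of the combination step, the sketch below works in [−2,+2] units. The function name is hypothetical, and the default fdm=1.2 is only a placeholder inside the ≈1.0–1.5 range quoted above; in the real sleeve the multiplier would come from estimate_forecast_diversification_multiplier.

```python
def combine_horizons(f_1m, f_3m, f_12m, fdm=1.2, f_cap=2.0):
    """Equal-weight three capped horizon signals, apply the FDM, then re-cap."""
    blended = fdm * (f_1m + f_3m + f_12m) / 3.0
    return max(-f_cap, min(f_cap, blended))


# Horizons that agree keep their size (before the FDM lift and the cap).
agree = combine_horizons(1.0, 1.0, 1.0, fdm=1.0)
# Horizons that disagree partially cancel, shrinking the position.
mixed = combine_horizons(1.0, -1.0, 0.0)
# A strong consensus is lifted by the FDM but cannot exceed the cap.
capped = combine_horizons(2.0, 2.0, 2.0, fdm=1.5)
```

The cancellation in the mixed case is the mechanism by which the blend trades less during chop than any single horizon would.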
4.4 Volatility targeting → position weight (the sizing core)
Per-asset target weight that equalizes risk and respects the codebase [-1,1] convention:
w_i = clip( (F_i / F_full) · (σ_target_asset / σ_i), −w_max, +w_max )
- F_i / F_full ∈ [−1,1] is the normalized trend strength (full conviction at F_full, e.g. 10 in Carver units or 1.0 in [−2,+2]/2 units).
- σ_target_asset = per-asset annualized vol budget. With ~10 names and portfolio target 12% (DEFAULT_VOL_TARGET), set per-asset target ≈ σ_target_port / sqrt(N_eff) · IDM (the combiner, §6, computes this exactly). A practical single-strategy value: σ_target_asset ≈ 0.10–0.16.
- σ_i = ex-ante vol (§4.1). So a calm trending name levers up and a wild one is cut; this is the entire MOP/Harvey vol-target mechanism.
- w_max = single-name cap; respect RiskLimits.max_single_name_pct (0.15); use 0.10–0.20. The engine's gross cap (2.0) and the daemon's MACRO_STRATEGY_REBALANCE_MAX_NOTIONAL_PCT (0.35) are the outer guards.
- Confidence field: confidence = min(|F_i|/F_cap, 1).
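The sizing step can be sketched as below, using the form given in §5.1 step 3 with F_full = 2 in [−2,+2] units. Function names and defaults are illustrative assumptions taken from the ranges in this section, not existing repo helpers (the repo's own vol_target_position_size may differ in signature and units).

```python
def vol_target_weight(forecast, sigma, f_full=2.0, sigma_target=0.12,
                      w_max=0.15, sigma_floor=0.01):
    """Signed weight: normalized trend strength times a vol ratio, capped."""
    sigma = max(sigma, sigma_floor)  # floor avoids divide-by-zero
    scaled = (forecast / f_full) * (sigma_target / sigma)
    return max(-w_max, min(w_max, scaled))


def signal_confidence(forecast, f_cap=2.0):
    """Confidence field in [0, 1], proportional to absolute trend strength."""
    return min(abs(forecast) / f_cap, 1.0)


# Same forecast, different volatility: the calmer asset gets the larger weight.
calm = vol_target_weight(0.4, sigma=0.08, w_max=1.0)   # 0.2 * 1.5 = 0.30
wild = vol_target_weight(0.4, sigma=0.24, w_max=1.0)   # 0.2 * 0.5 = 0.10
```

With the default w_max of 0.15 the calm case would be clipped, which shows that for low-vol names the single-name cap, not the vol target, is often the binding constraint.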
4.5 Turnover control — buffering / no-trade band (cost survival)
Given the cost reality (§2.3), buffering is mandatory, not optional. Carver's rule: hold the current weight while the target stays inside a no-trade band around it, and trade only when |w_target,i − w_held,i| exceeds the buffer, with buffer ≈ 0.10 · |w_target,i| or an absolute floor (e.g. 0.5% of equity). The strategy must therefore track its own last-emitted weights (e.g., via strategy_state / qgtm_core/strategy_state.py) and emit the held weight when inside the band. Combined with monthly signal cadence and the continuous signal rule, this targets ≈2–4× annual turnover per name (vs the 10–20× that killed carver_trend). Also honor MIN_EXECUTABLE_ORDER_NOTIONAL_USD (don't emit dust).
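A minimal sketch of the no-trade band follows, with weights expressed as fractions of equity so that the 0.5% floor becomes 0.005. The function name is hypothetical, and one behaviour is an assumption: when the target leaves the band this version trades all the way to the target, whereas Carver's original rule trades only to the edge of the band.

```python
def apply_buffer(target, held, rel_buffer=0.10, abs_floor=0.005):
    """Return the weights to emit given target weights and last-emitted weights."""
    emitted = {}
    for sym in set(target) | set(held):
        w_target = target.get(sym, 0.0)
        w_held = held.get(sym, 0.0)
        # Band is 10% of the target weight, but never narrower than the floor.
        band = max(rel_buffer * abs(w_target), abs_floor)
        inside = abs(w_target - w_held) <= band
        emitted[sym] = w_held if inside else w_target
    return emitted


held = {"SPY": 0.10, "TLT": -0.05}
target = {"SPY": 0.105, "TLT": 0.05}
emitted = apply_buffer(target, held)
# SPY moved by 0.005, inside its 0.0105 band, so the held weight is re-emitted.
# TLT flipped sign, far outside its band, so the new target is emitted.
```

The caller is expected to persist `emitted` as the next period's `held`, which is the state-tracking requirement described above.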
4.6 No-look-ahead construction (must-pass)
- Compute every signal from features.filter(timestamp <= t); never reference t's own ret_1d.
- Returns/vols use close.shift(h) (already lagged in the feature store). The regression slope and EWMA recursions use only past closes.
- Embargo ≥ lookback for labels: a 252-day momentum signal induces 252-day overlapping "labels"; CPCV/WF embargo must be ≥ the signal horizon to avoid leakage. The default WF embargo (5–21 d) is too small for the 12-month signal (see §7).
- Monthly rebalance is keyed off the calendar in features, not off a row index, to stay PIT-safe across missing days.
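One way to key the monthly rebalance off the calendar without peeking ahead is sketched below. It rests on an assumption that is not in the spec: since knowing that a bar is the last of its month requires a forward-looking exchange calendar, the sketch fires on the first available bar of a new calendar month instead. The function name is hypothetical.

```python
from datetime import date


def is_rebalance_bar(bar_dates, t):
    """True when t is the first available bar of a new calendar month.

    Only dates strictly before t are consulted, so the decision uses no
    information from after t and tolerates missing days.
    """
    past = [d for d in bar_dates if d < t]
    if not past:
        return True  # no history yet: treat the first bar as a rebalance
    last = max(past)
    return (t.year, t.month) != (last.year, last.month)


# Feb 1 is missing from the data; Feb 2 is still detected as the rebalance bar.
bars = [date(2024, 1, 30), date(2024, 1, 31), date(2024, 2, 2), date(2024, 2, 5)]
flags = [is_rebalance_bar(bars, d) for d in bars]
```

A row-index rule such as "every 21st row" would drift against the calendar whenever a bar is missing, which is the failure this check avoids.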
5. Strategy specifications
Each spec follows the mandated structure: rationale & persistence → formulas → data → entry/exit & sizing → expected gross/net Sharpe, turnover, capacity → correlation → failure modes & decay → OOS validation.
5.1 tsmom_multi — Diversified multi-horizon time-series momentum ⭐ FLAGSHIP
Economic rationale & why the edge persists. Trends arise from investor under-reaction to information (anchoring, disposition effect, gradual diffusion of news) followed by delayed over-reaction / herding, plus non-profit-seeking flows (central banks defending levels, corporate hedgers, risk-management-driven de/re-leveraging). MOP show speculators earn TSMOM from hedgers — a structural, not purely behavioral, transfer. It persists because it is a risk transfer and a behavioral bias that is hard to arbitrage away (it requires leverage, shorting, and tolerance for multi-year droughts that fire managers — limits to arbitrage). 130+ years of out-of-sample evidence (Hurst 2017) is the strongest non-data-mined track record in the factor zoo.
Exact signal formulas. For each asset i in trend_universe, at month-end t (PIT: data ≤ t):
1. Horizon signals (Rule A baseline; optionally blend B & C per §4.2):
f_{i,1m} = clip(vol_adj_mom_21d / s₁, −2, 2)
f_{i,3m} = clip(vol_adj_mom_63d / s₃, −2, 2)
f_{i,12m} = clip(vol_adj_mom_252d / s₁₂, −2, 2) (use ret_252d / vol_126d; the 252d return is the canonical 12m trend).
2. Combine: F_i = clip( FDM · (f_{i,1m}+f_{i,3m}+f_{i,12m})/3 , −2, 2 ).
3. Vol-target weight (§4.4): w_i = clip( (F_i/2) · (σ_target/σ_i), −w_max, +w_max ), σ_i from §4.1, σ_target≈0.12, w_max=0.15.
4. Buffer vs last weights (§4.5); set side = LONG if w_i>0 else SHORT, confidence=min(|F_i|/2,1).
Required data + wired? ret_{21,63,252}d, vol_{63,126}d, vol_adj_mom_{21,63,252}d, close. All already computed by compute_price_features. Only new code: the strategy class + buffering state + (optional) Yang–Zhang vol. No new data feed.
Entry/exit & position sizing. Continuous: there is no discrete entry/exit — w_i moves with F_i and is rebalanced monthly inside the buffer band. Direction flips when the blended trend flips sign. Sizing is fully vol-targeted (§4.4). Optional hard stop: flatten a name if its own drawdown since entry exceeds e.g. 3×ATR (reduces left tail; mildly raises turnover).
Expected gross/net Sharpe, turnover, capacity.
- Gross Sharpe: literature 0.7–1.1 diversified-futures; on this ETF-only, ~12-name, post-2010 universe, expect 0.6–0.9 gross.
- Net Sharpe (turnover-cost model, 1–4 bps ETF spreads, monthly + buffer): ≈0.55–0.80. On an adverse short window (e.g., 2015–2019 CTA chop) it can be ~0.2 or negative — that is expected and survivable, not broken.
- Turnover: ≈2–4× notional/yr per name with monthly + buffering.
- Capacity: effectively unbounded at book scale. SPY/QQQ/TLT/GLD ADV is $1–30bn; even Tier-2 thin names (DBA/UNG) clear >$10–50M. Capacity is not the binding constraint; cost on thin names is. CapacityModel(max_capacity_usd≈500_000_000).
Correlation to other sleeves. ~0.0–0.2 to the gold/PM mean-reversion, carry, vol, and positioning sleeves; ~0 to a static equity allocation; positive skew / crisis alpha. The only meaningful within-book overlap was the (now-quarantined) single-asset gold trend names, which this strategy subsumes (gold trend is one of ~12 legs here).
Failure modes & known decay.
- Trendless chop / sharp reversals (CTA Winter 2011–2014; whipsaw) → the dominant failure; mitigated by multi-horizon (the 12m leg ignores short chop) + buffering.
- V-shaped crashes (COVID Mar-2020) → too fast to capture; trend gives back the down-move; accept it.
- Crowding/decay: trend is heavily allocated industry-wide; expect lower forward Sharpe than history (already priced into the 0.55–0.80 estimate). Monitor with DecayMetrics (6m/12m vs in-sample; DECAY_THRESHOLD=0.5).
- Contango bleed on energy legs (USO/UNG) → exclude from Tier-1 or roll-adjust.
- Cost model (§2.3) → false-negative net unless turnover-costing is used.
OOS validation plan.
- run_walk_forward anchored + rolling, but override windows for the 12m signal: train ≥ 756 (3y), test 126 (6m), step 126, embargo ≥ 252 (= the longest lookback, to purge overlapping labels).
- run_full_validation targets: PBO < 0.5, bootstrap 2.5%-Sharpe CI > 0, regime-stratified positive in ≥2 regimes (expect strong in risk-off/crisis, weak in low-vol risk-on). Treat DSR > 0.95 as aspirational on short ETF samples (see §7); require DSR computed with honest num_trials and report it rather than hard-gating.
- Stress windows (must run explicitly): 2008 (expect strongly positive — crisis alpha), 2011–2013 (expect mild drawdown — survivability check, max DD < ~15%), 2020 (expect flat-to-negative in Q1, recovery H2), 2022 (expect strongly positive — long energy/USD, short bonds). A design that is not positive in 2008 & 2022 is mis-specified.
- Report turnover and net Sharpe at 1/3/5/10 bps cost sensitivity.
5.2 ewmac_trend_v2 — Refined Carver EWMAC (the fixed carver_trend)
Rationale & persistence. Same trend premium as §5.1, expressed via moving-average crossovers instead of horizon returns. Included because (a) it reuses fully-built, tested machinery (compute_ewmac_forecast, combine_forecasts, estimate_forecast_diversification_multiplier), and (b) signal-rule diversification (return-based vs MA-based) adds stability to the trend core. It is ~0.8–0.9 correlated to tsmom_multi — treat it as an ensemble member, not separate capital.
Exact formulas. Reuse compute_ewmac_forecast(prices, fast, slow, vol_lookback=32) for slow pairs only {(16,64),(32,128),(64,256)}; F_i = combine_forecasts({...}, dm=FDM) ∈ [−20,20]; w_i = clip((F_i/20)·(σ_target/σ_i), −w_max, w_max); buffer per §4.5.
Data + wired? close, vol_*. Machinery already exists in pysystemtrade_patterns.py. New: the v2 class (slow pairs, buffering, diversified universe, turnover discipline) — do not re-enable the old carver_trend.
Entry/exit & sizing. As §5.1 (continuous, vol-targeted, buffered). Critical deltas from the quarantined version: (1) drop fast pairs (2,8)/(4,16); (2) monthly/weekly cadence + buffering, not daily always-on; (3) diversified universe, not commodity-only; (4) exclude contango-bleed names.
Expected gross/net Sharpe, turnover, capacity. ≈ tsmom_multi (gross 0.6–0.9, net 0.5–0.7). Turnover slightly higher than §5.1 unless using slow pairs + buffer. Capacity identical (same universe).
Correlation. ~0.80–0.90 to tsmom_multi (same premium, different lens); ~0 to the rest of the book.
Failure modes & decay. Same as §5.1 plus MA-crossover-specific whipsaw at regime turns; and the original cost/turnover trap if buffering is omitted. The −92% history is a standing warning: ship only with turnover discipline + cost fix.
OOS validation. Same plan as §5.1. Additionally: verify the v2 net Sharpe is materially > 0 on 2020–2024 (the window where the old version printed −1.63/−92%) before un-quarantining anything. Gate on PBO < 0.5 and positive rolling-WF OOS.
5.3 xsmom_multiasset — Cross-sectional (relative-strength) momentum
Rationale & persistence. Distinct from TSMOM: rank assets against each other and bet on relative strength (AMP 2013). Earns its keep when cross-asset dispersion is high and TSMOM is flat (e.g., everything range-bound but with clear leaders/laggards). Lower correlation to TSMOM than the EWMAC/breakout variants, so it is the genuine diversifier in the family — at the cost of thinner, noisier edge on a small ETF cross-section.
Exact formulas. At month-end t, over the cross-section:
1. Risk-adjusted, skip-month momentum (avoid high-vol selection bias per Liu 2020 / GRJMOM; skip last month for short-term reversal): m_i = (ret_252d − ret_21d)_i / σ_i.
2. Cross-sectional z-score: z_i = (m_i − mean_j m_j) / std_j m_j.
3. Long top tercile, short bottom tercile, weights ∝ z_i, scaled to dollar-neutral (Σ longs = −Σ shorts) and to per-name w_max.
4. Crash control (Barroso–Santa-Clara): scale the whole sleeve by min(1, σ_target_xs / σ̂_xs), where σ̂_xs = trailing 63-day realized vol of the sleeve's own return.
5. Panic-state short-leg gate (Daniel–Moskowitz): when vix_proxy_21d is in its top quintile and SPY 3-month trend < 0, cut the short-leg weight by 50–100% (the loser-leg rebound is where momentum crashes live). Uses already-wired vix_proxy_21d / SPY.
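The five steps above can be sketched with the standard library as follows. All function names, the dict-based inputs, and the defaults (sigma_target_xs=0.10, a 50% short-leg cut, an 80th-percentile threshold over a caller-supplied vix history) are assumptions for illustration. Two further choices are mine rather than the spec's: each leg is normalized to half of unit gross, and a single common scale enforces w_max so that the legs stay balanced.

```python
import math
from statistics import mean, pstdev


def xs_momentum_weights(ret_252d, ret_21d, sigma, w_max=0.10):
    """Dollar-neutral tercile weights proportional to z-score, capped per name."""
    m = {s: (ret_252d[s] - ret_21d[s]) / sigma[s] for s in ret_252d}
    flat = {s: 0.0 for s in m}
    vals = list(m.values())
    k = len(vals) // 3
    sd = pstdev(vals) if len(vals) > 1 else 0.0
    if k == 0 or sd == 0.0:
        return flat
    mu = mean(vals)
    z = {s: (v - mu) / sd for s, v in m.items()}
    ranked = sorted(z, key=z.get)                       # losers first
    longs = {s: max(z[s], 0.0) for s in ranked[-k:]}    # top tercile
    shorts = {s: min(z[s], 0.0) for s in ranked[:k]}    # bottom tercile
    long_sum, short_sum = sum(longs.values()), -sum(shorts.values())
    if long_sum == 0.0 or short_sum == 0.0:
        return flat
    w = dict(flat)
    w.update({s: 0.5 * v / long_sum for s, v in longs.items()})
    w.update({s: 0.5 * v / short_sum for s, v in shorts.items()})
    # One common scale keeps the legs balanced while enforcing the name cap.
    scale = min(1.0, w_max / max(abs(v) for v in w.values()))
    return {s: v * scale for s, v in w.items()}


def crash_scale(sleeve_returns, sigma_target_xs=0.10, window=63):
    """min(1, target / trailing realized vol) of the sleeve's own daily returns."""
    recent = sleeve_returns[-window:]
    if len(recent) < 2:
        return 1.0
    realized = pstdev(recent) * math.sqrt(252)
    return 1.0 if realized == 0.0 else min(1.0, sigma_target_xs / realized)


def short_leg_multiplier(vix_proxy, vix_history, spy_ret_63d, cut=0.5):
    """Return 1 - cut in a panic state (top-quintile vol and falling SPY), else 1."""
    rank = sum(v <= vix_proxy for v in vix_history) / len(vix_history)
    panic = rank >= 0.8 and spy_ret_63d < 0
    return 1.0 - cut if panic else 1.0
```

Applying the short-leg multiplier breaks exact dollar neutrality by design: in a panic state the sleeve deliberately carries less short exposure than long exposure.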
Data + wired? ret_{21,252}d, vol_63d (all wired); vix_proxy_21d + SPY trend for the gate (wired via macro-ETF features). compute_cross_sectional_features already builds xs_rank_ret_* — usable as a starting point. New: terciles, dollar-neutralization, crash/panic scaling.
Entry/exit & sizing. Rebalance monthly; enter top/bottom tercile, exit when a name leaves its tercile (buffer the tercile boundary to cut churn). Vol-targeted at the sleeve level; per-name w_max≈0.10.
Expected gross/net Sharpe, turnover, capacity. Gross: 0.4–0.7 on a diversified asset-class cross-section; lower (0.3–0.55 net) on our small ETF set — be honest, this is the thinnest edge in the family. Turnover: higher than TSMOM (rank churn) → buffering essential. Capacity: fine on Tier-1; the short leg is the constraint (ETF borrow is generally easy/cheap for SPY/QQQ/TLT/GLD but not guaranteed for thin names; on PAPER this is moot, on live confirm locate/borrow).
Correlation. ~0.1–0.4 to tsmom_multi (the real diversification benefit); negatively correlated to value-type sleeves; ~0 to the gold mean-reversion book.
Failure modes & decay. Momentum crashes (the loser leg rebounding violently in panic rebounds — 2009) → the panic gate + crash scaling are the mitigants and are non-negotiable here. Small-N noise: terciles of ~12 names are 4 names each → estimation noise; keep weights modest and prefer Tier-1+Tier-2 (≈18 names) before deploying. Crowding in equity momentum specifically.
OOS validation. As §5.1, plus an explicit momentum-crash stress: Mar–May 2009 and Q2 2020 rebounds — verify the panic gate prevents a catastrophic short-leg loss. Regime-stratified must show it is not purely a single-regime bet. Because XS adds genuine breadth, also report its marginal contribution to the combined sleeve's Sharpe (it should improve the combo even if its standalone Sharpe is modest — that is the AMP point).
5.4 breakout_donchian — Channel breakout (Donchian / Turtle / 52-week-high)
Rationale & persistence. A different entry geometry for the same premium: a new N-day (or 52-week) high means prior participants are in profit and overhead supply is exhausted — an ignition point (George–Hwang anchoring; 52-week-high returns notably do not reverse long-term). Parameter-insensitive (Donchian N∈[15,55] is a smooth surface), which lowers overfitting risk.
Exact formulas. Continuous channel position (smoother than a binary breakout):
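c_i = clip( (close_i − mid_i) / (0.5·(highᴺ_i − lowᴺ_i)) , −1, +1 ), with mid_i = (highᴺ_i + lowᴺ_i)/2, where highᴺ_i and lowᴺ_i are the rolling N-day channel high and low; N=55 (entry) and exit channel M=20. Or classic discrete: go long when close exceeds the highest high of the prior 55 days; exit when close falls below the lowest low of the prior 20 days (and symmetric short). The channel must exclude the current bar, otherwise the breakout condition can never trigger. Size w_i = clip(c_i·(σ_target/σ_i), −w_max, w_max); ATR stop at entry − k·ATR for longs and entry + k·ATR for shorts (k≈2–3), with ATR derived from hl_range_pct / true range.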
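As a rough illustration of the continuous channel position and the sizing rule, the sketch below computes c_i from the N bars before today and then applies the vol-scaled, clipped weight. The function names are hypothetical; the per-name σ_target of 0.12 and w_max of 0.10 are placeholder assumptions borrowed from values quoted elsewhere in this spec, and the channel window is assumed to exclude the current bar.

```python
def channel_position(prior_highs, prior_lows, close, n=55):
    """Continuous Donchian position in [-1, +1].

    prior_highs / prior_lows: bars strictly before today, oldest first.
    Only the last n bars are used to form the channel.
    """
    hi = max(prior_highs[-n:])
    lo = min(prior_lows[-n:])
    half_width = 0.5 * (hi - lo)
    if half_width <= 0.0:
        return 0.0                           # degenerate (flat) channel
    mid = 0.5 * (hi + lo)
    return max(-1.0, min(1.0, (close - mid) / half_width))


def sized_weight(c, vol, vol_target=0.12, w_max=0.10):
    """w = clip(c * vol_target / vol, -w_max, +w_max)."""
    raw = c * vol_target / vol
    return max(-w_max, min(w_max, raw))


# Usage: a 90-110 channel; a close at 105 sits halfway between mid and top.
highs, lows = [110.0] * 55, [90.0] * 55
c = channel_position(highs, lows, close=105.0)      # 0.5
w = sized_weight(c, vol=0.24)                       # 0.25 raw, clipped to w_max
```

A close above the channel high clips to +1 and a close below the channel low clips to −1, so the discrete breakout is the saturated limit of the continuous form.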
Data + wired? high, low, close (present). hl_range_pct, avg_hl_range_21d (wired) for ATR. New: rolling channel max/min + ATR stop logic.
Entry/exit & sizing. Discrete entries/exits make this the natural home for stops and discrete risk-per-trade (Turtle: risk ≈ fixed fraction per trade). Vol-targeted size; ATR trailing stop for exits.
Expected gross/net Sharpe, turnover, capacity. Gross 0.4–0.6; net 0.4–0.55. Turnover lumpy (entry/exit driven) — discrete stops can raise it; the continuous channel form is smoother. Capacity = universe capacity (ample).
Correlation. ~0.7–0.85 to tsmom_multi (same trend, different trigger) → an ensemble/entry variant, not a diversifier. Most useful (a) as a robustness vote inside the trend core, or (b) on intraday bars (the daemon has an intraday sub-portfolio) where discrete breakouts + ATR stops fit naturally.
Failure modes & decay. False breakouts / whipsaw in ranges (classic 30–40% win rate; relies on winners ≫ losers) → volume confirmation (rel_volume_21d) and ATR stops help; gap risk through stops; parameter cliff if a single N is cherry-picked (scan 15–55, require a smooth surface — exactly what the validation harness's PBO is for).
OOS validation. As §5.1, plus a parameter-stability scan (N,M over grids) reported as a heatmap; reject if performance lives on an isolated (N,M) spike (PBO will catch it). Confirm win-rate ~30–45% with profit factor > 1 (trend shape) rather than a high-hit-rate fluke.
6. Portfolio construction & the trend_portfolio combiner (priority #2)
Individual strategies emit per-name Signals; the combiner is what turns per-asset signals with a Sharpe of ~0.4 into a sleeve with a Sharpe of ~0.7–0.9, and it is what makes the sleeve cost-safe. It is the second-most-important deliverable after tsmom_multi.
Responsibilities:
1. Signal-rule ensemble: average the (normalized) forecasts of tsmom_multi (+ optionally ewmac_trend_v2, breakout_donchian) per name. Because they are 0.8–0.95 correlated, use equal weights (Carver: don't optimize correlated forecast weights) with a small FDM.
2. Instrument Diversification Multiplier: IDM = estimate_idm(returns_panel) (already implemented; capped 2.5). Multiply per-name weights by IDM so the portfolio hits σ_target (≈12%) given cross-asset correlations. This is the explicit lever that recovers the diversified Sharpe.
3. Portfolio vol target: scale gross so realized portfolio vol ≈ DEFAULT_VOL_TARGET (0.12); never exceed max_gross_leverage=2.0 / max_net_leverage=1.0.
4. Correlation netting: apply PRIMARY_BY_ALIAS (net GLD/IAU/SGOL, SLV/SIVR) and net opposing legs across ensemble members before routing, so we don't pay spread on offsetting positions (mirrors the existing aggregator).
5. Turnover buffer at portfolio level (§4.5) and respect MACRO_STRATEGY_REBALANCE_MAX_NOTIONAL_PCT=0.35.
6. Satellite blend: core = TSMOM ensemble (e.g., 80%), satellite = xsmom_multiasset (e.g., 20%) — AMP-style, sized by marginal Sharpe contribution, not equal capital.
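To make responsibilities 2 and 3 concrete, here is a small sketch of the textbook Carver diversification multiplier, IDM = 1/√(wᵀρw) with negative correlations floored at zero, followed by the gross/net leverage caps. It is not the existing estimate_idm (whose signature and internals are not shown in this spec); `idm_from_corr` and `cap_leverage` are hypothetical names, and the weights passed to the IDM are assumed to be risk shares summing to 1.

```python
import math


def idm_from_corr(risk_weights, corr, cap=2.5):
    """Textbook IDM: 1 / sqrt(w' rho w), negative correlations floored at 0."""
    n = len(risk_weights)
    var = sum(risk_weights[i] * risk_weights[j] * max(corr[i][j], 0.0)
              for i in range(n) for j in range(n))
    if var <= 0.0:
        return 1.0
    return min(cap, 1.0 / math.sqrt(var))


def cap_leverage(weights, max_gross=2.0, max_net=1.0):
    """Scale all weights down by one factor so both leverage caps hold."""
    gross = sum(abs(w) for w in weights.values())
    net = abs(sum(weights.values()))
    scale = 1.0
    if gross > max_gross:
        scale = min(scale, max_gross / gross)
    if net > max_net:
        scale = min(scale, max_net / net)
    return {name: w * scale for name, w in weights.items()}


# Usage: four uncorrelated, equally weighted instruments give IDM = 2.0.
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
idm = idm_from_corr([0.25] * 4, identity)
book = cap_leverage({"SPY": 0.4 * idm, "TLT": 0.4 * idm,
                     "GLD": 0.3 * idm, "USO": -0.3 * idm})
```

The two extremes are a useful sanity check: perfectly correlated instruments give IDM = 1 (no diversification to recover), while many uncorrelated instruments hit the 2.5 cap.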
Factor exposure declaration (for attribution): FactorExposure(momentum=1.0, market≈0 net, metals/real_rates/dollar = small, time-varying). Trend's net factor loadings rotate with positioning — declare momentum=1.0 and let attribution measure the rest.
7. Out-of-sample validation & stress plan (consolidated)
Pipeline (per strategy and for the combined sleeve):
1. Walk-forward, anchored + rolling, windows tuned for long lookbacks: train ≥ 756d, test 126d, step 126d, embargo ≥ 252d (embargo must cover the longest signal horizon to purge overlapping labels — the engine default of 5–21d is too small for a 12-month signal and will leak). Use BacktestEngine.run_walk_forward with overridden args.
2. CPCV / PBO (combinatorial_purged_cv): target PBO < 0.5 (the book's PBO_REJECT_THRESHOLD). Trend's payoff is lumpy → use ≥6 groups.
3. Deflated Sharpe (deflated_sharpe_ratio) with an honest num_trials = (#signal rules × #horizon sets × #vol estimators × #universes) you actually tried. Honesty note: DSR > 0.95 is a stiff bar for a true-Sharpe-≈0.6 effect on ~10–15y of ETF data; report DSR and don't hard-fail the sleeve solely on it — lean on PBO + bootstrap-CI + regime-stratification + the 130-year prior from Hurst (2017). State the residual overfitting risk explicitly.
4. Block bootstrap (monte_carlo_block_bootstrap, block 21): require the 2.5th-percentile lower bound of the Sharpe CI to be > 0.
5. Regime-stratified (regime_stratified_sharpe): positive in ≥2 regimes; expect strong risk-off/crisis, weak low-vol risk-on.
6. Capacity (Almgren–Chriss): trivially passes for Tier-1; report it for Tier-2 thin names.
7. Cost sensitivity: report net Sharpe & turnover at 0/1/3/5/10 bps — and re-run once the engine charges cost on turnover (the gating dependency from §2.3).
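The cost-sensitivity step depends on how cost is charged, so a toy comparison may help. The sketch below contrasts a model that charges the fee on gross weight every day with one that charges it only on traded notional (turnover). Both functions are hypothetical illustrations, not the engine's code, and they treat each day's weights as post-rebalance targets, ignoring intraday drift.

```python
def cost_on_gross(weights_by_day, bps):
    """Flawed model: fee charged on gross held weight every day."""
    rate = bps / 1e4
    return sum(rate * sum(abs(w) for w in day.values())
               for day in weights_by_day)


def cost_on_turnover(weights_by_day, bps):
    """Intended model: fee charged only on the change in weights."""
    rate = bps / 1e4
    prev, total = {}, 0.0
    for day in weights_by_day:
        names = set(prev) | set(day)
        traded = sum(abs(day.get(n, 0.0) - prev.get(n, 0.0)) for n in names)
        total += rate * traded
        prev = day
    return total


# Usage: buy SPY at 100% once and hold for 252 trading days.
path = [{"SPY": 1.0}] * 252
for bps in (0, 1, 3, 5, 10):
    gross_drag = cost_on_gross(path, bps)        # grows with holding period
    turnover_drag = cost_on_turnover(path, bps)  # only the initial entry
```

At 5 bps the buy-and-hold path above costs about 12.6% per year under the gross-weight model versus 0.05% under the turnover model, which is the scale of distortion a slow trend strategy would suffer.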
Mandatory stress scenarios (and the pass criteria that make the design falsifiable):
| Window | Environment | Expected trend behavior | Pass criterion |
|---|---|---|---|
| 2008 (GFC) | Sustained selloff, flight to bonds/USD | Strongly positive (crisis alpha; CTA +14% that year) | Sleeve return clearly positive; if not, design is broken |
| 2011–2013 ("CTA Winter") | Range-bound, central-bank-suppressed | Mild bleed | Max drawdown survivable (< ~15–20%); no blow-up |
| 2020 (COVID) | V-shaped crash + fast recovery | Flat-to-negative Q1, recover H2 | Drawdown bounded; recovers within ~2 quarters |
| 2022 | Energy bull, bond bear, USD bull | Strongly positive (SG Trend ≈ +25%) | Sleeve return clearly positive |
A trend design that does not make money in 2008 and 2022 is mis-specified; one that loses catastrophically (not just bleeds) in 2011–2013 is over-levered or over-trading.
8. Prioritized implementation roadmap
1. Wire trend_universe (Tier-1, ~12 names): declare it in the strategy or extend UNIVERSE; confirm the feature store emits ret_*, vol_*, vol_adj_mom_* for each name (no new feed).
2. Build tsmom_multi (§5.1) with Rule-A signals (reusing vol_adj_mom_*), vol targeting, and buffering state. This is the single highest-value deliverable.
3. Build the trend_portfolio combiner (§6): IDM + portfolio vol target + netting + buffer. (Steps 2–3 are the minimum shippable sleeve.)
4. Validate end-to-end with §7 (especially the 2008/2022 stress and the turnover-cost rerun). Even after validation, un-quarantine nothing: this is a new sleeve, not a revival.
5. Add ewmac_trend_v2 (§5.2) as an ensemble member (reusing existing machinery); A/B its marginal stability contribution.
6. Add xsmom_multiasset (§5.3) as a small satellite once Tier-1+Tier-2 (~18 names) is live; ship with the panic gate from day one.
7. Optionally add breakout_donchian (§5.4) as an ensemble member or in the intraday sub-portfolio.
8. Yang–Zhang vol estimator (§4.1) as a turnover upgrade once v1 is validated.
9. Honest limitations & where the edge is thin
- Standalone Sharpe is modest. ETF-only, ~12-name, post-2010, after realistic costs: ~0.55–0.80 net for the flagship, possibly lower in a chop regime. This sleeve is justified by diversification + crisis alpha + positive skew, not by a high standalone Sharpe. Do not oversell it.
- It will have multi-year drawdowns (CTA Winter precedent). Anyone sizing it must be able to hold through 2–4 lean years; otherwise it gets cut at the bottom (the classic trend-following failure).
- The cross-sectional sleeve is genuinely thin on a small ETF set (noisy terciles, borrow constraints live). Treat as a satellite; its value is combination, not standalone.
- EWMAC and breakout are not diversifiers — they are 0.75–0.9 correlated to the flagship. They buy robustness, not independent alpha; don't double-count their capacity.
- Cost & turnover are the make-or-break. The current engine's gross-weight cost model will show false negatives (it already sank carver_trend). The sleeve is only honestly evaluable once cost is charged on turnover; design assumes that fix.
- Contango-bleed and leveraged ETFs break naive trend (roll decay, vol-decay) and are excluded/penalized; energy legs need roll-awareness.
- Crowding/decay risk: trend is a heavily-allocated industry factor; forward Sharpe is likely below the 130-year history. Monitored via DecayMetrics (quarantine if 24m Sharpe < 30% of in-sample, per DECAY_QUARANTINE_THRESHOLD).
- DSR > 0.95 may not be achievable on the available sample for a true-≈0.6 effect; we lean on PBO, bootstrap CI, regime breadth, and the strong external 130-year prior, and we state the residual risk rather than hide it.
10. References
- Moskowitz, Ooi, Pedersen (2012). "Time Series Momentum." J. Financial Economics 104(2): 228–250. (Diversified TSMOM Sharpe ≈ 1.1; 12-month horizon; vol-scaling; TSMOM smile.)
- Hurst, Ooi, Pedersen (2017). "A Century of Evidence on Trend-Following Investing." J. Portfolio Management (AQR). (1880–2016, 67 markets; 1m/3m/12m blend; per-asset Sharpe ≈0.4, net ≈0.77; 8/10 crisis drawdowns.)
- Hurst, Ooi, Pedersen (2013). "Demystifying Managed Futures." J. Investment Management 11(3): 42–58.
- Baltas, Kosowski (2013/2020). "Demystifying / Improving Time-Series Momentum Strategies." (Yang–Zhang vol −17% turnover; continuous TREND rule −24%; regression-slope −⅔; costs ≈10% of gross.)
- Asness, Moskowitz, Pedersen (2013). "Value and Momentum Everywhere." J. Finance 68(3): 929–985. (XS momentum; value–momentum corr ≈ −0.60; 50/50 global Sharpe ≈1.42.)
- Daniel, Moskowitz (2016). "Momentum Crashes." J. Financial Economics 122(2): 221–247. (Forecastable panic-state crashes; dynamic scaling ~doubles Sharpe.)
- Barroso, Santa-Clara (2015). "Momentum Has Its Moments." J. Financial Economics. (Realized-variance scaling ~eliminates crashes, ~doubles Sharpe.)
- Harvey, Hoyle, Korgaonkar, Rattray, Sargaison, van Hemert (2018). "The Impact of Volatility Targeting." J. Portfolio Management. (60+ assets; risk-asset Sharpe lift; tail/drawdown reduction everywhere; 10% target.)
- George, Hwang (2004). "The 52-Week High and Momentum Investing." J. Finance. (52-week-high dominates past-return momentum; no long-run reversal.)
- Carver, R. (2015) Systematic Trading & (2019) Leveraged Trading (Harriman House); pysystemtrade. (EWMAC, forecast scaling/combination, FDM, IDM, vol targeting, buffering; already partly implemented in qgtm_strategies/pysystemtrade_patterns.py.)
- Lopez de Prado (2018). Advances in Financial Machine Learning, Ch. 7 & 12 (purged/embargoed CV, CPCV, PBO); Bailey & Lopez de Prado (2014) "Deflated Sharpe Ratio"; Harvey, Liu, Zhu (2016) multiple-testing. All implemented in qgtm_backtest/validation.py.
- Liu (2020), "Momentum and the Cross-section of Stock Volatility" (risk-adjusted ranking to avoid high-vol selection bias; motivates the m_i = (ret−ret_1m)/σ form in §5.3).
End of spec. Implementation, code, and any repo/universe edits are intentionally deferred to a separate engineering task per the read-only research mandate.