Methodology · 8 min read · May 13, 2026

A quant's critique of our statistics — what's valid, what's not, and one bug we fixed

GPT-5.5 ran a hostile review of the Vectra statistics page. One display bug showed 666% max drawdown (fixed). Six valid criticisms accepted without hedging. One misread refuted with portfolio math.

GPT-5.5 recently ran a structured review of the Vectra statistics page. I asked it to be hostile. It found real issues, some misconceptions, and one genuine bug. Here is my response, point by point — no hedging on the things it got right.

Summary

| # | Critique | Verdict | Severity | Response |
|---|----------|---------|----------|----------|
| 1 | MC Max DD P50 showed 666% | Fixed bug | Critical | Display layer multiplied an already-percentage value by 100. Actual P50 = 6.66%, P95 = 9.83% |
| 2 | Equities sleeve Sharpe 0.14 looks broken | Valid | Medium | Correct per-market number, but it understates contribution: the slot hosts the exposure overlay |
| 3 | drop_top_5pct_winners collapses Sharpe to 2.53 | Valid | Medium | Expected for trend/momentum; convex tail exposure, not a flaw |
| 4 | No live track record | Valid | High | Acknowledged; confidence ceiling is explicitly capped |
| 5 | Execution assumptions are optimistic | Valid | Medium | Documented: 6 bps crypto, 5 bps equities, 1.5 bps FX per side; breaks down above ~$500k crypto notional |
| 6 | Survivorship bias in S&P 100 universe | Valid | Medium | 2026 constituents applied to 2018–2024 data; estimated +10–30 pp return inflation on equities sleeve |
| 7 | No capacity analysis published | Valid | Low–Medium | Crypto sleeve capacity roughly $200–500k before cost assumptions break |
| 8 | "Sharpe 4.2 is extraordinary for one strategy" | Misread | None | Three near-zero-correlation streams; portfolio math, not Medallion |
| 9 | Volatility targeting as hidden smoothing | Partial | Low | Exposure overlay is documented; yes it compresses vol, which mechanically lifts Sharpe |

The one genuine bug

The Monte Carlo max drawdown display showed P50 = 666.04%. This is wrong by a factor of 100.

The actual numbers: P50 = 6.66%, P95 = 9.83%. These are consistent with the full-period max DD of 6.30%. What happened: the web component rendering the MC results received a decimal (0.0666) from the API, then multiplied by 100 twice — once in the data layer and once in the display formatter. The fix was a one-line correction. The underlying simulation numbers were always correct; only the rendering was wrong.

Fixed 2026-05-13. If you screenshotted the statistics page before this date and saw "Max DD p50: 666%", that was a display bug. The Monte Carlo simulation result was correct throughout.
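
A minimal sketch of this class of bug, not the actual Vectra code: the function names are hypothetical, and the input 0.066604 is inferred from the displayed 666.04%. The point is that the fraction-to-percent conversion should live in exactly one place.

```python
def format_pct(fraction: float, decimals: int = 2) -> str:
    """Render a decimal fraction (0.0666) as a percent string ("6.66%").

    The conversion to percent happens here and nowhere else, so upstream
    layers must pass fractions through unchanged.
    """
    return f"{fraction * 100:.{decimals}f}%"


def format_pct_buggy(fraction: float, decimals: int = 2) -> str:
    """Reproduce the double scaling: once upstream, then again in the formatter."""
    already_scaled = fraction * 100              # data layer: 0.066604 -> 6.6604
    return format_pct(already_scaled, decimals)  # formatter multiplies by 100 again


print(format_pct(0.066604))        # 6.66%
print(format_pct_buggy(0.066604))  # 666.04%
```

Carrying the unit in the name (for example `max_dd_fraction` versus `max_dd_pct`) is one cheap way to make this mistake visible at the call site.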

Valid criticisms — accepted without hedging

The equities sleeve number looks weak — and it is, in isolation

The per-market equities numbers (Sharpe 0.14, return 4.4%, DD 23.47%) are accurate. Vol-momentum on large-cap equities is weak. The strategy trades a slow cross-sectional signal on 100 names that are all driven by the same macro factor.

What the raw number doesn't capture: the equities slot in the portfolio also hosts the exposure overlay. When the vol-mom rolling Sharpe falls below a threshold, which happens on about half of all bars (49.9%), the system routes capital into the complement basket instead: equity cross-sectional momentum (fc=5, rebal=3) and FX trend (fc=3, rebal=3, 3× leverage). That complement basket has better characteristics than raw vol-mom. Labeling the raw vol-mom figure "equities" while the slot's actual contribution includes the complement basket obscures this. The fix is to restructure the statistics page to show stream contributions separately from the overlay routing breakdown. That work is on the backlog.
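
The routing rule described above can be sketched as follows. This is an illustration under stated assumptions, not the production overlay: the window length, the threshold value, the 252-period annualization, and the one-bar lag (used here to avoid look-ahead) are all hypothetical choices, as are the function names.

```python
import math
from statistics import mean, stdev


def rolling_sharpe(returns, window, periods_per_year=252):
    """Annualized Sharpe over a trailing window; None until the window fills."""
    out = []
    for i in range(len(returns)):
        if i + 1 < window:
            out.append(None)
            continue
        chunk = returns[i + 1 - window : i + 1]
        sd = stdev(chunk)
        out.append(None if sd == 0 else mean(chunk) / sd * math.sqrt(periods_per_year))
    return out


def route(returns, window, threshold):
    """Label each bar 'vol_mom' or 'complement' from the prior bar's rolling Sharpe."""
    rs = rolling_sharpe(returns, window)
    labels = []
    for i in range(len(returns)):
        prev = rs[i - 1] if i > 0 else None  # assumed one-bar lag, avoids look-ahead
        below = prev is not None and prev < threshold
        labels.append("complement" if below else "vol_mom")
    return labels


vol_mom_returns = [0.01, 0.02, 0.01, 0.02, -0.04, -0.02, -0.03, -0.02]
labels = route(vol_mom_returns, window=3, threshold=0.0)
share = labels.count("complement") / len(labels)
print(labels, share)
```

The `share` value is the analogue of the 49.9% figure: the fraction of bars on which capital sits in the complement basket.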

drop_top_5pct_winners collapses Sharpe to 2.53

True and expected. Trend-following and momentum strategies have convex return distributions — a small number of large winning trades contribute disproportionately to total P&L. When you strip the top 5% of winning trades, you remove the tail that the strategy is designed to capture. The honest interpretation: this is a feature, not a bug, but it means execution quality on entry matters enormously. Missing the entry on a trade that becomes a 15-sigma move costs far more than missing a median trade. I do not present the drop_top_5pct result as reassuring. I present it as a stress test that quantifies tail dependence.
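
A sketch of what a drop-top-winners stress test does, assuming it operates on per-trade returns and measures an unannualized per-trade Sharpe. The real `drop_top_5pct_winners` implementation is not shown in this post, so the helper below is a hypothetical stand-in.

```python
import math
from statistics import mean, stdev


def sharpe(returns, periods_per_year=1):
    """Mean over standard deviation, optionally annualized."""
    return mean(returns) / stdev(returns) * math.sqrt(periods_per_year)


def drop_top_winners(trade_returns, frac=0.05):
    """Remove the best `frac` of trades by return, keeping the original order."""
    n_drop = int(round(len(trade_returns) * frac))
    if n_drop == 0:
        return list(trade_returns)
    ranked = sorted(range(len(trade_returns)),
                    key=lambda i: trade_returns[i], reverse=True)
    top = set(ranked[:n_drop])  # indices of the n_drop largest returns
    return [r for i, r in enumerate(trade_returns) if i not in top]


trades = [0.01, -0.01, 0.02, -0.02, 0.015, -0.005, 0.03, -0.01, 0.50, 0.01,
          -0.015, 0.02, -0.01, 0.005, 0.01, -0.02, 0.025, -0.01, 0.01, -0.005]
trimmed = drop_top_winners(trades)
print(sum(trades), sum(trimmed))        # total P&L with and without the tail
print(sharpe(trades), sharpe(trimmed))
```

In this toy data a single trade carries most of the total return, which is the tail dependence the stress test is meant to quantify.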

No live track record

Vectra's crypto sleeve has been live on MEXC since late 2024, but the equities and FX sleeves are wired for paper trading rather than live capital, so the combined three-sleeve portfolio has no live track record. A backtest, no matter how carefully constructed, is not one. Sharpe 4.24 in backtest with DSR probability 1.000 and permutation p-value 0.000 is strong evidence against randomness. It is not a guarantee of live performance. I will not pretend otherwise.

Execution assumptions

The backtest assumes: 6 bps per side for crypto (MEXC taker fee), 5 bps per side for equities (IBKR commission plus market impact estimate), 1.5 bps per side for FX (OANDA major spread). Fills at bar close. Daily rebalance with no price impact beyond the per-trade cost.

These assumptions are conservative relative to zero-cost backtests. They are optimistic in two ways: (1) bar-close fills assume liquidity is always available at that price, which breaks down above some notional; (2) they do not model bid-ask bounce on partial fills. At current live notional (~$30k crypto), the 6 bps assumption is approximately correct. Above ~$500k crypto notional, true cost on large moves in smaller symbols will exceed 6 bps.
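
For concreteness, a flat per-side cost model of the kind described above can be sketched as below. The bps figures come from the text; the function names and the definition of turnover (sum of absolute weight changes at a rebalance, as a fraction of equity) are assumptions made for illustration.

```python
# Per-side cost assumptions from the backtest, in basis points.
COST_BPS_PER_SIDE = {"crypto": 6.0, "equities": 5.0, "fx": 1.5}


def trade_cost(notional_traded: float, sleeve: str) -> float:
    """Cost in account currency for trading `notional_traded` on one side."""
    return abs(notional_traded) * COST_BPS_PER_SIDE[sleeve] / 10_000


def net_return(gross_return: float, turnover: float, sleeve: str) -> float:
    """Subtract a flat cost from a bar's gross return.

    `turnover` is the sum of |weight changes| at the rebalance, so each unit
    of turnover pays the per-side cost once. There is no size-dependent
    impact term, which is exactly the simplification that stops holding
    at larger notional.
    """
    return gross_return - turnover * COST_BPS_PER_SIDE[sleeve] / 10_000


print(trade_cost(30_000, "crypto"))   # 18.0
```

A capacity-aware version would replace the constant with a function of order size relative to available liquidity; that model is not specified in this post.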

Survivorship bias in the S&P 100 universe

The backtest uses the current 2026 S&P 100 constituents applied to 2018–2024 data. Companies removed from the index due to underperformance or bankruptcy are not in the universe. My estimate of the inflation: +10 to +30 pp on total equities return, +0.05 to +0.15 on equities Sharpe. Crypto and FX sleeves are not affected. Given the equities sleeve already shows Sharpe 0.14, the survivorship-adjusted estimate may be near zero or negative. This is a known limitation I should have noted from the start. A footnote is now on the statistics page.

Capacity analysis

Not yet published. The crypto sleeve degrades fastest — MEXC futures on smaller-cap assets have limited liquidity. A rough estimate: the crypto strategy at current signal strength is capacity-limited to perhaps $200–500k notional before cost assumptions become materially wrong. The equities and FX sleeves scale further but the base returns are lower. This analysis needs to be done and published before any serious capital allocation.

Invalid: "Sharpe 4.2 is extraordinary for one strategy"

This is a misread of the architecture. The combined portfolio Sharpe of 4.237 is not the Sharpe of a single strategy. It is the Sharpe of a three-stream portfolio where individual stream Sharpes are: crypto 2.13, FX 0.74, equities 0.14.

The combined number exceeds every individual stream because cross-market correlations are near zero (crypto ↔ equities: −0.017; crypto ↔ FX: +0.017; equities ↔ FX: +0.040). For streams with pairwise correlations ρ ≈ 0, the best Sharpe that any fixed weighting can reach is approximately:

Portfolio Sharpe ≈ √(Σ Sharpeᵢ²)
                 = √(2.13² + 0.74² + 0.14²)
                 = √(4.537 + 0.548 + 0.020)
                 = √5.105
                 ≈ 2.26  (static-weight benchmark)

This is a ceiling for fixed weights, not a floor: weighting every stream to equal risk gives less, (2.13 + 0.74 + 0.14) / √3 ≈ 1.74. The measured Sharpe of 4.237 sits well above the 2.26 benchmark, so static diversification across the three raw streams accounts for only part of it. I attribute the rest to the parts of the system that the per-stream numbers do not capture: the meta-allocator (which overweights the crypto stream, the one with the highest Sharpe) and the exposure overlay, which captures additional diversification within the equities slot. A Sharpe of 2.13 for a single crypto momentum strategy would be extraordinary if it survived live. A portfolio Sharpe of 4.2 is the product of three weakly correlated streams plus that allocation layer, not the claim of one strategy.
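
As a cross-check on the arithmetic, the sketch below computes two standard static benchmarks from the per-stream Sharpes and correlations quoted above. For uncorrelated streams, √(Σ Sharpeᵢ²) is the largest Sharpe any fixed set of weights can reach, while equal-risk weighting gives ΣSharpeᵢ divided by the square root of the summed correlation matrix. The function names are illustrative only.

```python
import math


def max_sharpe_uncorrelated(sharpes):
    """Best Sharpe any fixed weights can reach when streams are uncorrelated."""
    return math.sqrt(sum(s * s for s in sharpes))


def equal_risk_sharpe(sharpes, corr):
    """Sharpe when every stream is scaled to the same volatility.

    With unit-vol streams, the portfolio mean is sum(SR_i) and the portfolio
    variance is the sum of all correlation-matrix entries.
    """
    variance = sum(sum(row) for row in corr)
    return sum(sharpes) / math.sqrt(variance)


sharpes = [2.13, 0.14, 0.74]          # crypto, equities, FX (from the text)
corr = [
    [1.0, -0.017, 0.017],
    [-0.017, 1.0, 0.040],
    [0.017, 0.040, 1.0],
]
print(round(max_sharpe_uncorrelated(sharpes), 2))   # 2.26
print(round(equal_risk_sharpe(sharpes, corr), 2))   # 1.72
```

Both figures describe fixed weights on the three raw streams. A measured Sharpe above them has to come from something these inputs do not describe, such as time-varying allocation or an overlay that changes what a slot actually holds.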

Partially valid: the exposure overlay does compress realized vol

The exposure overlay is documented and intentional dynamic exposure scaling — the system reduces gross exposure when rolling Sharpe falls below a threshold and routes to the complement basket. This reduces realized volatility during drawdown periods, which mechanically improves Sharpe versus a fixed-weight strategy that holds at full size through drawdowns. This effect is real and should be stated plainly: part of the Sharpe improvement vs a naive vol-mom strategy is attributable to the overlay.

The 11/11 walk-forward result, OOS/IS Sharpe ratio of 0.956, and permutation p-value of 0.000 are the evidence against the overlay being overfitted to specific historical regime transitions. They are not proof — nothing in backtesting is.
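
For readers unfamiliar with the permutation test, here is one common variant as a sketch: shuffle the position series against the asset returns and count how often the shuffled Sharpe matches or beats the real one. Whether Vectra's test shuffles positions, signals, or returns is not stated in this post, so the design below, including the add-one estimator and all names, is an assumption.

```python
import random
from statistics import mean, stdev


def sharpe(returns):
    """Unannualized Sharpe; 0.0 when the series has no variation."""
    sd = stdev(returns)
    return 0.0 if sd == 0 else mean(returns) / sd


def permutation_p_value(positions, asset_returns, n_perm=999, seed=0):
    """One-sided p-value: how often shuffled positions match the real Sharpe."""
    rng = random.Random(seed)
    observed = sharpe([p * r for p, r in zip(positions, asset_returns)])
    shuffled = list(positions)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks the timing link between position and return
        if sharpe([p * r for p, r in zip(shuffled, asset_returns)]) >= observed:
            hits += 1
    # Add-one estimator: the smallest possible value is 1 / (n_perm + 1).
    return (hits + 1) / (n_perm + 1)


asset_returns = [0.01, -0.01, 0.02, -0.02] * 10
perfect_positions = [1 if r > 0 else -1 for r in asset_returns]
print(permutation_p_value(perfect_positions, asset_returns))
```

Under this estimator a p-value is never exactly zero, so a displayed 0.000 would correspond to a value below 0.0005 after rounding to three decimals.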

Honest confidence calibration

Before 12 months of live data: The backtest evidence is strong enough to warrant live deployment with limited capital. The statistical tests clear. The OOS Sharpe of 4.05 on held-out 2022–2024 data — including the 2022 bear and FTX collapse — is not consistent with random coincidence. But backtests cannot account for regime shifts, live execution frictions, or the behavioral response to a real drawdown. I am comfortable running crypto live. I would not allocate institutional capital without a track record.

After 6 months of live data across all three sleeves: If the live combined Sharpe tracks within 1 sigma of backtest expectations, that provides meaningful confirmation. Call it one confirming data point, not validation.

After 12 months of live data: A full live calendar year, multiple market regimes, real fills, real slippage. That is when I would consider this a validated system rather than a promising backtest. The statistics page will be updated as live data accumulates.
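
To put a rough size on "within 1 sigma", the sketch below uses the standard large-sample approximation for the standard error of a Sharpe estimate under IID returns, Var(SR) ≈ (1 + SR²/2) / n per period. The daily sampling, the 252-period year, and the function name are assumptions; real returns are not IID, so treat the band as indicative.

```python
import math


def sharpe_standard_error(annual_sharpe, n_obs, periods_per_year=252):
    """Approximate standard error of an annualized Sharpe estimate.

    Applies Var(SR) ~ (1 + SR^2 / 2) / n to the per-period Sharpe, then
    rescales the result to annual units.
    """
    sr_period = annual_sharpe / math.sqrt(periods_per_year)
    se_period = math.sqrt((1 + 0.5 * sr_period**2) / n_obs)
    return se_period * math.sqrt(periods_per_year)


backtest_sharpe = 4.237               # combined backtest figure from the text
se_6m = sharpe_standard_error(backtest_sharpe, n_obs=126)
se_12m = sharpe_standard_error(backtest_sharpe, n_obs=252)
print(f"6 months:  {backtest_sharpe - se_6m:.2f} to {backtest_sharpe + se_6m:.2f}")
print(f"12 months: {backtest_sharpe - se_12m:.2f} to {backtest_sharpe + se_12m:.2f}")
```

Under these assumptions one standard error is roughly 1.4 Sharpe units at six months and 1.0 at twelve, which is why six months of agreement counts as a single confirming data point and not as validation.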

The headline numbers are real. The caveats are real. Both belong on the page.

Published by Floris V. · Vectra operator
