
Posted October 15, 2025 at 10:57 am
The article “Why You Can’t Tell if Your Strategy ‘Stopped Working’ (Statistically Speaking)” was originally posted on the Robot Wealth blog.
Traders love the illusion of precision. A few bad weeks go by, and you think, “Let’s run a t-test and see if the strategy stopped working.” It sounds rigorous. It isn’t.
Imagine a strategy that, in truth, earns 10% per year with 20% volatility – roughly the S&P’s long-term profile. We’ll simulate five years of daily returns, about 1,260 observations, from a geometric Brownian motion with those parameters.
Now the world changes. For the next month, 21 trading days, the strategy’s expected return drops to zero, but volatility stays at 20%.
We’d like to detect that change. The question: Can you statistically prove the edge is gone?
A t-test compares these two samples (before and after) and asks if their means differ significantly.
Five years of data vs. one month.
That’s n₁ ≈ 1260 vs. n₂ ≈ 21.
Noise ≈ 20% / √252 ≈ 1.26% daily.
The expected daily drift for 10% annual return is just 0.10 / 252 ≈ 0.04% per day. That’s our signal. The noise is thirty times larger. In other words, your daily Sharpe ratio is 0.04 / 1.26 ≈ 0.03, an astronomically low signal-to-noise ratio.
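The arithmetic is worth verifying directly; a quick check of the numbers quoted above:

```python
import numpy as np

daily_noise = 0.20 / np.sqrt(252)  # daily volatility from 20% annual
daily_drift = 0.10 / 252           # daily expected return from 10% annual

print(f"noise  ~ {daily_noise:.4%} per day")
print(f"signal ~ {daily_drift:.4%} per day")
print(f"daily Sharpe ~ {daily_drift / daily_noise:.3f}")
```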
So even if the edge disappeared entirely, you’d barely notice in a month.
From the simulation:
| Test | p-value | Interpretation |
| --- | --- | --- |
| Welch t-test | 0.12 | Not significant |
| Kolmogorov–Smirnov test | 0.37 | Not significant (even worse) |
The t-test finds nothing. The KS test, which looks at the whole distribution, not just the mean, finds even less. The supposed “collapse in performance” doesn’t even register as a blip in the statistics.
That’s the problem: volatility dominates everything. The mean shift you’re trying to detect (0.04%/day) is microscopic relative to the daily noise (±1%). Twenty-one days is simply not enough data to estimate a mean that small with any precision.
The t-stat came out around –1.6, roughly a 12% chance under the null of equal means. Even if you doubled the sample size – two months of underperformance – the p-value would still hover around 0.06. You’d need a multi-month drought before statistics would admit the obvious.
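Both tests are one-liners in scipy. A sketch of the comparison, assuming numpy/scipy and an arbitrary seed — the p-values vary run to run, so they will not exactly match the figures quoted above:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # hypothetical seed; results vary by run
DT = 1 / 252

def sim(mu, sigma, n):
    """Daily GBM log-returns."""
    return (mu - 0.5 * sigma**2) * DT + sigma * np.sqrt(DT) * rng.standard_normal(n)

before = sim(0.10, 0.20, 1260)  # healthy regime: five years
after = sim(0.00, 0.20, 21)     # dead regime: one month

t = stats.ttest_ind(before, after, equal_var=False)  # Welch t-test
ks = stats.ks_2samp(before, after)                   # Kolmogorov-Smirnov

print(f"Welch t-test: t={t.statistic:.2f}, p={t.pvalue:.3f}")
print(f"KS test:      D={ks.statistic:.3f}, p={ks.pvalue:.3f}")
```

On almost any seed, neither p-value comes close to conventional significance — the point of the exercise.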
The funny part: in that same run, the “dead” strategy’s one-month realised return was +7.5%.
That’s right. A zero-drift month beat 93% of all months during the prior five years of positive-drift data.
Why?
A 20% annual volatility means roughly 5.8% standard deviation per month.
Even with zero drift, one standard deviation up is +5.8%, and 1.3σ is +7.5%. That happens 10% of the time purely by chance.
Meanwhile, in the “good” regime (10% annual drift), the expected monthly gain is only +0.8%. A +7.5% outlier month easily beats the vast majority of historical months.
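The monthly arithmetic checks out under a normal approximation; a short verification with scipy:

```python
from math import sqrt
from scipy.stats import norm

monthly_sd = 0.20 / sqrt(12)   # ~5.8% standard deviation per month
monthly_mu_good = 0.10 / 12    # ~0.8% expected monthly gain in the good regime

# Probability a zero-drift month returns +7.5% or more:
z = 0.075 / monthly_sd
p = 1 - norm.cdf(z)
print(f"monthly sd ~ {monthly_sd:.3%}, z = {z:.2f}, P(month >= +7.5%) ~ {p:.1%}")
```

The probability lands near 10%, matching the "happens one month in ten purely by chance" claim.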
So, you end up with a bizarre headline:
“Our strategy just lost its edge, but had its second-best month ever.”
Noise does that.
To see if the phenomenon was a fluke, I ran 3,000 simulations of the same setup.
Across runs, roughly one in every thirteen “dead” months looked like a top-decile success. Statistically, that’s unremarkable. Psychologically, it’s devastating – because you’ll tell yourself the system recovered, tweak nothing, and then (probably) spend the next quarter losing money.
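The Monte Carlo experiment can be sketched as follows. This is an illustrative reconstruction, not the author's exact code: it assumes numpy, monthly log-returns aggregated from the same GBM parameters, an arbitrary seed, and "top decile" measured against that run's own five-year history:

```python
import numpy as np

rng = np.random.default_rng(7)  # hypothetical seed
DT = 1 / 252
SIGMA = 0.20
N_SIMS = 3000

def monthly_returns(mu, n_days, size):
    """Log-returns over n_days, aggregated from daily GBM steps."""
    drift = (mu - 0.5 * SIGMA**2) * DT * n_days
    noise = SIGMA * np.sqrt(DT * n_days) * rng.standard_normal(size)
    return drift + noise

top_decile_hits = 0
for _ in range(N_SIMS):
    history = monthly_returns(0.10, 21, 60)      # five years of "healthy" months
    dead_month = monthly_returns(0.00, 21, 1)[0] # one zero-drift month
    if dead_month > np.quantile(history, 0.90):
        top_decile_hits += 1

rate = top_decile_hits / N_SIMS
print(f"dead months landing in the top decile: {rate:.1%}")
```

The fraction comes out in the neighbourhood of 7–8% – roughly one in thirteen, consistent with the result above.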
Some traders, knowing the t-test’s weakness, pivot to “non-parametric” tests like the Kolmogorov-Smirnov. It compares the cumulative distributions directly, not the means. Surely that’s more robust?
No.
When two normal distributions differ only slightly in mean but have the same variance, the KS test has less power than the t-test. It’s designed to catch shape differences – fat tails, variance shifts, asymmetry – not small mean drifts. With n₂ = 21, it’s practically blind.
In our case, the KS p-value was 0.37. The test confidently says “nothing to see here.” It’s technically correct.
Here’s the deeper problem. The tools we use – t-tests, p-values, Sharpe ratios – were designed for large-sample, low-noise situations. Financial returns are the opposite: small signals, fat tails, short samples.
When you apply a test that needs thousands of points to reject the null at 95% confidence, you’ll never detect regime shifts in real time. The market will move on long before the statistic does.
A five-year window may contain your “true” performance, but it’s useless for diagnosing the present. A one-month drought tells you nothing. A three-month one tells you almost nothing.
The conclusion isn’t that tests are bad – it’s that the problem is mis-specified. The null hypothesis “mean return hasn’t changed” is almost never the right one. Markets evolve, but slowly and noisily. No binary test will save you.
Practical Interpretation
When a strategy underperforms for a few weeks, you face two equally dangerous errors: concluding the edge is gone when it isn’t (Type I), and failing to notice that it really is gone (Type II).
Classical statistics tries to balance those. Trading doesn’t care. You’re asymmetric: Type II errors cost you more because capital decays geometrically, not linearly.
So, the sensible response isn’t to chase significance but to control exposure. Cut risk when the environment looks hostile, but don’t fool yourself that a p-value will tell you when to quit.
If you want to formalise this intuition, think Bayesian: update your belief about the strategy’s drift each day. The posterior distribution will drift toward zero if recent returns are weak, but uncertainty will remain large. The proper decision rule is probabilistic, not binary.
Even better, you can encode prior scepticism – say, most strategies decay over time – and let data modify that belief. The output isn’t “dead or alive,” but “probability the drift > 0.” You can then size down continuously rather than panic after a failed t-test.
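A minimal sketch of that Bayesian update, assuming numpy/scipy, a conjugate normal–normal model with known daily volatility, and illustrative prior parameters (the sceptical prior below is an assumption, not a prescription):

```python
import numpy as np
from scipy.stats import norm

SIGMA_D = 0.20 / np.sqrt(252)  # daily volatility, treated as known

# Sceptical prior on the daily drift: centred at zero, with spread
# comparable to a healthy edge (~0.04%/day). Illustrative values only.
mu_prior, sd_prior = 0.0, 0.0004

def update(mu0, sd0, returns):
    """Conjugate normal-normal update of the drift given daily returns."""
    n = len(returns)
    prec0, prec_data = 1 / sd0**2, n / SIGMA_D**2
    post_prec = prec0 + prec_data
    post_mu = (prec0 * mu0 + prec_data * np.mean(returns)) / post_prec
    return post_mu, np.sqrt(1 / post_prec)

rng = np.random.default_rng(1)
weak_month = SIGMA_D * rng.standard_normal(21)  # 21 zero-drift daily returns

post_mu, post_sd = update(mu_prior, sd_prior, weak_month)
p_alive = 1 - norm.cdf(0.0, loc=post_mu, scale=post_sd)
print(f"P(drift > 0) ~ {p_alive:.2f}")  # a probability, not a verdict
```

Note how little 21 noisy observations move the posterior: the data's precision is dwarfed by the prior's, so the answer stays close to 50/50 – exactly the "uncertainty will remain large" point. Position size can be scaled continuously off `p_alive` rather than toggled by a test.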
But that’s another post.
Every quantitative trader eventually learns this the hard way: statistics are lagging indicators. They confirm what you already know, long after it’s actionable.
A strategy doesn’t announce its death with a p-value. It fades, subtly, while your t-statistic wobbles somewhere between 0.5 and 1.2. By the time a 5-year backtest fails a significance test, you’ve lost more in opportunity cost than you saved by being “rigorous.”
Markets are too noisy for clean statistical detection. The right question isn’t “Has my edge stopped working?” but “Given recent evidence, how much do I trust it now?”
The answer is always probabilistic, never definitive, which is precisely why trading is hard – and why so many seek comfort in meaningless tests.
Information posted on IBKR Campus that is provided by third-parties does NOT constitute a recommendation that you should contract for the services of that third party. Third-party participants who contribute to IBKR Campus are independent of Interactive Brokers and Interactive Brokers does not make any representations or warranties concerning the services offered, their past or future performance, or the accuracy of the information provided by the third party. Past performance is no guarantee of future results.
This material is from Robot Wealth and is being posted with its permission. The views expressed in this material are solely those of the author and/or Robot Wealth and Interactive Brokers is not endorsing or recommending any investment or trading discussed in the material. This material is not and should not be construed as an offer to buy or sell any security. It should not be construed as research or investment advice or a recommendation to buy, sell or hold any security or commodity. This material does not and is not intended to take into account the particular financial conditions, investment objectives or requirements of individual customers. Before acting on this material, you should consider whether it is suitable for your particular circumstances and, as necessary, seek professional advice.