Statistical Tests for Mean Reversion (Stationarity)

ADF, KPSS, Variance Ratio, and the Practical Issues You Actually Run Into

1. Context and the Basic Distinction

Let me start with the obvious thing that people think they understand until they actually try to trade it: a random walk is very different from something that is mean reverting.

A random walk wanders. If it goes up for a while, it can stay up for a long time. If it crosses the mean again, that can just be chance. Importantly, being far from the mean does not create a stronger pull back.

A mean reverting process is centered around some level, and the further away it gets, the stronger the expected pull back. This is the difference between something that crosses the mean rarely and something that crosses it many times.

The classic econometric framing is: many of the statistics you want to use require stationarity. If something is a random walk, you typically difference it first. If it is already stationary, you can regress it, fit AR models, and so on, without everything falling apart.

Two canonical toy models:

Random walk (unit root).

y_{t} = y_{t - 1} + ε_{t}, ε_{t} \sim (0, σ^{2}) .

Mean reversion (AR(1), |ρ| < 1).

y_{t} = ρ y_{t - 1} + ε_{t}, | ρ | < 1.

The whole point of the tests below is to distinguish “ρ = 1” from “ρ < 1” in a statistically disciplined way.
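To make the distinction concrete, here is a minimal sketch that simulates both toy models and counts how often each path crosses its own sample mean. The helper name `count_mean_crossings`, the seed, the sample size, and ρ = 0.9 are illustrative assumptions, not values from the discussion above.

```python
import numpy as np

def count_mean_crossings(y):
    """Count sign changes of the series around its own sample mean."""
    centered = np.asarray(y, dtype=float) - np.mean(y)
    signs = np.sign(centered)
    signs = signs[signs != 0]  # ignore exact hits on the mean
    return int(np.sum(signs[1:] != signs[:-1]))

rng = np.random.default_rng(42)  # arbitrary seed
n, rho = 2000, 0.9               # illustrative values
eps = rng.standard_normal(n)

# Random walk: y_t = y_{t-1} + eps_t
y_rw = np.cumsum(eps)

# AR(1): y_t = rho * y_{t-1} + eps_t, driven by the same shocks
y_ar = np.zeros(n)
for t in range(1, n):
    y_ar[t] = rho * y_ar[t - 1] + eps[t]

crossings_rw = count_mean_crossings(y_rw)
crossings_ar = count_mean_crossings(y_ar)
print("random walk crossings:", crossings_rw)
print("AR(1) crossings:      ", crossings_ar)
```

Even with ρ as high as 0.9, the AR(1) path should cross its mean far more often than the random walk built from the same shocks.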

2. Augmented Dickey–Fuller (ADF) Test

The ADF test is the standard workhorse. Conceptually it is very simple: we regress changes on lagged levels and ask whether the level term has a negative coefficient.

Start from an AR(1):

y_{t} = ρ y_{t - 1} + ε_{t} .

Subtract y_{t - 1} from both sides:

Δ y_{t} = (ρ - 1) y_{t - 1} + ε_{t} .

Define $β := ρ - 1$, so

Δ y_{t} = β y_{t - 1} + ε_{t} .

If $ρ = 1$ then $β = 0$, and you have:

Δ y_{t} = ε_{t},

which is exactly the “increments are white noise” property of a random walk.

The augmented form adds lagged differences to mop up autocorrelation in the residuals:

Δ y_{t} = β y_{t - 1} + \sum_{i = 1}^{p} γ_{i} Δ y_{t - i} + ε_{t} .
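The augmented regression can be run by hand with ordinary least squares. The sketch below is one way to do it, not a replacement for a library routine: the function name `adf_tstat` is hypothetical, and it adds an intercept that the equation above omits (statsmodels' `adfuller` also includes a constant by default). It returns only the t-ratio on β, which still has to be compared with Dickey–Fuller critical values.

```python
import numpy as np

def adf_tstat(y, p=1):
    """t-ratio on beta in: dy_t = c + beta*y_{t-1} + sum_i gamma_i*dy_{t-i} + e_t."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)                                   # dy[j] = y[j+1] - y[j]
    target = dy[p:]                                   # dy_t
    level = y[p:-1]                                   # y_{t-1}
    lagged = [dy[p - i:-i] for i in range(1, p + 1)]  # dy_{t-i}, i = 1..p
    X = np.column_stack([np.ones_like(level), level] + lagged)
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ coef
    dof = len(target) - X.shape[1]
    sigma2 = resid @ resid / dof                      # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)             # OLS covariance matrix
    return coef[1] / np.sqrt(cov[1, 1])               # coefficient on y_{t-1}

rng = np.random.default_rng(0)   # arbitrary seed
n, rho = 2000, 0.9               # illustrative values
eps = rng.standard_normal(n)
y_rw = np.cumsum(eps)
y_ar = np.zeros(n)
for t in range(1, n):
    y_ar[t] = rho * y_ar[t - 1] + eps[t]

t_rw = adf_tstat(y_rw, p=2)
t_ar = adf_tstat(y_ar, p=2)
print("t-ratio, random walk:", round(t_rw, 3))
print("t-ratio, AR(1):      ", round(t_ar, 3))
```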

2.1. Hypotheses

ADF is usually written as:

H_{0} : β = 0 (unit root / random walk), H_{1} : β < 0 (stationary / mean reverting) .

2.2. The part people get wrong: it is not a Student-t test

Even though the regression looks like a normal regression, the test statistic under H₀ does not follow a Student t distribution. It follows a Dickey–Fuller type distribution. That is why ADF uses special critical values.

If you treat it like a standard t-test, you are going to fool yourself.

2.3. Choosing the lag length p

You have to choose p. Different p gives you a different test. The standard practice is to pick p by minimizing an information criterion such as AIC or BIC.

AIC.

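AIC = - 2 \log L + 2 k .

BIC.

BIC = - 2 \log L + k \log T .

Here L is the likelihood, k is the number of estimated parameters, and T is sample size. Both criteria trade off goodness of fit against a complexity penalty, and lower values are better. BIC penalizes each extra parameter more heavily than AIC once \log T > 2 (T ≥ 8), so it never picks more lags than AIC on the same sample. They are conceptually similar to how people think about cross-validation: you do not just maximize fit, you penalize complexity.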

You can pick lags visually using ACF/PACF rules of thumb. People do that. But if you’re running systematic tests, letting the routine pick p by AIC/BIC is a perfectly reasonable default.
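As a rough illustration of what "pick p by AIC/BIC" means, the sketch below fits the ADF regression for each candidate p on a common sample and evaluates both criteria from the Gaussian log-likelihood. The function name `adf_info_criteria`, the simulated series (a random walk whose increments follow an AR(1) with coefficient 0.5), and the choice to count only regression coefficients in k are assumptions made for this example. Library routines may differ in these details.

```python
import numpy as np

def adf_info_criteria(y, p, p_max):
    """AIC and BIC of the ADF regression with p lagged differences.

    All candidate p use the same rows (those available for p_max),
    so the criteria are comparable across p.
    """
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    target = dy[p_max:]
    level = y[p_max:-1]
    lagged = [dy[p_max - i:-i] for i in range(1, p + 1)]
    X = np.column_stack([np.ones_like(level), level] + lagged)
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ coef
    T, k = X.shape                      # k counts regression coefficients only
    rss = resid @ resid
    loglik = -0.5 * T * (np.log(2 * np.pi) + np.log(rss / T) + 1.0)
    return -2 * loglik + 2 * k, -2 * loglik + k * np.log(T)

# Simulated series: unit root in levels, AR(1) increments
rng = np.random.default_rng(7)   # arbitrary seed
n = 1000
shocks = rng.standard_normal(n)
d = np.zeros(n)
for t in range(1, n):
    d[t] = 0.5 * d[t - 1] + shocks[t]
y = np.cumsum(d)

p_max = 8
scores = {p: adf_info_criteria(y, p, p_max) for p in range(p_max + 1)}
p_aic = min(scores, key=lambda p: scores[p][0])
p_bic = min(scores, key=lambda p: scores[p][1])
print("lag chosen by AIC:", p_aic)
print("lag chosen by BIC:", p_bic)
```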

2.4. Python example (ADF with autolag AIC)

import numpy as np
from statsmodels.tsa.stattools import adfuller

np.random.seed(0)

# Example 1: random walk
y_rw = np.cumsum(np.random.randn(2000))

# Example 2: AR(1) mean reverting
eps = np.random.randn(2000)
rho = 0.95
y_ar = np.zeros_like(eps)
for t in range(1, len(eps)):
    y_ar[t] = rho * y_ar[t-1] + eps[t]

for name, y in [("Random Walk", y_rw), ("AR(1) Mean Reverting", y_ar)]:
    stat, pval, used_lag, nobs, crit, icbest = adfuller(y, autolag="AIC")
    print("\n---", name, "---")
    print("ADF stat:", stat)
    print("p-value:", pval)
    print("lags used:", used_lag)
    print("nobs:", nobs)
    print("crit:", crit)

In trading I am not religious about the 5% line. I care whether the series is clearly mean reverting or basically a random walk. A p-value that barely tips under a threshold is not the same thing as a statistic that sits far beyond the critical value.

3. Why the t-statistic fails in time series unit-root regressions

In a textbook regression, you write:

t = \frac{\hat{β}}{S E (\hat{β})}

and you compare to a Student-t distribution.

In unit root settings, the asymptotics are different. Under the unit root null, the process behaves like Brownian motion in the limit. The distribution of the statistic depends on functionals of Brownian motion, not a Student-t.

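One way to write the limiting idea (schematically) is that, under the null, the scaled coefficient estimate T \hat{β} converges to a ratio of integrals involving Brownian motion W (.):

\frac{\int_{0}^{1} W (r) d W (r)}{\int_{0}^{1} W (r)^{2} d r},

and the t-statistic converges to a closely related functional with \sqrt{\int_{0}^{1} W (r)^{2} d r} in the denominator. Neither is the normal or Student-t limit of the usual regression t-statistic.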

So: use the Dickey–Fuller critical values. That is the point of the test.
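A small Monte Carlo makes the point visible. The sketch below simulates pure random walks, runs the no-constant regression Δy_t = β y_{t-1} + ε_t on each, and looks at the distribution of the resulting t-ratios. The number of paths, the path length, and the seed are arbitrary choices. The reference points are the one-sided 5% normal cutoff of about -1.645 and the asymptotic 5% Dickey–Fuller critical value of about -1.95 for this no-constant case.

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed
n_paths, T = 4000, 250           # illustrative simulation size

# Each row is one random walk simulated under the unit root null
eps = rng.standard_normal((n_paths, T))
y = np.cumsum(eps, axis=1)

y_lag = y[:, :-1]                # y_{t-1}
dy = np.diff(y, axis=1)          # Δy_t

# OLS of Δy_t on y_{t-1} without a constant, one regression per row
sxx = np.sum(y_lag ** 2, axis=1)
beta_hat = np.sum(y_lag * dy, axis=1) / sxx
resid = dy - beta_hat[:, None] * y_lag
sigma2 = np.sum(resid ** 2, axis=1) / (T - 2)   # T-1 observations, 1 parameter
t_stats = beta_hat / np.sqrt(sigma2 / sxx)

q05 = np.quantile(t_stats, 0.05)
naive_reject_rate = np.mean(t_stats < -1.645)

print("simulated 5% quantile of the t-ratio:", round(q05, 3))
print("share of random walks rejected with the normal cutoff:",
      round(naive_reject_rate, 3))
```

If the normal cutoff were valid, the rejection share would be close to 5%. In this setup it should come out noticeably higher, and the gap widens once a constant or trend is included in the regression.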

4. KPSS: Reversing the Null Hypothesis

KPSS is useful because it flips the null and the alternative compared to ADF.

ADF says: assume unit root, try to reject it.
KPSS says: assume stationarity, try to reject it.

The model can be written as:

y_{t} = μ_{t} + ε_{t}, μ_{t} = μ_{t - 1} + u_{t} .

If Var (u_{t}) = 0, then μ_{t} is constant and y_{t} is stationary around a mean. If Var (u_{t}) > 0, then μ_{t} wanders and you have unit-root-type behavior.

Operationally, KPSS is built from residual partial sums. Regress y_{t} on a constant (or constant+trend), take residuals e_{t}, and define partial sums:

S_{t} = \sum_{i = 1}^{t} e_{i} .

Then the statistic is:

KPSS = \frac{1}{T^{2} {\hat{σ}}^{2}} \sum_{t = 1}^{T} S_{t}^{2},

where {\hat{σ}}^{2} is an estimate of the long-run variance.
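The recipe above fits in a few lines of numpy. This is a sketch for the constant-only case. The function name `kpss_level_stat` is hypothetical, the long-run variance uses Bartlett (Newey–West) weights, and the truncation lag of 25 is an assumption made for the example rather than a recommendation. For reference, the usual critical values for the level-stationarity version are about 0.347 (10%), 0.463 (5%), and 0.739 (1%).

```python
import numpy as np

def kpss_level_stat(y, lags):
    """KPSS statistic for stationarity around a constant mean."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    e = y - y.mean()          # residuals from regressing y on a constant
    S = np.cumsum(e)          # partial sums S_t

    # Long-run variance estimate with Bartlett weights
    lrv = e @ e / T
    for j in range(1, lags + 1):
        weight = 1.0 - j / (lags + 1.0)
        lrv += 2.0 * weight * (e[j:] @ e[:-j]) / T

    return np.sum(S ** 2) / (T ** 2 * lrv)

rng = np.random.default_rng(3)   # arbitrary seed
n, rho = 2000, 0.5               # illustrative values
eps = rng.standard_normal(n)
y_rw = np.cumsum(eps)
y_ar = np.zeros(n)
for t in range(1, n):
    y_ar[t] = rho * y_ar[t - 1] + eps[t]

stat_rw = kpss_level_stat(y_rw, lags=25)
stat_ar = kpss_level_stat(y_ar, lags=25)
print("KPSS, random walk:", round(stat_rw, 3))
print("KPSS, AR(1):      ", round(stat_ar, 3))
```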

4.1. Hypotheses

H_{0} : stationary, H_{1} : unit root / non-stationary .

4.2. Python example (KPSS)

from statsmodels.tsa.stattools import kpss

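# Using the same y_rw and y_ar from earlier.
# Note: kpss interpolates its p-value from a table, so the reported
# p-value is bounded between 0.01 and 0.10 (a warning is issued at the bounds).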
for name, y in [("Random Walk", y_rw), ("AR(1) Mean Reverting", y_ar)]:
    stat, pval, lags, crit = kpss(y, regression="c", nlags="auto")
    print("\n---", name, "---")
    print("KPSS stat:", stat)
    print("p-value:", pval)
    print("lags used:", lags)
    print("crit:", crit)

How I actually use it.

I like ADF and KPSS together because they are complementary. If ADF rejects a unit root and KPSS does not reject stationarity, both tests point to mean reversion. If ADF fails to reject a unit root and KPSS rejects stationarity, you are not looking at mean reversion. If both are ambiguous, you are probably in the grey zone, and you should stop pretending a binary label is going to save you.
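The joint reading of the two tests can be written down as a small lookup. The function name `classify_stationarity`, the 5% default threshold, and the wording of the labels are assumptions for illustration. Treat the output as a coarse summary, not a verdict.

```python
def classify_stationarity(adf_pvalue, kpss_pvalue, alpha=0.05):
    """Combine ADF (null: unit root) and KPSS (null: stationary) outcomes."""
    adf_rejects_unit_root = adf_pvalue < alpha
    kpss_rejects_stationarity = kpss_pvalue < alpha

    if adf_rejects_unit_root and not kpss_rejects_stationarity:
        return "stationary: both tests point to mean reversion"
    if not adf_rejects_unit_root and kpss_rejects_stationarity:
        return "unit root: both tests point away from mean reversion"
    if adf_rejects_unit_root and kpss_rejects_stationarity:
        return "conflict: both nulls rejected, look for trends or breaks"
    return "inconclusive: neither null rejected, grey zone"

# Made-up p-values, one per case
examples = [(0.001, 0.10), (0.60, 0.01), (0.01, 0.01), (0.20, 0.10)]
labels = [classify_stationarity(a, k) for a, k in examples]
for (a, k), label in zip(examples, labels):
    print(f"ADF p={a:<6} KPSS p={k:<5} -> {label}")
```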

5. Variance Ratio Test (Lo–MacKinlay)

Variance ratio is more intuitive than people give it credit for. It is based on the scaling properties of a random walk: if y_{t} is a random walk, then:

Var (y_{t + k} - y_{t}) = k Var (y_{t + 1} - y_{t}) .

Define:

VR (k) = \frac{Var (y_{t + k} - y_{t})}{k Var (y_{t + 1} - y_{t})} .

Interpretation:

VR (k) = 1 \Rightarrow random walk, VR (k) < 1 \Rightarrow mean reversion / negative autocorrelation, VR (k) > 1 \Rightarrow trend / momentum .

The reason I like this test is that you can compute it across multiple horizons k and see the structure. It is a very visual diagnostic.
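The link between the variance ratio and autocorrelation can be checked numerically. For stationary increments, VR(k) equals 1 + 2 Σ_{j=1}^{k-1} (1 - j/k) ρ_j, where ρ_j is the lag-j autocorrelation of the one-step changes. The sketch below computes the ratio both ways. The function names, seed, and AR(1) parameter are illustrative assumptions, and in finite samples the two numbers agree only approximately.

```python
import numpy as np

def vr_direct(y, k):
    """Variance of k-step changes over k times the variance of 1-step changes."""
    y = np.asarray(y, dtype=float)
    return np.var(y[k:] - y[:-k], ddof=1) / (k * np.var(np.diff(y), ddof=1))

def vr_from_autocorr(y, k):
    """VR(k) rebuilt from sample autocorrelations of the 1-step changes."""
    d = np.diff(np.asarray(y, dtype=float))
    d = d - d.mean()
    denom = d @ d
    vr = 1.0
    for j in range(1, k):
        rho_j = (d[j:] @ d[:-j]) / denom      # lag-j autocorrelation
        vr += 2.0 * (1.0 - j / k) * rho_j     # linearly declining weights
    return vr

rng = np.random.default_rng(5)   # arbitrary seed
n, rho = 2000, 0.9               # illustrative values
eps = rng.standard_normal(n)
y_ar = np.zeros(n)
for t in range(1, n):
    y_ar[t] = rho * y_ar[t - 1] + eps[t]

pairs = {k: (vr_direct(y_ar, k), vr_from_autocorr(y_ar, k)) for k in (2, 5, 10)}
for k, (direct, rebuilt) in pairs.items():
    print(f"k={k:>2d}  direct={direct:.4f}  from autocorrelations={rebuilt:.4f}")
```

Read this way, a ratio below one is a weighted sum of negative autocorrelations in the changes.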

5.1. Python example (simple variance ratio across horizons)

import numpy as np

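def variance_ratio(y, k):
    """Sample VR(k): variance of k-step changes over k times the 1-step variance."""
    y = np.asarray(y, dtype=float)
    dy1 = np.diff(y)          # one-step changes
    dyk = y[k:] - y[:-k]      # overlapping k-step changes
    return np.var(dyk, ddof=1) / (k * np.var(dy1, ddof=1))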

ks = [2, 5, 10, 20, 50]

for name, y in [("Random Walk", y_rw), ("AR(1) Mean Reverting", y_ar)]:
    print("\n---", name, "---")
    for k in ks:
        print(f"k={k:>3d}  VR={variance_ratio(y, k):.4f}")

In practice, I care less about whether VR(k) is “statistically significant” at 5% and more about how far it is from one. If it is barely below one, it is usually not worth trading. If it is meaningfully below one, you often see better risk-adjusted behavior. This is not magic: stronger negative autocorrelation is a stronger economic effect.
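For readers who also want the significance side, here is a sketch of the Lo–MacKinlay z-statistic under the assumption of i.i.d. (homoskedastic) increments, where the asymptotic variance of VR(k) - 1 is 2(2k - 1)(k - 1) / (3k n), with n the number of one-step changes. It reuses the simple overlapping estimator rather than the bias-corrected one from the original paper, and the function name `variance_ratio_z` is hypothetical.

```python
import numpy as np

def variance_ratio_z(y, k):
    """Return (VR(k), z) with z based on the homoskedastic asymptotic variance."""
    y = np.asarray(y, dtype=float)
    n = len(y) - 1                                   # number of one-step changes
    vr = np.var(y[k:] - y[:-k], ddof=1) / (k * np.var(np.diff(y), ddof=1))
    se = np.sqrt(2.0 * (2 * k - 1) * (k - 1) / (3.0 * k * n))
    return vr, (vr - 1.0) / se

rng = np.random.default_rng(11)  # arbitrary seed
n_obs, rho = 2000, 0.5           # illustrative values
eps = rng.standard_normal(n_obs)
y_rw = np.cumsum(eps)
y_ar = np.zeros(n_obs)
for t in range(1, n_obs):
    y_ar[t] = rho * y_ar[t - 1] + eps[t]

ks = (2, 5, 10)
z_rw = {k: variance_ratio_z(y_rw, k)[1] for k in ks}
z_ar = {k: variance_ratio_z(y_ar, k)[1] for k in ks}
for k in ks:
    print(f"k={k:>2d}  z(random walk)={z_rw[k]:6.2f}  z(AR(1))={z_ar[k]:7.2f}")
```

With heteroskedastic returns, which is the usual case in markets, this z-statistic overstates significance. That is another reason to look at the size of the ratio first.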

6. Lag Length Choice: AIC, BIC, PACF (and why it matters)

In ADF, the whole “augmented” part is the lagged differences:

\sum_{i = 1}^{p} γ_{i} Δ y_{t - i} .

If you use too few lags, you leave autocorrelation in residuals and contaminate the test. If you use too many, you eat degrees of freedom and lose power.

There are multiple ways to pick p:

AIC/BIC: automated, objective-ish, and widely used in practice.
PACF: common visual heuristic for AR order selection; people do this.

I would not pretend there is a universal “correct” p. If your series is high frequency, your effective dynamics are different from daily. Use AIC/BIC as a default, and if you are building a strategy, treat p as a hyperparameter you stress-test.

7. Putting It Together: How I Would Actually Screen Mean Reversion

If you are screening a spread or a residual for mean reversion:

Run ADF (unit root null). Check whether you can reject.
Run KPSS (stationarity null). Check whether it rejects stationarity.
Compute variance ratios across horizons. Ask: is it meaningfully below 1?

If the outputs disagree, that is information, not a nuisance. It usually means you are in a borderline case where you should not be overly confident.

8. Final Remarks (from trading reality)

A few points that matter more than people admit:

These tests tell you what the data looked like. They do not guarantee persistence.
A series can be mean reverting in one period and then stop. Mean reversion is often episodic.
Statistical significance is not the same thing as tradability. Transaction costs and turnover matter.
Do not treat a p-value as a trading signal. Use it as a filter, then backtest the whole pipeline.

And the last thing: the reason we do this is not because statistics make money. We do this because statistics stop us from fooling ourselves too easily.


Disclosure: Interactive Brokers Third Party

Information posted on IBKR Campus that is provided by third-parties does NOT constitute a recommendation that you should contract for the services of that third party. Third-party participants who contribute to IBKR Campus are independent of Interactive Brokers and Interactive Brokers does not make any representations or warranties concerning the services offered, their past or future performance, or the accuracy of the information provided by the third party. Past performance is no guarantee of future results.

This material is from Quant Insider and is being posted with its permission. The views expressed in this material are solely those of the author and/or Quant Insider and Interactive Brokers is not endorsing or recommending any investment or trading discussed in the material. This material is not and should not be construed as an offer to buy or sell any security. It should not be construed as research or investment advice or a recommendation to buy, sell or hold any security or commodity. This material does not and is not intended to take into account the particular financial conditions, investment objectives or requirements of individual customers. Before acting on this material, you should consider whether it is suitable for your particular circumstances and, as necessary, seek professional advice.
