Posted December 27, 2023 at 10:16 am
Updated by Chainika Thakar (Originally written by Devang Singh)
Time series data is a unique and invaluable form of data that captures information over a continuous period. It’s used in various fields, from finance to economics, to understand and predict trends, patterns, and behaviours.
Among the essential tools for analysing time series data is the Johansen Cointegration Test, which plays a pivotal role in understanding relationships between variables. This blog aims to provide a comprehensive and beginner-friendly guide to mastering the Johansen Cointegration Test using Python.
We’ll embark on this journey by first understanding the core concepts of time series data. What makes it different from other types of data, and how do we extract meaningful insights from it?
In this blog post, you will understand the essence of the Johansen Test for cointegration and learn how to implement it in Python. Another popular test for cointegration is the Augmented Dickey-Fuller (ADF) test. The ADF test has limitations which are overcome by using the Johansen test.
The ADF test enables one to test for cointegration between two time series. The Johansen Test can be used to check for cointegration between a maximum of 12 time series.
This implies that a stationary linear combination of assets can be created using more than two time series, which could then be traded using mean-reverting strategies like Pairs Trading, Triplets Trading, Index Arbitrage and Long-Short Portfolio.
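To make the idea of a stationary linear combination concrete, here is a minimal sketch on made-up data, using only the Python standard library. The series `x` and `y`, the `hedge_ratio` of 2.0 and the noise levels are all assumptions chosen for illustration; in practice the weights are unknown and have to be estimated, which is what the Johansen test helps with.

```python
import random
import statistics

random.seed(7)  # fixed seed so the run is repeatable

n_obs = 1000
hedge_ratio = 2.0  # assumed known here; in practice it must be estimated

# x is a random walk, so it wanders and is not stationary
x = []
level = 0.0
for _ in range(n_obs):
    level += random.gauss(0.0, 1.0)
    x.append(level)

# y follows x up to stationary noise, so x and y are cointegrated by construction
y = [hedge_ratio * xi + random.gauss(0.0, 1.0) for xi in x]

# The linear combination y - hedge_ratio * x removes the shared random walk
spread = [yi - hedge_ratio * xi for yi, xi in zip(y, x)]

var_y = statistics.pvariance(y)
var_spread = statistics.pvariance(spread)
print(f"Variance of y:      {var_y:.2f}")
print(f"Variance of spread: {var_spread:.2f}")
```

The spread keeps a small, stable variance while `y` itself drifts, which is the property a mean-reverting strategy relies on.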
Whether you’re a novice or an aspiring data analyst, this blog will empower you to harness the potential of time series data with the Johansen Cointegration Test.
Some of the concepts covered in this blog are taken from this Quantra course on Mean Reverting Strategies in Python by Dr. E P Chan. You can take a Free Preview of the course.
The Johansen Cointegration Test is a statistical procedure used to analyse the long-term relationships between multiple time series variables. A time series is a sequence of observations over time, usually spaced at regular intervals. Examples include the daily prices of stocks or bonds observed over a period of 10 years, or 1-minute stock price data for the last 100 days.
The Johansen Cointegration Test is a valuable tool for economists, financial analysts, and researchers to assess the relationships between multiple time series variables and make informed decisions based on their long-term behaviour.
Key properties of the Johansen Cointegration Test include:

The Johansen Cointegration Test holds significant importance in the fields of econometrics, finance, and time series analysis for several key reasons:

In summary, the Johansen Cointegration Test is a valuable tool for analysing the long-term relationships between time series variables, providing insights into economic and financial dynamics, portfolio management, and policy analysis, among other applications. Its ability to handle multivariate data makes it a versatile and indispensable technique for researchers and practitioners in these fields.
Cointegration, a concept in time series analysis, is especially useful in the world of trading and forecasting. It helps traders and analysts make better predictions and strategic decisions.
Here’s how it works:
This Python code aims to perform the Johansen Cointegration Test for multiple stock pairs, shedding light on their long-term relationships and potential trading strategies.
The pairs of stocks in the code are AAPL and AMZN, MSFT and AAPL, and AMZN and MSFT.
We will find out if each pair is cointegrated or not on the basis of “Testing for Zero Cointegrating Relationships (Null Hypothesis)”. This means that the null hypothesis will be rejected when a pair of stocks is cointegrated.
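For reference, the two statistics reported by the test are usually written as below. This is the standard textbook form rather than something stated in this post: $T$ is the number of observations, $n$ the number of series, $\hat{\lambda}_1 \ge \dots \ge \hat{\lambda}_n$ the estimated eigenvalues, and $r$ the number of cointegrating relationships under the null hypothesis.

```latex
\lambda_{\text{trace}}(r) = -T \sum_{i=r+1}^{n} \ln\left(1 - \hat{\lambda}_i\right),
\qquad
\lambda_{\max}(r) = -T \ln\left(1 - \hat{\lambda}_{r+1}\right)
```

Testing for zero cointegrating relationships corresponds to $r = 0$, so the trace statistic then sums over all $n$ eigenvalues.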
Let us begin with the code now.
# Import libraries
import numpy as np
import pandas as pd
from pandas_datareader import data as pdr
import yfinance as yf
from statsmodels.tsa.vector_ar.vecm import coint_johansen
Now, we will fetch data for three stocks.
# Set the stock list
stock_list = ['AAPL', 'AMZN', 'NFLX']
# Set the start date and the end date
start_date = '2010-01-01'
end_date = '2022-10-16'
Now, we will extract the trace statistics and eigen statistics. These statistics are the key components of the Johansen Cointegration Test. We will discuss them later after the output is generated.
# Download the adjusted close prices of the three stocks into a data frame
data = yf.download(stock_list, start=start_date, end=end_date)['Adj Close']
# Perform the Johansen Cointegration Test
# det_order=0 includes a constant term; the last argument (k_ar_diff=1) uses one lagged difference
det_order = 0
coint_test_result = coint_johansen(data, det_order, 1)
# Extract the trace statistics (lr1) and the maximum eigenvalue statistics (lr2)
trace_stats = coint_test_result.lr1
eigen_stats = coint_test_result.lr2
# Print the test results
print("Johansen Cointegration Test Results (Testing for Zero Cointegrating Relationships):")
print(f"Trace Statistics: {trace_stats}")
print(f"Critical Values: {coint_test_result.cvt}")
# Define stock pairs
stock_pairs = [('AAPL', 'AMZN'), ('MSFT', 'AAPL'), ('AMZN', 'MSFT')]
# Separate the output sections
print("\n" + "-" * 50 + "\n")
# Print the statistics next to each pair label
# (entry i of lr1 and lr2 belongs to the null hypothesis of at most i cointegrating relationships)
for i, (stock1, stock2) in enumerate(stock_pairs):
    trace_statistic = trace_stats[i]
    eigen_statistic = eigen_stats[i]
    print(f"Pair {i + 1} ({stock1} and {stock2}):")
    print(f"Trace Statistic: {trace_statistic}")
    print(f"Eigen Statistic: {eigen_statistic}")
    print("\n" + "-" * 50 + "\n")
# Determine cointegration based on critical values or other criteria
# Add your cointegration assessment logic here
print("Cointegration Assessment: Testing for Zero Cointegrating Relationships (Null Hypothesis)\n")
Output:
[*********************100%%**********************] 3 of 3 completed
Johansen Cointegration Test Results (Testing for Zero Cointegrating Relationships):
Trace Statistics: [56.59350169 23.66248989 9.70197362]
Critical Values: [[27.0669 29.7961 35.4628]
 [13.4294 15.4943 19.9349]
 [ 2.7055 3.8415 6.6349]]
--------------------------------------------------
Pair 1 (AAPL and AMZN):
Trace Statistic: 56.59350169362019
Eigen Statistic: 32.93101180471398
--------------------------------------------------
Pair 2 (MSFT and AAPL):
Trace Statistic: 23.66248988890621
Eigen Statistic: 13.960516272813667
--------------------------------------------------
Pair 3 (AMZN and MSFT):
Trace Statistic: 9.701973616092545
Eigen Statistic: 9.701973616092545
--------------------------------------------------
Cointegration Assessment: Testing for Zero Cointegrating Relationships (Null Hypothesis)
The output above first shows the trace statistics and critical values for the Johansen cointegration test, and then the trace statistic and eigen statistic listed for each pair.
Here, we will use the trace statistics and critical values to find out whether the null hypothesis is rejected. In other words, we will find out if the pair of stocks is cointegrated (rejection of the null hypothesis) or not.
We are not considering the eigen statistics here because they become relevant when you want to pin down the exact number of cointegrating relationships.
For example, if the null hypothesis had been at most one or at most two cointegrating relationships, the eigen statistics would have been considered.
Let us now move further and see what we have observed from the output above.
The trace statistics and critical values are as follows:
Trace statistics: [56.59350169 23.66248989 9.70197362]
Critical values (the columns are the 90%, 95% and 99% confidence levels):
At most 0 cointegrating relationships: [27.0669 29.7961 35.4628]
At most 1 cointegrating relationship: [13.4294 15.4943 19.9349]
At most 2 cointegrating relationships: [ 2.7055 3.8415 6.6349]
In the context of the Johansen Cointegration Test, the row of critical values to consider depends on the null hypothesis you are testing, and the column depends on the confidence level. The critical values are set up to test different hypotheses about the number of cointegrating relationships.
The three rows of critical values correspond to the null hypotheses of at most zero, at most one and at most two cointegrating relationships, in that order.
Since we are testing for zero cointegrating relationships, we should compare the first trace statistic (56.59) to the values in the first row of critical values.
For each confidence level, compare the trace statistic to the corresponding critical value.
The trace statistic of 56.59 is greater than 27.07, 29.80 and 35.46, so the null hypothesis of zero cointegrating relationships is rejected at all three confidence levels. This suggests that the time series are cointegrated. The remaining trace statistics also exceed the critical values in their own rows.
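The comparison can be written out as a short sketch in plain Python using the numbers printed earlier. It assumes the statsmodels layout, in which each row of the critical values belongs to one null hypothesis (at most 0, 1 or 2 cointegrating relationships) and each column to one confidence level (90%, 95%, 99%). The helper `estimate_rank` is a hypothetical name introduced here, not part of any library.

```python
def estimate_rank(trace_stats, critical_values, level_col):
    """Count how many null hypotheses are rejected in sequence.

    level_col selects the confidence level: 0 -> 90%, 1 -> 95%, 2 -> 99%.
    Testing stops at the first hypothesis that is not rejected.
    """
    rank = 0
    for stat, row in zip(trace_stats, critical_values):
        if stat > row[level_col]:
            rank += 1
        else:
            break
    return rank


# Values copied from the output shown above
trace_stats = [56.59350169, 23.66248989, 9.70197362]
critical_values = [
    [27.0669, 29.7961, 35.4628],  # null: at most 0 cointegrating relationships
    [13.4294, 15.4943, 19.9349],  # null: at most 1
    [2.7055, 3.8415, 6.6349],     # null: at most 2
]

for label, col in (("90%", 0), ("95%", 1), ("99%", 2)):
    rank = estimate_rank(trace_stats, critical_values, col)
    print(f"Rejected hypotheses at {label}: {rank}")
```

With these numbers every hypothesis is rejected at every level. A count equal to the number of series is usually read with some care, since it points to the series behaving as stationary under the fitted model rather than to a specific cointegrated pair.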
Let us cross check this result with another method.
We will print each result by using the Johansen cointegration code:
coint_test_result = coint_johansen(data, det_order=0, k_ar_diff=1)
Below you can see the code for the same.
# Route pandas_datareader's Yahoo calls through yfinance
yf.pdr_override()
# Set the stock pairs
stock_pairs = [('AAPL', 'AMZN'), ('MSFT', 'AAPL'), ('AMZN', 'MSFT')]
# Set the start date and the end date
start_date = '2010-01-01'
end_date = '2022-10-16'
# Download stock price data for all pairs
data = pdr.get_data_yahoo([pair[0] for pair in stock_pairs] + [pair[1] for pair in stock_pairs], start=start_date, end=end_date)['Adj Close']
# Perform the Johansen Cointegration Test for all pairs
coint_test_result = coint_johansen(data, det_order=0, k_ar_diff=1)
# Extract the trace statistics and critical values
tracevalues = coint_test_result.lr1
critical_values = coint_test_result.cvt
# Interpret the results for each pair:
# trace statistic i must exceed every 95% critical value (column 1 of cvt)
for i, (stock1, stock2) in enumerate(stock_pairs):
    if (tracevalues[i] > critical_values[:, 1]).all():
        print(f"Pair {i + 1} ({stock1} and {stock2}) is cointegrated.")
    else:
        print(f"Pair {i + 1} ({stock1} and {stock2}) is not cointegrated.")
Output:
[*********************100%%**********************] 3 of 3 completed
Pair 1 (AAPL and AMZN) is cointegrated.
Pair 2 (MSFT and AAPL) is not cointegrated.
Pair 3 (AMZN and MSFT) is not cointegrated.
Here are the tips for conducting successful Johansen cointegration analysis.
Originally posted on QuantInsti blog.
Information posted on IBKR Campus that is provided by third-parties does NOT constitute a recommendation that you should contract for the services of that third party. Third-party participants who contribute to IBKR Campus are independent of Interactive Brokers and Interactive Brokers does not make any representations or warranties concerning the services offered, their past or future performance, or the accuracy of the information provided by the third party. Past performance is no guarantee of future results.
This material is from QuantInsti and is being posted with its permission. The views expressed in this material are solely those of the author and/or QuantInsti and Interactive Brokers is not endorsing or recommending any investment or trading discussed in the material. This material is not and should not be construed as an offer to buy or sell any security. It should not be construed as research or investment advice or a recommendation to buy, sell or hold any security or commodity. This material does not and is not intended to take into account the particular financial conditions, investment objectives or requirements of individual customers. Before acting on this material, you should consider whether it is suitable for your particular circumstances and, as necessary, seek professional advice.
Hi, thanks for this. Don’t we need to have MSFT in the initial stock_ticker list to properly measure cointegration?