IBKR Quant Blog


K-Means Clustering For Pair Selection In Python - Historic Problem of Pair Selection


To see the first two posts in this series, click Part 1 and Part 2.


In this post, we will try to identify a tradable relationship in a brute-force manner. How about Dollar Tree* and Dollar General*? They’re both discount retailers, and they even both have “dollar” in their names. Since we’ve gotten the hang of things, we’ll jump right into the ADF test.

Let’s first import the data for DLTR and DG.

#importing dltr and dg
#pdr (presumably pandas_datareader), start and end are carried over from the earlier posts
dltr=pdr.get_data_yahoo('DLTR',start, end)
dg=pdr.get_data_yahoo('DG',start, end)

Now that we’ve gotten our data, let’s add these stocks to our newDF and create their spread.

#adding dltr and dg to our newDF dataframe




#creating the dltr and dg spread as a column in our newDF dataframe


We’ve now added the DLTR and DG stocks as well as their spread to our newDF dataframe. Let’s take a quick look at our dataframe.
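As a rough sketch of what those two steps might look like, the snippet below builds the columns on synthetic prices so it runs without a data download. The use of the `Close` column, the definition of the spread as a simple price difference, the dates, and the empty `newDF` created here are all assumptions made for illustration; in the series, `newDF` already exists from the earlier posts.

```python
import numpy as np
import pandas as pd

# Synthetic stand-ins for the Yahoo downloads (hypothetical prices and dates).
rng = np.random.default_rng(0)
dates = pd.bdate_range("2014-01-01", periods=250)
dltr = pd.DataFrame({"Close": 80 + rng.normal(0, 1, 250).cumsum()}, index=dates)
dg = pd.DataFrame({"Close": 75 + rng.normal(0, 1, 250).cumsum()}, index=dates)

# Created here only so the sketch is self-contained.
newDF = pd.DataFrame(index=dates)

# Assumption: the closing price is the column carried into newDF.
newDF["DLTR"] = dltr["Close"]
newDF["DG"] = dg["Close"]

# Assumption: the spread is the simple difference of the two price series.
newDF["Spread_2"] = newDF["DLTR"] - newDF["DG"]

# Quick look at the first rows.
print(newDF.head())
```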



Now that we have Spread_2, the spread of DLTR and DG, we can create adf2, a second ADF test for these two stocks.
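For intuition about what the test statistic measures, here is a minimal sketch of a plain Dickey-Fuller regression written with NumPy. It is not the statsmodels implementation: `adfuller` additionally includes lagged differences and, by default, chooses the lag length by AIC. The helper name `dickey_fuller_stat` and the synthetic series are our own. The idea is to regress the change in the series on its lagged level; a strongly negative t-statistic on that lagged level suggests the series is pulled back toward its mean.

```python
import numpy as np

def dickey_fuller_stat(series):
    """t-statistic on gamma in: y[t] - y[t-1] = alpha + gamma * y[t-1] + error."""
    y = np.asarray(series, dtype=float)
    dy = np.diff(y)                                   # first differences
    X = np.column_stack([np.ones(len(dy)), y[:-1]])   # constant and lagged level
    beta, *_ = np.linalg.lstsq(X, dy, rcond=None)     # ordinary least squares fit
    resid = dy - X @ beta
    sigma2 = resid @ resid / (len(dy) - X.shape[1])   # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)             # coefficient covariance
    return beta[1] / np.sqrt(cov[1, 1])

rng = np.random.default_rng(42)
noise = rng.normal(size=500)
mean_reverting = noise           # stationary white noise
random_walk = noise.cumsum()     # non-stationary

stat_mr = dickey_fuller_stat(mean_reverting)
stat_rw = dickey_fuller_stat(random_walk)
print("mean-reverting series:", stat_mr)
print("random walk:", stat_rw)
```

The statistic is compared against Dickey-Fuller critical values rather than the usual Student-t table, which is why the thresholds are supplied alongside the result.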

#Creating another adfuller instance


We’ve just run the ADF test on our DLTR and DG spread. We can now repeat our earlier logic to determine if the spread yields a tradable relationship.

if adf2[0] < adf2[4]['1%']:
    print('Spread is Cointegrated at 1% Significance Level')
elif adf2[0] < adf2[4]['5%']:
    print('Spread is Cointegrated at 5% Significance Level')
elif adf2[0] < adf2[4]['10%']:
    print('Spread is Cointegrated at 10% Significance Level')
else:
    print('Spread is not Cointegrated')

Spread is not Cointegrated
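Since we keep repeating the same ladder of comparisons for each pair, it could be folded into a small helper. The sketch below is one way to do that; the function name `cointegration_verdict` is our own, and the test statistics fed to it are hypothetical values chosen only to exercise each branch. The critical values are the ones from this post's ADF output.

```python
def cointegration_verdict(adf_result):
    """Return the strictest significance level the test statistic passes."""
    stat, crit = adf_result[0], adf_result[4]
    # Check from the strictest threshold to the loosest.
    for level in ("1%", "5%", "10%"):
        if stat < crit[level]:
            return f"Spread is Cointegrated at {level} Significance Level"
    return "Spread is not Cointegrated"

# Critical values as reported for the DLTR and DG spread.
critical_values = {
    "1%": -3.4434437319767452,
    "5%": -2.8673146875484368,
    "10%": -2.5698456884811351,
}

# Hypothetical test statistics, not results from real data.
for stat in (-4.0, -3.0, -2.7, -1.0):
    result = (stat, None, None, None, critical_values, None)
    print(stat, cointegration_verdict(result))
```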

To view the complete printout of the ADF2 test, we can simply display adf2.






{'1%': -3.4434437319767452,
'10%': -2.5698456884811351,
'5%': -2.8673146875484368},
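The raw result is a plain tuple, which is why we index it with adf2[0] and adf2[4]. As a small convenience, the fields can be given names. The sketch below assumes the six-element tuple that `adfuller` returns with its default arguments; the `ADFResult` name is our own, and every number except the critical values is a made-up placeholder, not the actual DLTR and DG result.

```python
from collections import namedtuple

# Field order follows the tuple returned by adfuller with default arguments.
ADFResult = namedtuple(
    "ADFResult",
    ["statistic", "pvalue", "usedlag", "nobs", "critical_values", "icbest"],
)

# Hypothetical result: only the critical values come from the post.
raw = (
    -1.9,      # test statistic (made up)
    0.33,      # p-value (made up)
    1,         # number of lags used (made up)
    500,       # observations used (made up)
    {"1%": -3.4434437319767452,
     "5%": -2.8673146875484368,
     "10%": -2.5698456884811351},
    1234.5,    # information criterion (made up)
)

adf2_named = ADFResult(*raw)
print(adf2_named.statistic)
print(adf2_named.critical_values["5%"])
```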


To recap, in the first and second posts we began our journey toward understanding the efficacy of K-Means for pair selection and Statistical Arbitrage by attempting to develop a Statistical Arbitrage strategy in a world with no K-Means.

We learned that in a Statistical Arbitrage trading world without K-Means, we are left to our own devices for solving the historic problem of pair selection. We also learned that two stocks being related on a fundamental level doesn’t necessarily imply that they will provide a tradable relationship.
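To get a feel for why testing pairs one by one scales poorly, the short sketch below counts how many candidate pairs a brute-force search would have to run through the ADF test. The ticker list is illustrative: DLTR and DG come from this post, while the other symbols are hypothetical placeholders.

```python
from itertools import combinations
from math import comb

# DLTR and DG are from the post; the remaining tickers are hypothetical.
tickers = ["DLTR", "DG", "AAA", "BBB", "CCC"]

# Every unordered pair is a candidate spread to test.
pairs = list(combinations(tickers, 2))
print(len(pairs), "pairs from", len(tickers), "tickers")

# The count grows quadratically with the size of the universe.
for n in (10, 100, 500):
    print(n, "tickers ->", comb(n, 2), "pairs")
```

Even a modest universe produces thousands of spreads to test, and testing that many pairs also raises the odds of finding a relationship that only looks tradable by chance.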

In subsequent posts in this series, we will get a better understanding of what K-Means is and then prepare to apply it to our own Statistical Arbitrage strategy.




*Disclaimer: All investments and trading in the stock market involve risk. Any decisions to place trades in the financial markets, including trading in stock or options or other financial instruments is a personal decision that should only be made after thorough research, including a personal risk and financial assessment and the engagement of professional assistance to the extent you believe necessary. The trading strategies or related information mentioned in this article is for informational purposes only.

If you want to learn more about K-Means Clustering for Pair Selection in Python, or to download the code, visit the QuantInsti website and the educational offerings at their Executive Programme in Algorithmic Trading (EPAT™).

This article is from QuantInsti and is being posted with QuantInsti’s permission. The views expressed in this article are solely those of the author and/or QuantInsti and IB is not endorsing or recommending any investment or trading discussed in the article. This material is for information only and is not and should not be construed as an offer to sell or the solicitation of an offer to buy any security. To the extent that this material discusses general market activity, industry or sector trends or other broad-based economic or political conditions, it should not be construed as research or investment advice. To the extent that it includes references to specific securities, commodities, currencies, or other instruments, those references do not constitute a recommendation by IB to buy, sell or hold such security. This material does not and is not intended to take into account the particular financial conditions, investment objectives or requirements of individual customers. Before acting on this material, you should consider whether it is suitable for your particular circumstances and, as necessary, seek professional advice.



