# IBKR Quant Blog

### K-Means Clustering For Pair Selection In Python - Historic Problem of Pair Selection


To catch up on the first two posts in this series, see Part I and Part II.

In this post, we will try to identify a tradable relationship by brute force. How about Dollar Tree* and Dollar General*? They’re both discount retailers, and look, they both even have “Dollar” in their names. Since we’ve gotten the hang of things, we’ll jump right into the ADF test.

Let’s first import the data for DLTR and DG.

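```python
# importing DLTR and DG
# (pdr, start, and end are carried over from the earlier posts in this series)
dltr = pdr.get_data_yahoo('DLTR', start, end)
dg = pdr.get_data_yahoo('DG', start, end)
```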

Now that we’ve gotten our data, let’s add these stocks to our newDF and create their spread.

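```python
# adding DLTR and DG to our newDF dataframe
newDF['DLTR'] = dltr['Close']
newDF['DG'] = dg['Close']

# creating the DLTR and DG spread as a column in our newDF dataframe
# (this line was missing from the listing; a simple price difference is assumed)
newDF['Spread_2'] = newDF['DLTR'] - newDF['DG']
```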

We’ve now added the DLTR and DG stocks as well as their spread to our newDF dataframe. Let’s take a quick look at our dataframe.

Now that we have Spread_2 or the spread of DLTR and DG, we can create ADF2 or a second ADF test for these two stocks.

We’ve just run the ADF test on our DLTR and DG spread. We can now repeat our earlier logic to determine if the spread yields a tradable relationship.

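```python
# The conditions and the final message were missing from the listing; this
# reconstruction compares the test statistic adf2[0] with the critical values in adf2[4].
if adf2[0] < adf2[4]['1%']:
    print('Spread is Cointegrated at 1% Significance Level')
elif adf2[0] < adf2[4]['5%']:
    print('Spread is Cointegrated at 5% Significance Level')
elif adf2[0] < adf2[4]['10%']:
    print('Spread is Cointegrated at 10% Significance Level')
else:
    print('Spread is not Cointegrated')
```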

To view the complete printout of the ADF2 test, we can call adf2.

```
(-1.9620694402101162,
 0.30344784824995258,
 1,
 502,
 {'1%': -3.4434437319767452,
  '10%': -2.5698456884811351,
  '5%': -2.8673146875484368},
 1305.4559226426163)
```
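To make this tuple easier to read, here is a small sketch that unpacks it and compares the test statistic against each critical value. The numbers are copied from the output above, and the field order follows the return value of statsmodels' `adfuller`. The helper name `rejected_levels` is hypothetical and used only for illustration.

```python
# adf2 output copied from the post
adf2 = (-1.9620694402101162, 0.30344784824995258, 1, 502,
        {'1%': -3.4434437319767452,
         '10%': -2.5698456884811351,
         '5%': -2.8673146875484368},
        1305.4559226426163)


def rejected_levels(adf_result):
    """Return the significance levels at which the unit-root null is rejected."""
    test_stat, critical_values = adf_result[0], adf_result[4]
    # The null is rejected when the statistic is more negative than the critical value.
    hits = [level for level, cv in critical_values.items() if test_stat < cv]
    # Order from strictest (1%) to loosest (10%).
    return sorted(hits, key=lambda level: float(level.rstrip('%')))


test_stat, p_value, used_lag, n_obs, critical_values, ic_best = adf2
levels = rejected_levels(adf2)
print(f"statistic={test_stat:.3f}, p-value={p_value:.3f}, "
      f"lags={used_lag}, observations={n_obs}")
print("Rejected at:", levels if levels else "no tested level")
```

Here the statistic of about -1.96 sits above even the 10% critical value of about -2.57, and the p-value is roughly 0.30, so the test does not reject a unit root in the DLTR and DG spread.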

To recap, in the first and second posts we began our journey toward understanding the efficacy of K-Means for pair selection and Statistical Arbitrage by attempting to develop a Statistical Arbitrage strategy in a world with no K-Means.

We learned that in a Statistical Arbitrage trading world without K-Means, we are left to our own devices for solving the historic problem of pair selection. We’ve also learned that even when two stocks are related on a fundamental level, this doesn’t necessarily imply that they will provide a tradable relationship.

In subsequent posts in this series, we will get a better understanding of what K-Means is and then prepare to apply it to our own Statistical Arbitrage strategy.

------------------------------------------------------------

*Disclaimer: All investments and trading in the stock market involve risk. Any decision to place trades in the financial markets, including trading in stock or options or other financial instruments, is a personal decision that should only be made after thorough research, including a personal risk and financial assessment and the engagement of professional assistance to the extent you believe necessary. The trading strategies or related information mentioned in this article is for informational purposes only.

If you want to learn more about K-Means Clustering for Pair Selection in Python, or to download the code, visit the QuantInsti website and the educational offerings at their Executive Programme in Algorithmic Trading (EPAT™).
