# IBKR Quant Blog

### K-Means Clustering For Pair Selection In Python - Heatmaps and ADF Tests

K-Means Clustering For Pair Selection In Python – Heatmaps and ADF Tests

In the previous post we created a dataframe to hold our Walmart* and Target* stock prices.

We are now ready to create a correlation heatmap of our stocks. To this, we will use python’s Seaborn library. Recall that we imported Seaborn earlier as sns.

#using seaborn as sns to create a correlation heatmap of WMT and TGT
sns.heatmap(newDF.corr())

In the above plot, we called the corr() method on our newDF and passed it into Seaborn’s heatmap object. From this visualization, we can see that our two stocks are not that correlated. Let’s create a final visualization to asses this relationship. We’ll use a scatter plot for this. Bottom of Form

Earlier we used Matplotlibs scatter plot method. So now we’ll introduce Seaborn’s scatter plot method. Note that Seaborn is built on top of Matplotlib and thus matplotlibs functionality can be applied to Seaborn.

#Creating a scatter plot using Seaborn
plt.figure(figsize=(15,10))
sns.jointplot(newDF['WMT'],newDF['TGT'])
plt.legend(loc=0)
plt.show()

One feature that I like about using Seaborn’s scatter plot is that it provides the Correlation Coefficient and P-Value. From looking at this pearsonr value, we can see that WMT and TGT were not positively correlated over the period. Now that we have a better understanding of our two stocks, let’s check to see if a tradable relationship exists.

We’ll use the Augmented Dickey Fuller Test to determine of our stocks can be traded within a Statistical Arbitrage Strategy. Recall that we imported the adfuller test from the statsmodels.tsa.api package earlier.

To perform the ADF test, we must first create the spread of our stocks. We add this to our existing newDF dataframe.

We have now performed the ADF test on our spread and need to determine whether or not our stocks are cointegrated. Let’s write some logic to determine the results of our test.

#Logic that states if our test statistic is less than
#a specific critical value, then the pair is cointegrated at that
#level, else the pair is not cointegrated

print('Spread is Cointegrated at 1% Significance Level')

print('Spread is Cointegrated at 5% Significance Level')

print('Spread is Cointegrated at 10% Significance Level')

else:

The results of the Augmented Dickey Fuller Test showed that Walmart and Target were not cointegrated. This is determined by a test statistic that is not less than one of the critical values. If you would like to view the actual print out of the ADF test you can do so by keying ADF. In the above example, we use indexing to decipher between the t-statistic and critical values. The statsmodels ADF Test provides you with other useful information such as the p-value. You can learn more about the ADF test here

#printing out the results of the adf test

(-0.38706825965317432,

0.91223562790079438,

0,

503,

{'1%': -3.4434175660489905,

'10%': -2.5698395516760275,

'5%': -2.8673031724657454},

1190.4266834075452)

In the next post we will look at historic problem of pair selection.

------------------------------------------------------------

*Disclaimer: All investments and trading in the stock market involve risk. Any decisions to place trades in the financial markets, including trading in stock or options or other financial instruments is a personal decision that should only be made after thorough research, including a personal risk and financial assessment and the engagement of professional assistance to the extent you believe necessary. The trading strategies or related information mentioned in this article is for informational purposes only.

If you want to learn more about K-Means Clustering for Pair Selection in Python, or to download the code, visit QuantInsti website and the educational offerings at their Executive Programme in Algorithmic Trading (EPAT™).

17881

###### Disclosures

The material (including articles and commentary) provided on IBKR Quant Blog is offered for informational purposes only. The posted material is NOT a recommendation by Interactive Brokers (IB) that you or your clients should contract for the services of or invest with any of the independent advisors or hedge funds or others who may post on IBKR Quant Blog or invest with any advisors or hedge funds. The advisors, hedge funds and other analysts who may post on IBKR Quant Blog are independent of IB and IB does not make any representations or warranties concerning the past or future performance of these advisors, hedge funds and others or the accuracy of the information they provide. Interactive Brokers does not conduct a "suitability review" to make sure the trading of any advisor or hedge fund or other party is suitable for you.

Securities or other financial instruments mentioned in the material posted are not suitable for all investors. The material posted does not take into account your particular investment objectives, financial situations or needs and is not intended as a recommendation to you of any particular securities, financial instruments or strategies. Before making any investment or trade, you should consider whether it is suitable for your particular circumstances and, as necessary, seek professional advice. Past performance is no guarantee of future results.

Any information provided by third parties has been obtained from sources believed to be reliable and accurate; however, IB does not warrant its accuracy and assumes no responsibility for any errors or omissions.

Any information posted by employees of IB or an affiliated company is based upon information that is believed to be reliable. However, neither IB nor its affiliates warrant its completeness, accuracy or adequacy. IB does not make any representations or warranties concerning the past or future performance of any financial instrument. By posting material on IB Quant Blog, IB is not representing that any particular financial instrument or trading strategy is appropriate for you.