{"id":18503,"date":"2019-09-20T14:00:28","date_gmt":"2019-09-20T18:00:28","guid":{"rendered":"https:\/\/ibkrcampus.com\/?p=18503"},"modified":"2022-11-21T09:44:15","modified_gmt":"2022-11-21T14:44:15","slug":"k-means-clustering-algo-part-i","status":"publish","type":"post","link":"https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/k-means-clustering-algo-part-i\/","title":{"rendered":"K-Means Clustering Algorithm For Pair Selection In Python &#8211; Part I"},"content":{"rendered":"\n<p><strong>What Is K-Means Clustering?<\/strong><\/p>\n\n\n\n<p>K-Means Clustering is a type of unsupervised machine learning that groups data on the basis of similarities. In unsupervised machine learning, we only provide the model with features and then it &#8220;learns&#8221; the associations on its own.<\/p>\n\n\n\n<p>K-Means is one technique for finding subgroups within datasets. One difference in K-Means versus that of other clustering methods is that in K-Means, we have a predetermined amount of clusters.&nbsp;<\/p>\n\n\n\n<p>The algorithm begins by randomly assigning each data point to a specific cluster with no one data point being in any two clusters. It then calculates the centroid, or mean of these points.<\/p>\n\n\n\n<p>The object of the algorithm is to reduce the total within-cluster variation. In other words, we want to place each point into a specific cluster, measure the distances from the centroid of that cluster and then take the squared sum of these to get the total within-cluster variation. Our goal is to reduce this value.&nbsp;<\/p>\n\n\n\n<p>The process of assigning data points and calculating the squared distances is continued until there are no more changes in the components of the clusters, or in other words, we have optimally reduced the in-cluster variation.<\/p>\n\n\n\n<p>In the trading world, if you want to know the importance of K-Means, you don\u2019t have to look further than the implementation of statistical arbitrage.&nbsp;<\/p>\n\n\n\n<p><a href=\"https:\/\/blog.quantinsti.com\/statistical-arbitrage\/\">Statistical Arbitrage<\/a>&nbsp;is one of the most recognizable quantitative trading strategies. Though several variations exist, the basic premise is that despite two securities being random walks, their relationship is not random, thus yielding a trading opportunity. A key concern of implementing any version of statistical arbitrage is the process of pair selection.&nbsp;<\/p>\n\n\n\n<p><em>In the next installment, Lamarcus will try to implement statistical arbitrage without using K-Means first.<\/em><\/p>\n","protected":false},"excerpt":{"rendered":"<p>K-Means Clustering is a type of unsupervised machine learning that groups data on the basis of similarities.<\/p>\n","protected":false},"author":261,"featured_media":18505,"comment_status":"closed","ping_status":"open","sticky":true,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[339,343,349,338,350,341,344],"tags":[851,806,4124,852,595],"contributors-categories":[13654],"class_list":{"0":"post-18503","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-data-science","8":"category-programing-languages","9":"category-python-development","10":"category-ibkr-quant-news","11":"category-quant-asia-pacific","12":"category-quant-development","13":"category-quant-regions","14":"tag-algo-trading","15":"tag-data-science","16":"tag-k-means-clustering","17":"tag-machine-learning","18":"tag-python","19":"contributors-categories-quantinsti"},"pp_statuses_selecting_workflow":false,"pp_workflow_action":"current","pp_status_selection":"publish","acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v26.9 (Yoast SEO v27.5) - https:\/\/yoast.com\/product\/yoast-seo-premium-wordpress\/ -->\n<title>K-Means Clustering Algorithm For Pair Selection In Python &#8211; Part I<\/title>\n<meta name=\"description\" content=\"K-Means Clustering is a type of unsupervised machine learning that groups data on the basis of similarities.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.interactivebrokers.com\/campus\/wp-json\/wp\/v2\/posts\/18503\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"K-Means Clustering Algorithm For Pair Selection In Python - Quant Blog\" \/>\n<meta property=\"og:description\" content=\"K-Means Clustering is a type of unsupervised machine learning that groups data on the basis of similarities.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/k-means-clustering-algo-part-i\/\" \/>\n<meta property=\"og:site_name\" content=\"IBKR Campus US\" \/>\n<meta property=\"article:published_time\" content=\"2019-09-20T18:00:28+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2022-11-21T14:44:15+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.interactivebrokers.com\/campus\/wp-content\/uploads\/sites\/2\/2019\/09\/K-Means-Clustering-Algo.png\" \/>\n\t<meta property=\"og:image:width\" content=\"657\" \/>\n\t<meta property=\"og:image:height\" content=\"400\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Lamarcus Coleman\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Lamarcus Coleman\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"1 minute\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\n\t    \"@context\": \"https:\\\/\\\/schema.org\",\n\t    \"@graph\": [\n\t        {\n\t            \"@type\": \"NewsArticle\",\n\t            \"@id\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/ibkr-quant-news\\\/k-means-clustering-algo-part-i\\\/#article\",\n\t            \"isPartOf\": {\n\t                \"@id\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/ibkr-quant-news\\\/k-means-clustering-algo-part-i\\\/\"\n\t            },\n\t            \"author\": {\n\t                \"name\": \"Lamarcus Coleman\",\n\t                \"@id\": \"https:\\\/\\\/ibkrcampus.com\\\/campus\\\/#\\\/schema\\\/person\\\/d25f2d2d80f2efd41c14efd1767f840d\"\n\t            },\n\t            \"headline\": \"K-Means Clustering Algorithm For Pair Selection In Python &#8211; Part I\",\n\t            \"datePublished\": \"2019-09-20T18:00:28+00:00\",\n\t            \"dateModified\": \"2022-11-21T14:44:15+00:00\",\n\t            \"mainEntityOfPage\": {\n\t                \"@id\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/ibkr-quant-news\\\/k-means-clustering-algo-part-i\\\/\"\n\t            },\n\t            \"wordCount\": 304,\n\t            \"publisher\": {\n\t                \"@id\": \"https:\\\/\\\/ibkrcampus.com\\\/campus\\\/#organization\"\n\t            },\n\t            \"image\": {\n\t                \"@id\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/ibkr-quant-news\\\/k-means-clustering-algo-part-i\\\/#primaryimage\"\n\t            },\n\t            \"thumbnailUrl\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/wp-content\\\/uploads\\\/sites\\\/2\\\/2019\\\/09\\\/K-Means-Clustering-Algo.png\",\n\t            \"keywords\": [\n\t                \"Algo Trading\",\n\t                \"Data Science\",\n\t                \"K-Means Clustering\",\n\t                \"Machine Learning\",\n\t                \"Python\"\n\t            ],\n\t            \"articleSection\": [\n\t                \"Data Science\",\n\t                \"Programming Languages\",\n\t                \"Python Development\",\n\t                \"Quant\",\n\t                \"Quant Asia Pacific\",\n\t                \"Quant Development\",\n\t                \"Quant Regions\"\n\t            ],\n\t            \"inLanguage\": \"en-US\"\n\t        },\n\t        {\n\t            \"@type\": \"WebPage\",\n\t            \"@id\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/ibkr-quant-news\\\/k-means-clustering-algo-part-i\\\/\",\n\t            \"url\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/ibkr-quant-news\\\/k-means-clustering-algo-part-i\\\/\",\n\t            \"name\": \"K-Means Clustering Algorithm For Pair Selection In Python - Quant Blog\",\n\t            \"isPartOf\": {\n\t                \"@id\": \"https:\\\/\\\/ibkrcampus.com\\\/campus\\\/#website\"\n\t            },\n\t            \"primaryImageOfPage\": {\n\t                \"@id\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/ibkr-quant-news\\\/k-means-clustering-algo-part-i\\\/#primaryimage\"\n\t            },\n\t            \"image\": {\n\t                \"@id\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/ibkr-quant-news\\\/k-means-clustering-algo-part-i\\\/#primaryimage\"\n\t            },\n\t            \"thumbnailUrl\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/wp-content\\\/uploads\\\/sites\\\/2\\\/2019\\\/09\\\/K-Means-Clustering-Algo.png\",\n\t            \"datePublished\": \"2019-09-20T18:00:28+00:00\",\n\t            \"dateModified\": \"2022-11-21T14:44:15+00:00\",\n\t            \"description\": \"K-Means Clustering is a type of unsupervised machine learning that groups data on the basis of similarities.\",\n\t            \"inLanguage\": \"en-US\",\n\t            \"potentialAction\": [\n\t                {\n\t                    \"@type\": \"ReadAction\",\n\t                    \"target\": [\n\t                        \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/ibkr-quant-news\\\/k-means-clustering-algo-part-i\\\/\"\n\t                    ]\n\t                }\n\t            ]\n\t        },\n\t        {\n\t            \"@type\": \"ImageObject\",\n\t            \"inLanguage\": \"en-US\",\n\t            \"@id\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/ibkr-quant-news\\\/k-means-clustering-algo-part-i\\\/#primaryimage\",\n\t            \"url\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/wp-content\\\/uploads\\\/sites\\\/2\\\/2019\\\/09\\\/K-Means-Clustering-Algo.png\",\n\t            \"contentUrl\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/wp-content\\\/uploads\\\/sites\\\/2\\\/2019\\\/09\\\/K-Means-Clustering-Algo.png\",\n\t            \"width\": 657,\n\t            \"height\": 400,\n\t            \"caption\": \"K-Means Clustering Algorithm For Pair Selection In Python - Part I\"\n\t        },\n\t        {\n\t            \"@type\": \"WebSite\",\n\t            \"@id\": \"https:\\\/\\\/ibkrcampus.com\\\/campus\\\/#website\",\n\t            \"url\": \"https:\\\/\\\/ibkrcampus.com\\\/campus\\\/\",\n\t            \"name\": \"IBKR Campus US\",\n\t            \"description\": \"Financial Education from Interactive Brokers\",\n\t            \"publisher\": {\n\t                \"@id\": \"https:\\\/\\\/ibkrcampus.com\\\/campus\\\/#organization\"\n\t            },\n\t            \"potentialAction\": [\n\t                {\n\t                    \"@type\": \"SearchAction\",\n\t                    \"target\": {\n\t                        \"@type\": \"EntryPoint\",\n\t                        \"urlTemplate\": \"https:\\\/\\\/ibkrcampus.com\\\/campus\\\/?s={search_term_string}\"\n\t                    },\n\t                    \"query-input\": {\n\t                        \"@type\": \"PropertyValueSpecification\",\n\t                        \"valueRequired\": true,\n\t                        \"valueName\": \"search_term_string\"\n\t                    }\n\t                }\n\t            ],\n\t            \"inLanguage\": \"en-US\"\n\t        },\n\t        {\n\t            \"@type\": \"Organization\",\n\t            \"@id\": \"https:\\\/\\\/ibkrcampus.com\\\/campus\\\/#organization\",\n\t            \"name\": \"Interactive Brokers\",\n\t            \"alternateName\": \"IBKR\",\n\t            \"url\": \"https:\\\/\\\/ibkrcampus.com\\\/campus\\\/\",\n\t            \"logo\": {\n\t                \"@type\": \"ImageObject\",\n\t                \"inLanguage\": \"en-US\",\n\t                \"@id\": \"https:\\\/\\\/ibkrcampus.com\\\/campus\\\/#\\\/schema\\\/logo\\\/image\\\/\",\n\t                \"url\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/wp-content\\\/uploads\\\/sites\\\/2\\\/2024\\\/05\\\/ibkr-campus-logo.jpg\",\n\t                \"contentUrl\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/wp-content\\\/uploads\\\/sites\\\/2\\\/2024\\\/05\\\/ibkr-campus-logo.jpg\",\n\t                \"width\": 669,\n\t                \"height\": 669,\n\t                \"caption\": \"Interactive Brokers\"\n\t            },\n\t            \"image\": {\n\t                \"@id\": \"https:\\\/\\\/ibkrcampus.com\\\/campus\\\/#\\\/schema\\\/logo\\\/image\\\/\"\n\t            },\n\t            \"publishingPrinciples\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/about-ibkr-campus\\\/\",\n\t            \"ethicsPolicy\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/cyber-security-notice\\\/\"\n\t        },\n\t        {\n\t            \"@type\": \"Person\",\n\t            \"@id\": \"https:\\\/\\\/ibkrcampus.com\\\/campus\\\/#\\\/schema\\\/person\\\/d25f2d2d80f2efd41c14efd1767f840d\",\n\t            \"name\": \"Lamarcus Coleman\",\n\t            \"url\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/author\\\/lamarcuscoleman\\\/\"\n\t        }\n\t    ]\n\t}<\/script>\n<!-- \/ Yoast SEO Premium plugin. -->","yoast_head_json":{"title":"K-Means Clustering Algorithm For Pair Selection In Python &#8211; Part I","description":"K-Means Clustering is a type of unsupervised machine learning that groups data on the basis of similarities.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.interactivebrokers.com\/campus\/wp-json\/wp\/v2\/posts\/18503\/","og_locale":"en_US","og_type":"article","og_title":"K-Means Clustering Algorithm For Pair Selection In Python - Quant Blog","og_description":"K-Means Clustering is a type of unsupervised machine learning that groups data on the basis of similarities.","og_url":"https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/k-means-clustering-algo-part-i\/","og_site_name":"IBKR Campus US","article_published_time":"2019-09-20T18:00:28+00:00","article_modified_time":"2022-11-21T14:44:15+00:00","og_image":[{"width":657,"height":400,"url":"https:\/\/www.interactivebrokers.com\/campus\/wp-content\/uploads\/sites\/2\/2019\/09\/K-Means-Clustering-Algo.png","type":"image\/png"}],"author":"Lamarcus Coleman","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Lamarcus Coleman","Est. reading time":"1 minute"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"NewsArticle","@id":"https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/k-means-clustering-algo-part-i\/#article","isPartOf":{"@id":"https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/k-means-clustering-algo-part-i\/"},"author":{"name":"Lamarcus Coleman","@id":"https:\/\/ibkrcampus.com\/campus\/#\/schema\/person\/d25f2d2d80f2efd41c14efd1767f840d"},"headline":"K-Means Clustering Algorithm For Pair Selection In Python &#8211; Part I","datePublished":"2019-09-20T18:00:28+00:00","dateModified":"2022-11-21T14:44:15+00:00","mainEntityOfPage":{"@id":"https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/k-means-clustering-algo-part-i\/"},"wordCount":304,"publisher":{"@id":"https:\/\/ibkrcampus.com\/campus\/#organization"},"image":{"@id":"https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/k-means-clustering-algo-part-i\/#primaryimage"},"thumbnailUrl":"https:\/\/www.interactivebrokers.com\/campus\/wp-content\/uploads\/sites\/2\/2019\/09\/K-Means-Clustering-Algo.png","keywords":["Algo Trading","Data Science","K-Means Clustering","Machine Learning","Python"],"articleSection":["Data Science","Programming Languages","Python Development","Quant","Quant Asia Pacific","Quant Development","Quant Regions"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/k-means-clustering-algo-part-i\/","url":"https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/k-means-clustering-algo-part-i\/","name":"K-Means Clustering Algorithm For Pair Selection In Python - Quant Blog","isPartOf":{"@id":"https:\/\/ibkrcampus.com\/campus\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/k-means-clustering-algo-part-i\/#primaryimage"},"image":{"@id":"https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/k-means-clustering-algo-part-i\/#primaryimage"},"thumbnailUrl":"https:\/\/www.interactivebrokers.com\/campus\/wp-content\/uploads\/sites\/2\/2019\/09\/K-Means-Clustering-Algo.png","datePublished":"2019-09-20T18:00:28+00:00","dateModified":"2022-11-21T14:44:15+00:00","description":"K-Means Clustering is a type of unsupervised machine learning that groups data on the basis of similarities.","inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/k-means-clustering-algo-part-i\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/k-means-clustering-algo-part-i\/#primaryimage","url":"https:\/\/www.interactivebrokers.com\/campus\/wp-content\/uploads\/sites\/2\/2019\/09\/K-Means-Clustering-Algo.png","contentUrl":"https:\/\/www.interactivebrokers.com\/campus\/wp-content\/uploads\/sites\/2\/2019\/09\/K-Means-Clustering-Algo.png","width":657,"height":400,"caption":"K-Means Clustering Algorithm For Pair Selection In Python - Part I"},{"@type":"WebSite","@id":"https:\/\/ibkrcampus.com\/campus\/#website","url":"https:\/\/ibkrcampus.com\/campus\/","name":"IBKR Campus US","description":"Financial Education from Interactive Brokers","publisher":{"@id":"https:\/\/ibkrcampus.com\/campus\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/ibkrcampus.com\/campus\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/ibkrcampus.com\/campus\/#organization","name":"Interactive Brokers","alternateName":"IBKR","url":"https:\/\/ibkrcampus.com\/campus\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/ibkrcampus.com\/campus\/#\/schema\/logo\/image\/","url":"https:\/\/www.interactivebrokers.com\/campus\/wp-content\/uploads\/sites\/2\/2024\/05\/ibkr-campus-logo.jpg","contentUrl":"https:\/\/www.interactivebrokers.com\/campus\/wp-content\/uploads\/sites\/2\/2024\/05\/ibkr-campus-logo.jpg","width":669,"height":669,"caption":"Interactive Brokers"},"image":{"@id":"https:\/\/ibkrcampus.com\/campus\/#\/schema\/logo\/image\/"},"publishingPrinciples":"https:\/\/www.interactivebrokers.com\/campus\/about-ibkr-campus\/","ethicsPolicy":"https:\/\/www.interactivebrokers.com\/campus\/cyber-security-notice\/"},{"@type":"Person","@id":"https:\/\/ibkrcampus.com\/campus\/#\/schema\/person\/d25f2d2d80f2efd41c14efd1767f840d","name":"Lamarcus Coleman","url":"https:\/\/www.interactivebrokers.com\/campus\/author\/lamarcuscoleman\/"}]}},"jetpack_featured_media_url":"https:\/\/www.interactivebrokers.com\/campus\/wp-content\/uploads\/sites\/2\/2019\/09\/K-Means-Clustering-Algo.png","_links":{"self":[{"href":"https:\/\/ibkrcampus.com\/campus\/wp-json\/wp\/v2\/posts\/18503","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/ibkrcampus.com\/campus\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/ibkrcampus.com\/campus\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/ibkrcampus.com\/campus\/wp-json\/wp\/v2\/users\/261"}],"replies":[{"embeddable":true,"href":"https:\/\/ibkrcampus.com\/campus\/wp-json\/wp\/v2\/comments?post=18503"}],"version-history":[{"count":0,"href":"https:\/\/ibkrcampus.com\/campus\/wp-json\/wp\/v2\/posts\/18503\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/ibkrcampus.com\/campus\/wp-json\/wp\/v2\/media\/18505"}],"wp:attachment":[{"href":"https:\/\/ibkrcampus.com\/campus\/wp-json\/wp\/v2\/media?parent=18503"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/ibkrcampus.com\/campus\/wp-json\/wp\/v2\/categories?post=18503"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/ibkrcampus.com\/campus\/wp-json\/wp\/v2\/tags?post=18503"},{"taxonomy":"contributors-categories","embeddable":true,"href":"https:\/\/ibkrcampus.com\/campus\/wp-json\/wp\/v2\/contributors-categories?post=18503"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}