{"id":200437,"date":"2023-12-21T10:20:11","date_gmt":"2023-12-21T15:20:11","guid":{"rendered":"https:\/\/ibkrcampus.com\/?p=200437"},"modified":"2023-12-21T10:20:59","modified_gmt":"2023-12-21T15:20:59","slug":"vectorize-fuzzy-matching","status":"publish","type":"post","link":"https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/vectorize-fuzzy-matching\/","title":{"rendered":"Vectorize Fuzzy Matching"},"content":{"rendered":"\n<p>One of the best things about R is its ability to vectorize code. This usually lets code run much faster than an equivalent&nbsp;<em>for<\/em>&nbsp;or&nbsp;<em>while<\/em>&nbsp;loop. In this post, we\u2019re going to show you how to use&nbsp;<a href=\"https:\/\/en.wikipedia.org\/wiki\/Array_programming\">vectorization<\/a>&nbsp;to speed up fuzzy matching. First, we\u2019ll cover a little background. If you\u2019re familiar with vectorization and \/ or fuzzy matching, feel free to skip further down the post.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-what-is-vectorization\"><strong>What is vectorization?<\/strong><\/h2>\n\n\n\n<p>Vectorization works by performing operations on entire vectors, or by extension, matrices, rather than iterating through each element in a collection of objects one at a time. A basic example is adding two vectors together:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">a &lt;- c(3, 4, 5)\nb &lt;- c(6, 7, 8)\n \nsums &lt;- a + b<\/pre>\n\n\n\n<p>In this example,&nbsp;<em>sums<\/em>&nbsp;now equals c(9, 11, 13). This is because the addition operator, \u201c+\u201d, was applied element by element, i.e. the first element of a was added to the first element of b, the second element of a was added to the second element of b, and so on. 
Vectorization is usually faster than an explicit for loop because the element-by-element work runs in compiled code rather than in the R interpreter. For loops in R, on the other hand, pay interpreter overhead on every iteration, which is why they are notoriously slow.<\/p>\n\n\n\n<p>This example may seem simple, but vectorization can be used much more powerfully to speed up a process like fuzzy matching, the topic of this article.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>What is fuzzy matching?<\/strong><\/h2>\n\n\n\n<p><a href=\"https:\/\/en.wikipedia.org\/wiki\/Approximate_string_matching\">Fuzzy matching<\/a>&nbsp;is the process of finding strings that are approximately, rather than exactly, equal. For example, suppose you\u2019re trying to join two data sets together on a city field. Data for these cities may be entered into a system manually, allowing for spelling or formatting differences, e.g. \u201cMt. Hood\u201d might be coded as \u201cMount Hood\u201d. A human can see those are clearly the same, but we need an algorithmic way to determine that so we don\u2019t have to manually go through large numbers of possibilities.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>The stringdist package<\/strong><\/h2>\n\n\n\n<p>To perform fuzzy matching, we\u2019re going to use a package called&nbsp;<a href=\"https:\/\/cran.r-project.org\/web\/packages\/stringdist\/index.html\">stringdist<\/a>. This package contains the function we need,&nbsp;<em>stringsim<\/em>, which measures the similarity between a pair of strings. It supports several different algorithms for comparing two strings, and returns a value between 0 (very dissimilar) and 1 (very similar or equal). 
For instance:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\"># load stringdist package\nrequire(stringdist)\n \n# compare two sample strings\nstringsim(\"this is a test\", \"this is the test\", method = \"jw\")<\/pre>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Simplistic Approach<\/strong><\/h2>\n\n\n\n<p>Alright, so to illustrate how much time vectorization saves, let\u2019s do fuzzy matching in a less clever way. We\u2019re going to use the&nbsp;<em>zipcode<\/em>&nbsp;package to get a sample list of cities across the US. These get stored in the vector,&nbsp;<em>cities<\/em>. Note, since this data set is actually at a zip code level, we\u2019re going to have some duplicates in our vector, but that\u2019s not a concern for this example because we\u2019re trying to demonstrate performance differences between vectorization versus non-vectorization. Thus, our&nbsp;<em>cities<\/em>&nbsp;vector contains 44,336 elements.<\/p>\n\n\n\n<p>Also, we create a vector below, called&nbsp;<em>input<\/em>, containing the misspelled or adjusted spellings of several cities in Florida (\u201cMt. Dora\u201d vs. \u201cMount Dora\u201d, \u201cSun City, FL\u201d vs. \u201cSun City\u201d etc.).<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\"># load packages\nrequire(stringdist)\nrequire(zipcode)\ndata(\"zipcode\")\n \ncities &lt;- zipcode$city\n \n# misspell \/ change spellings from a few Florida cities\ninput &lt;- c(\"Centry\", \"Boca R.\", \"Mt. 
Dora\", \"winterhaven\", \"Sun City, FL\")<\/pre>\n\n\n\n<p>Now, one way of doing fuzzy matching here would be to loop through each city in our&nbsp;<em>input<\/em>&nbsp;vector, and then loop through each city in the&nbsp;<em>cities<\/em>&nbsp;vector, checking the string similarity of each possible pair of strings, i.e. comparing the first element in&nbsp;<em>input<\/em>&nbsp;with every single element in&nbsp;<em>cities<\/em>, then the second, and so on. Since the length of&nbsp;<em>input<\/em>&nbsp;is 5 and the length of&nbsp;<em>cities<\/em>&nbsp;is 44,336, this requires 5 * 44,336 = 221,680 calls to&nbsp;<em>stringsim<\/em>.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\"># Naive approach: compare every input against every city, one pair at a time\nstart &lt;- proc.time()\nresults &lt;- c()\nfor(city in input)\n{\n    # stringsim returns a similarity, so higher is better\n    best_sim &lt;- -Inf\n    for(check in cities)\n    {\n        sim &lt;- stringsim(city, check, method = \"jw\")\n        if(sim &gt; best_sim)\n        {\n            closest_match &lt;- check\n            best_sim &lt;- sim\n        }\n    }\n\n    results &lt;- append(results, closest_match)\n}\nend &lt;- proc.time()<\/pre>\n\n\n\n<p>If we run end - start, we find that this process takes just under 26 seconds. 
However, let\u2019s see what happens when we ditch the for loops, and use vectorization instead.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Vectorizing Fuzzy Matching<\/strong><\/h2>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\"># define function to search for best string match in cities\n# (no default for cities: a default of cities = cities would refer to itself)\nget_best_match &lt;- function(city, cities)\n{\n    # one vectorized call scores city against every element of cities\n    max_index &lt;- which.max(stringsim(city, cities, method = \"jw\"))\n\n    return(cities[max_index])\n}\n\nvector_start &lt;- proc.time()\nvector_results &lt;- sapply(input, function(city) get_best_match(city, cities))\nvector_end &lt;- proc.time()<\/pre>\n\n\n\n<p>If we run vector_end - vector_start, we see this takes only 0.03 seconds! The main reason for the speedup can be seen in this line:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">max_index &lt;- which.max(stringsim(city, cities, method = \"jw\"))<\/pre>\n\n\n\n<p>The inner function,&nbsp;<em>stringsim<\/em>, computes the string similarity between the input city (e.g. \u201cMt. Dora\u201d) and every element in the cities vector. This is done in a single call, rather than using a for or while loop construct. 
The output of this function call is a vector of similarity measures (between 0 and 1), one for each element of&nbsp;<em>cities<\/em>.<\/p>\n\n\n\n<p>We then apply the&nbsp;<em>which.max<\/em>&nbsp;function to get the index of the element in cities that is the closest match to the input city (the first such index, if there are ties). 
Using this index, we can get the actual name of the city with the closest match, like below.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">return(cities[max_index])<\/pre>\n\n\n\n<p>A second, much smaller factor is using&nbsp;<em>sapply<\/em>&nbsp;to loop over the elements in our input vector, passing each element in turn to the&nbsp;<em>get_best_match<\/em>&nbsp;function.&nbsp;<em>sapply<\/em>&nbsp;runs its loop in C (through lapply), but it still calls&nbsp;<em>get_best_match<\/em>&nbsp;once per input, so with only five inputs nearly all of the speedup comes from the vectorized&nbsp;<em>stringsim<\/em>&nbsp;call.<\/p>\n\n\n\n<p>That\u2019s it for this post! Check out other posts of mine here:&nbsp;<a href=\"https:\/\/theautomatic.net\/blog\/\">https:\/\/theautomatic.net\/blog\/<\/a>.<\/p>\n\n\n\n<p><em>Originally posted on <a href=\"https:\/\/theautomatic.net\/2017\/12\/11\/vectorize-fuzzy-matching\/\">TheAutomatic.net<\/a> blog.<\/em><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Vectorization works by performing operations on entire vectors, or by extension, matrices, rather than iterating through each element in a collection of objects one at a 
time.<\/p>\n","protected":false},"author":388,"featured_media":67374,"comment_status":"open","ping_status":"closed","sticky":true,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[339,343,338,342],"tags":[806,14805,487,6591,16479,15303],"contributors-categories":[13695],"class_list":{"0":"post-200437","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-data-science","8":"category-programing-languages","9":"category-ibkr-quant-news","10":"category-r-development","11":"tag-data-science","12":"tag-fuzzy-matching","13":"tag-r","14":"tag-rstats","15":"tag-stringdist-package","16":"tag-vectorization","17":"contributors-categories-theautomatic-net"},"pp_statuses_selecting_workflow":false,"pp_workflow_action":"current","pp_status_selection":"publish","acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v26.9 (Yoast SEO v27.3) - https:\/\/yoast.com\/product\/yoast-seo-premium-wordpress\/ -->\n<title>Vectorize Fuzzy Matching | IBKR Quant<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.interactivebrokers.com\/campus\/wp-json\/wp\/v2\/posts\/200437\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Vectorize Fuzzy Matching\" \/>\n<meta property=\"og:description\" content=\"Vectorization works by performing operations on entire vectors, or by extension, matrices, rather than iterating through each element in a collection of objects one at a time.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/vectorize-fuzzy-matching\/\" \/>\n<meta property=\"og:site_name\" content=\"IBKR Campus US\" \/>\n<meta property=\"article:published_time\" content=\"2023-12-21T15:20:11+00:00\" 
\/>\n<meta property=\"article:modified_time\" content=\"2023-12-21T15:20:59+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.interactivebrokers.com\/campus\/wp-content\/uploads\/sites\/2\/2020\/11\/binary-background-abstract.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"900\" \/>\n\t<meta property=\"og:image:height\" content=\"550\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Andrew Treadway\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Andrew Treadway\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"4 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\n\t    \"@context\": \"https:\\\/\\\/schema.org\",\n\t    \"@graph\": [\n\t        {\n\t            \"@type\": \"NewsArticle\",\n\t            \"@id\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/ibkr-quant-news\\\/vectorize-fuzzy-matching\\\/#article\",\n\t            \"isPartOf\": {\n\t                \"@id\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/ibkr-quant-news\\\/vectorize-fuzzy-matching\\\/\"\n\t            },\n\t            \"author\": {\n\t                \"name\": \"Andrew Treadway\",\n\t                \"@id\": \"https:\\\/\\\/ibkrcampus.com\\\/campus\\\/#\\\/schema\\\/person\\\/d4018570a16fb867f1c08412fc9c64bc\"\n\t            },\n\t            \"headline\": \"Vectorize Fuzzy Matching\",\n\t            \"datePublished\": \"2023-12-21T15:20:11+00:00\",\n\t            \"dateModified\": \"2023-12-21T15:20:59+00:00\",\n\t            \"mainEntityOfPage\": {\n\t                \"@id\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/ibkr-quant-news\\\/vectorize-fuzzy-matching\\\/\"\n\t            },\n\t            \"wordCount\": 861,\n\t            \"commentCount\": 0,\n\t 
           \"publisher\": {\n\t                \"@id\": \"https:\\\/\\\/ibkrcampus.com\\\/campus\\\/#organization\"\n\t            },\n\t            \"image\": {\n\t                \"@id\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/ibkr-quant-news\\\/vectorize-fuzzy-matching\\\/#primaryimage\"\n\t            },\n\t            \"thumbnailUrl\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/wp-content\\\/uploads\\\/sites\\\/2\\\/2020\\\/11\\\/binary-background-abstract.jpg\",\n\t            \"keywords\": [\n\t                \"Data Science\",\n\t                \"Fuzzy Matching\",\n\t                \"R\",\n\t                \"rstats\",\n\t                \"Stringdist Package\",\n\t                \"Vectorization\"\n\t            ],\n\t            \"articleSection\": [\n\t                \"Data Science\",\n\t                \"Programming Languages\",\n\t                \"Quant\",\n\t                \"R Development\"\n\t            ],\n\t            \"inLanguage\": \"en-US\",\n\t            \"potentialAction\": [\n\t                {\n\t                    \"@type\": \"CommentAction\",\n\t                    \"name\": \"Comment\",\n\t                    \"target\": [\n\t                        \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/ibkr-quant-news\\\/vectorize-fuzzy-matching\\\/#respond\"\n\t                    ]\n\t                }\n\t            ]\n\t        },\n\t        {\n\t            \"@type\": \"WebPage\",\n\t            \"@id\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/ibkr-quant-news\\\/vectorize-fuzzy-matching\\\/\",\n\t            \"url\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/ibkr-quant-news\\\/vectorize-fuzzy-matching\\\/\",\n\t            \"name\": \"Vectorize Fuzzy Matching | IBKR Campus US\",\n\t            \"isPartOf\": {\n\t                \"@id\": \"https:\\\/\\\/ibkrcampus.com\\\/campus\\\/#website\"\n\t            },\n\t            \"primaryImageOfPage\": {\n\t             
   \"@id\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/ibkr-quant-news\\\/vectorize-fuzzy-matching\\\/#primaryimage\"\n\t            },\n\t            \"image\": {\n\t                \"@id\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/ibkr-quant-news\\\/vectorize-fuzzy-matching\\\/#primaryimage\"\n\t            },\n\t            \"thumbnailUrl\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/wp-content\\\/uploads\\\/sites\\\/2\\\/2020\\\/11\\\/binary-background-abstract.jpg\",\n\t            \"datePublished\": \"2023-12-21T15:20:11+00:00\",\n\t            \"dateModified\": \"2023-12-21T15:20:59+00:00\",\n\t            \"inLanguage\": \"en-US\",\n\t            \"potentialAction\": [\n\t                {\n\t                    \"@type\": \"ReadAction\",\n\t                    \"target\": [\n\t                        \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/ibkr-quant-news\\\/vectorize-fuzzy-matching\\\/\"\n\t                    ]\n\t                }\n\t            ]\n\t        },\n\t        {\n\t            \"@type\": \"ImageObject\",\n\t            \"inLanguage\": \"en-US\",\n\t            \"@id\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/ibkr-quant-news\\\/vectorize-fuzzy-matching\\\/#primaryimage\",\n\t            \"url\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/wp-content\\\/uploads\\\/sites\\\/2\\\/2020\\\/11\\\/binary-background-abstract.jpg\",\n\t            \"contentUrl\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/wp-content\\\/uploads\\\/sites\\\/2\\\/2020\\\/11\\\/binary-background-abstract.jpg\",\n\t            \"width\": 900,\n\t            \"height\": 550,\n\t            \"caption\": \"Quant\"\n\t        },\n\t        {\n\t            \"@type\": \"WebSite\",\n\t            \"@id\": \"https:\\\/\\\/ibkrcampus.com\\\/campus\\\/#website\",\n\t            \"url\": \"https:\\\/\\\/ibkrcampus.com\\\/campus\\\/\",\n\t            \"name\": \"IBKR Campus US\",\n\t     
       \"description\": \"Financial Education from Interactive Brokers\",\n\t            \"publisher\": {\n\t                \"@id\": \"https:\\\/\\\/ibkrcampus.com\\\/campus\\\/#organization\"\n\t            },\n\t            \"potentialAction\": [\n\t                {\n\t                    \"@type\": \"SearchAction\",\n\t                    \"target\": {\n\t                        \"@type\": \"EntryPoint\",\n\t                        \"urlTemplate\": \"https:\\\/\\\/ibkrcampus.com\\\/campus\\\/?s={search_term_string}\"\n\t                    },\n\t                    \"query-input\": {\n\t                        \"@type\": \"PropertyValueSpecification\",\n\t                        \"valueRequired\": true,\n\t                        \"valueName\": \"search_term_string\"\n\t                    }\n\t                }\n\t            ],\n\t            \"inLanguage\": \"en-US\"\n\t        },\n\t        {\n\t            \"@type\": \"Organization\",\n\t            \"@id\": \"https:\\\/\\\/ibkrcampus.com\\\/campus\\\/#organization\",\n\t            \"name\": \"Interactive Brokers\",\n\t            \"alternateName\": \"IBKR\",\n\t            \"url\": \"https:\\\/\\\/ibkrcampus.com\\\/campus\\\/\",\n\t            \"logo\": {\n\t                \"@type\": \"ImageObject\",\n\t                \"inLanguage\": \"en-US\",\n\t                \"@id\": \"https:\\\/\\\/ibkrcampus.com\\\/campus\\\/#\\\/schema\\\/logo\\\/image\\\/\",\n\t                \"url\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/wp-content\\\/uploads\\\/sites\\\/2\\\/2024\\\/05\\\/ibkr-campus-logo.jpg\",\n\t                \"contentUrl\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/wp-content\\\/uploads\\\/sites\\\/2\\\/2024\\\/05\\\/ibkr-campus-logo.jpg\",\n\t                \"width\": 669,\n\t                \"height\": 669,\n\t                \"caption\": \"Interactive Brokers\"\n\t            },\n\t            \"image\": {\n\t                \"@id\": 
\"https:\\\/\\\/ibkrcampus.com\\\/campus\\\/#\\\/schema\\\/logo\\\/image\\\/\"\n\t            },\n\t            \"publishingPrinciples\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/about-ibkr-campus\\\/\",\n\t            \"ethicsPolicy\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/cyber-security-notice\\\/\"\n\t        },\n\t        {\n\t            \"@type\": \"Person\",\n\t            \"@id\": \"https:\\\/\\\/ibkrcampus.com\\\/campus\\\/#\\\/schema\\\/person\\\/d4018570a16fb867f1c08412fc9c64bc\",\n\t            \"name\": \"Andrew Treadway\",\n\t            \"description\": \"Andrew Treadway currently works as a Senior Data Scientist, and has experience doing analytics, software automation, and ETL. He completed a master\u2019s degree in computer science \\\/ machine learning, and an undergraduate degree in pure mathematics. Connect with him on LinkedIn: https:\\\/\\\/www.linkedin.com\\\/in\\\/andrew-treadway-a3b19b103\\\/In addition to TheAutomatic.net blog, he also teaches in-person courses on Python and R through my NYC meetup: more details.\",\n\t            \"sameAs\": [\n\t                \"https:\\\/\\\/theautomatic.net\\\/about-me\\\/\"\n\t            ],\n\t            \"url\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/author\\\/andrewtreadway\\\/\"\n\t        }\n\t    ]\n\t}<\/script>\n<!-- \/ Yoast SEO Premium plugin. 
-->","yoast_head_json":{"title":"Vectorize Fuzzy Matching | IBKR Quant","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.interactivebrokers.com\/campus\/wp-json\/wp\/v2\/posts\/200437\/","og_locale":"en_US","og_type":"article","og_title":"Vectorize Fuzzy Matching","og_description":"Vectorization works by performing operations on entire vectors, or by extension, matrices, rather than iterating through each element in a collection of objects one at a time.","og_url":"https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/vectorize-fuzzy-matching\/","og_site_name":"IBKR Campus US","article_published_time":"2023-12-21T15:20:11+00:00","article_modified_time":"2023-12-21T15:20:59+00:00","og_image":[{"width":900,"height":550,"url":"https:\/\/www.interactivebrokers.com\/campus\/wp-content\/uploads\/sites\/2\/2020\/11\/binary-background-abstract.jpg","type":"image\/jpeg"}],"author":"Andrew Treadway","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Andrew Treadway","Est. 
reading time":"4 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"NewsArticle","@id":"https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/vectorize-fuzzy-matching\/#article","isPartOf":{"@id":"https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/vectorize-fuzzy-matching\/"},"author":{"name":"Andrew Treadway","@id":"https:\/\/ibkrcampus.com\/campus\/#\/schema\/person\/d4018570a16fb867f1c08412fc9c64bc"},"headline":"Vectorize Fuzzy Matching","datePublished":"2023-12-21T15:20:11+00:00","dateModified":"2023-12-21T15:20:59+00:00","mainEntityOfPage":{"@id":"https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/vectorize-fuzzy-matching\/"},"wordCount":861,"commentCount":0,"publisher":{"@id":"https:\/\/ibkrcampus.com\/campus\/#organization"},"image":{"@id":"https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/vectorize-fuzzy-matching\/#primaryimage"},"thumbnailUrl":"https:\/\/www.interactivebrokers.com\/campus\/wp-content\/uploads\/sites\/2\/2020\/11\/binary-background-abstract.jpg","keywords":["Data Science","Fuzzy Matching","R","rstats","Stringdist Package","Vectorization"],"articleSection":["Data Science","Programming Languages","Quant","R Development"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/vectorize-fuzzy-matching\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/vectorize-fuzzy-matching\/","url":"https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/vectorize-fuzzy-matching\/","name":"Vectorize Fuzzy Matching | IBKR Campus 
US","isPartOf":{"@id":"https:\/\/ibkrcampus.com\/campus\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/vectorize-fuzzy-matching\/#primaryimage"},"image":{"@id":"https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/vectorize-fuzzy-matching\/#primaryimage"},"thumbnailUrl":"https:\/\/www.interactivebrokers.com\/campus\/wp-content\/uploads\/sites\/2\/2020\/11\/binary-background-abstract.jpg","datePublished":"2023-12-21T15:20:11+00:00","dateModified":"2023-12-21T15:20:59+00:00","inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/vectorize-fuzzy-matching\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/vectorize-fuzzy-matching\/#primaryimage","url":"https:\/\/www.interactivebrokers.com\/campus\/wp-content\/uploads\/sites\/2\/2020\/11\/binary-background-abstract.jpg","contentUrl":"https:\/\/www.interactivebrokers.com\/campus\/wp-content\/uploads\/sites\/2\/2020\/11\/binary-background-abstract.jpg","width":900,"height":550,"caption":"Quant"},{"@type":"WebSite","@id":"https:\/\/ibkrcampus.com\/campus\/#website","url":"https:\/\/ibkrcampus.com\/campus\/","name":"IBKR Campus US","description":"Financial Education from Interactive Brokers","publisher":{"@id":"https:\/\/ibkrcampus.com\/campus\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/ibkrcampus.com\/campus\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/ibkrcampus.com\/campus\/#organization","name":"Interactive 
Brokers","alternateName":"IBKR","url":"https:\/\/ibkrcampus.com\/campus\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/ibkrcampus.com\/campus\/#\/schema\/logo\/image\/","url":"https:\/\/www.interactivebrokers.com\/campus\/wp-content\/uploads\/sites\/2\/2024\/05\/ibkr-campus-logo.jpg","contentUrl":"https:\/\/www.interactivebrokers.com\/campus\/wp-content\/uploads\/sites\/2\/2024\/05\/ibkr-campus-logo.jpg","width":669,"height":669,"caption":"Interactive Brokers"},"image":{"@id":"https:\/\/ibkrcampus.com\/campus\/#\/schema\/logo\/image\/"},"publishingPrinciples":"https:\/\/www.interactivebrokers.com\/campus\/about-ibkr-campus\/","ethicsPolicy":"https:\/\/www.interactivebrokers.com\/campus\/cyber-security-notice\/"},{"@type":"Person","@id":"https:\/\/ibkrcampus.com\/campus\/#\/schema\/person\/d4018570a16fb867f1c08412fc9c64bc","name":"Andrew Treadway","description":"Andrew Treadway currently works as a Senior Data Scientist, and has experience doing analytics, software automation, and ETL. He completed a master\u2019s degree in computer science \/ machine learning, and an undergraduate degree in pure mathematics. 
Connect with him on LinkedIn: https:\/\/www.linkedin.com\/in\/andrew-treadway-a3b19b103\/In addition to TheAutomatic.net blog, he also teaches in-person courses on Python and R through my NYC meetup: more details.","sameAs":["https:\/\/theautomatic.net\/about-me\/"],"url":"https:\/\/www.interactivebrokers.com\/campus\/author\/andrewtreadway\/"}]}},"jetpack_featured_media_url":"https:\/\/www.interactivebrokers.com\/campus\/wp-content\/uploads\/sites\/2\/2020\/11\/binary-background-abstract.jpg","_links":{"self":[{"href":"https:\/\/ibkrcampus.com\/campus\/wp-json\/wp\/v2\/posts\/200437","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/ibkrcampus.com\/campus\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/ibkrcampus.com\/campus\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/ibkrcampus.com\/campus\/wp-json\/wp\/v2\/users\/388"}],"replies":[{"embeddable":true,"href":"https:\/\/ibkrcampus.com\/campus\/wp-json\/wp\/v2\/comments?post=200437"}],"version-history":[{"count":0,"href":"https:\/\/ibkrcampus.com\/campus\/wp-json\/wp\/v2\/posts\/200437\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/ibkrcampus.com\/campus\/wp-json\/wp\/v2\/media\/67374"}],"wp:attachment":[{"href":"https:\/\/ibkrcampus.com\/campus\/wp-json\/wp\/v2\/media?parent=200437"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/ibkrcampus.com\/campus\/wp-json\/wp\/v2\/categories?post=200437"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/ibkrcampus.com\/campus\/wp-json\/wp\/v2\/tags?post=200437"},{"taxonomy":"contributors-categories","embeddable":true,"href":"https:\/\/ibkrcampus.com\/campus\/wp-json\/wp\/v2\/contributors-categories?post=200437"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}