{"id":185717,"date":"2023-02-28T09:46:26","date_gmt":"2023-02-28T14:46:26","guid":{"rendered":"https:\/\/ibkrcampus.com\/?p=185717"},"modified":"2023-02-28T09:46:47","modified_gmt":"2023-02-28T14:46:47","slug":"guide-to-fuzzy-matching-with-python","status":"publish","type":"post","link":"https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/guide-to-fuzzy-matching-with-python\/","title":{"rendered":"Guide to Fuzzy Matching with Python"},"content":{"rendered":"\n<p>This post is going to delve into the&nbsp;<strong>textdistance<\/strong>&nbsp;package in Python, which provides a large collection of algorithms to do&nbsp;<a href=\"https:\/\/en.wikipedia.org\/wiki\/Approximate_string_matching\">fuzzy matching<\/a>.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-the-textdistance-package\"><strong>The textdistance package<\/strong><\/h2>\n\n\n\n<p><strong><\/strong><\/p>\n\n\n\n<p>Similar to the&nbsp;<a href=\"https:\/\/theautomatic.net\/2017\/12\/11\/vectorize-fuzzy-matching\/\">stringdist<\/a>&nbsp;package in R, the&nbsp;<strong>textdistance<\/strong>&nbsp;package provides a collection of algorithms that can be used for fuzzy matching. 
To install&nbsp;<strong>textdistance<\/strong>&nbsp;using just the pure Python implementations of the algorithms, you can use pip as shown below:<\/p>\n\n\n\n<pre class=\"wp-block-syntaxhighlighter-code\">pip install textdistance<\/pre>\n\n\n\n<p>However, if you want to get the best possible speed out of the algorithms, you can tweak the pip install command like this:<\/p>\n\n\n\n<pre class=\"wp-block-syntaxhighlighter-code\">pip install textdistance[extras]<\/pre>\n\n\n\n<p>Once installed, we can import&nbsp;<strong>textdistance<\/strong>&nbsp;like below:<\/p>\n\n\n\n<pre class=\"wp-block-syntaxhighlighter-code\">import textdistance<\/pre>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Levenshtein distance<\/strong><\/h3>\n\n\n\n<p>Levenshtein distance measures the minimum number of insertions, deletions, and substitutions required to change one string into another. This can be a useful measure if you think that the differences between two strings are equally likely to occur at any point in the strings. It\u2019s also more useful if you do&nbsp;<strong>not<\/strong>&nbsp;suspect that full words in the strings have been rearranged (see Jaccard similarity or cosine similarity a little further down).<\/p>\n\n\n\n<pre class=\"wp-block-syntaxhighlighter-code\">textdistance.levenshtein(\"this test\", \"that test\") # 2\n \ntextdistance.levenshtein(\"test this\", \"this test\") # 6<\/pre>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Jaro-Winkler<\/strong><\/h3>\n\n\n\n<p>Jaro-Winkler is another similarity measure between two strings. This algorithm penalizes differences near the beginning of the strings more heavily than differences near the end. The motivation behind this is that typos are generally more likely to occur later in a string than at the beginning. When comparing \u201cthis test\u201d vs. \u201ctest this\u201d, even though the strings contain the exact same words (just in a different order), the similarity score is just 2\/3. 
If it matters more in your case that the beginnings of the two strings match, then this could be a useful algorithm to try.<\/p>\n\n\n\n<pre class=\"wp-block-syntaxhighlighter-code\">textdistance.jaro_winkler(\"this test\", \"test this\") # .666666666...<\/pre>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Jaccard Similarity<\/strong><\/h3>\n\n\n\n<p>Jaccard similarity measures the shared characters between two strings, regardless of order. In the first example below, the first string, \u201cthis test\u201d, has nine characters (including the space). The second string, \u201cthat test\u201d, has two additional characters that the first string does not (the \u201cat\u201d in \u201cthat\u201d). The measure divides the number of shared characters (seven) by the total number of characters (9 + 2 = 11). Thus, 7 \/ 11 = .636363636363\u2026<\/p>\n\n\n\n<p>In the second example, the strings contain exactly the same characters, just in a different order. Thus, since order doesn\u2019t matter, their Jaccard similarity is a perfect 1.0.<\/p>\n\n\n\n<pre class=\"wp-block-syntaxhighlighter-code\">textdistance.jaccard(\"this test\", \"that test\")\n \ntextdistance.jaccard(\"this test\", \"test this\")<\/pre>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" width=\"616\" height=\"141\" data-src=\"\/campus\/wp-content\/uploads\/sites\/2\/2023\/02\/python-jaccard-similarity-the-automatic-net.jpg\" alt=\"python jaccard similarity the automatic net\" class=\"wp-image-185722 lazyload\" data-srcset=\"https:\/\/ibkrcampus.com\/campus\/wp-content\/uploads\/sites\/2\/2023\/02\/python-jaccard-similarity-the-automatic-net.jpg 616w, https:\/\/ibkrcampus.com\/campus\/wp-content\/uploads\/sites\/2\/2023\/02\/python-jaccard-similarity-the-automatic-net-300x69.jpg 300w\" data-sizes=\"(max-width: 616px) 100vw, 616px\" src=\"data:image\/svg+xml;base64,PHN2ZyB3aWR0aD0iMSIgaGVpZ2h0PSIxIiB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciPjwvc3ZnPg==\" 
style=\"--smush-placeholder-width: 616px; aspect-ratio: 616\/141;\" \/><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Cosine similarity<\/strong><\/h3>\n\n\n\n<p>Cosine similarity is a common way of comparing two strings. This algorithm treats strings as vectors and calculates the cosine of the angle between them. Like Jaccard Similarity above, cosine similarity disregards order in the strings being compared.<\/p>\n\n\n\n<p>For example, here we compare the word \u201capple\u201d with an anagram of itself. This gives us a perfect cosine similarity score.<\/p>\n\n\n\n<pre class=\"wp-block-syntaxhighlighter-code\">textdistance.cosine(\"apple\", \"ppale\") # 1.0<\/pre>\n\n\n\n<p>On the other hand, we get the result below when comparing our two example strings, whose characters differ slightly. Since the calculation behind cosine similarity differs a bit from that of Jaccard Similarity, the two algorithms will generally give different results on strings that are not anagrams of each other, i.e. 
we\u2019ll get the same perfect result from each algorithm when comparing two strings that are just rearranged variations of each other, but for other cases, the algorithms will generally return different numeric results.<\/p>\n\n\n\n<pre class=\"wp-block-syntaxhighlighter-code\">textdistance.cosine(\"this test\", \"that test\")<\/pre>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" width=\"594\" height=\"66\" data-src=\"\/campus\/wp-content\/uploads\/sites\/2\/2023\/02\/python-cosine-similarity-the-automatic-net.jpg\" alt=\"\" class=\"wp-image-185723 lazyload\" data-srcset=\"https:\/\/ibkrcampus.com\/campus\/wp-content\/uploads\/sites\/2\/2023\/02\/python-cosine-similarity-the-automatic-net.jpg 594w, https:\/\/ibkrcampus.com\/campus\/wp-content\/uploads\/sites\/2\/2023\/02\/python-cosine-similarity-the-automatic-net-300x33.jpg 300w\" data-sizes=\"(max-width: 594px) 100vw, 594px\" src=\"data:image\/svg+xml;base64,PHN2ZyB3aWR0aD0iMSIgaGVpZ2h0PSIxIiB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciPjwvc3ZnPg==\" style=\"--smush-placeholder-width: 594px; aspect-ratio: 594\/66;\" \/><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Needleman-Wunsch<\/strong><\/h3>\n\n\n\n<p>Needleman-Wunsch is often used in bioinformatics to&nbsp;<a href=\"https:\/\/theautomatic.net\/2018\/11\/28\/how-to-measure-dna-similarity-with-python-and-dynamic-programming\/\">measure similarity between DNA sequences<\/a>. In effect, it tries to adjust one string (e.g. a string representing DNA) to line up with another string (e.g. of DNA). This algorithm has a parameter called \u201cgap cost\u201d, which can be adjusted like below. 
For more information,&nbsp;<a href=\"https:\/\/theautomatic.net\/2018\/11\/28\/how-to-measure-dna-similarity-with-python-and-dynamic-programming\/\">see this previous post<\/a>.<\/p>\n\n\n\n<pre class=\"wp-block-syntaxhighlighter-code\">textdistance.needleman_wunsch(\"AAAGGT\", \"ATACGGA\")\n \n# adjust the gap cost\ntextdistance.needleman_wunsch.gap_cost = 3\n \ntextdistance.needleman_wunsch(\"AAAGGT\", \"ATACGGA\")<\/pre>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" width=\"640\" height=\"267\" data-src=\"\/campus\/wp-content\/uploads\/sites\/2\/2023\/02\/needleman-wunsch-python-the-automatic-net.jpg\" alt=\"\" class=\"wp-image-185724 lazyload\" data-srcset=\"https:\/\/ibkrcampus.com\/campus\/wp-content\/uploads\/sites\/2\/2023\/02\/needleman-wunsch-python-the-automatic-net.jpg 640w, https:\/\/ibkrcampus.com\/campus\/wp-content\/uploads\/sites\/2\/2023\/02\/needleman-wunsch-python-the-automatic-net-300x125.jpg 300w\" data-sizes=\"(max-width: 640px) 100vw, 640px\" src=\"data:image\/svg+xml;base64,PHN2ZyB3aWR0aD0iMSIgaGVpZ2h0PSIxIiB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciPjwvc3ZnPg==\" style=\"--smush-placeholder-width: 640px; aspect-ratio: 640\/267;\" \/><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>MRA (Match Rating Approach)<\/strong><\/h3>\n\n\n\n<p>The MRA (Match Rating Approach) algorithm is a type of phonetic matching algorithm i.e. it attempts to measure the similarity between two strings based upon their sounds. This algorithm could be useful if you\u2019re handling common misspellings (without much loss in pronunciation), or words that sound the same but are spelled differently (homophones). It was originally developed to compare similar-sounding names. For example, below we compare \u201ctie\u201d and \u201ctye\u201d. 
The score that gets returned needs to be compared to a mapping table based upon the length of the strings involved (<a href=\"https:\/\/en.wikipedia.org\/wiki\/Match_rating_approach\">see this link for more detailed information<\/a>).<\/p>\n\n\n\n<pre class=\"wp-block-syntaxhighlighter-code\">textdistance.mra(\"tie\", \"tye\") # 1<\/pre>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>More on textdistance<\/strong><\/h2>\n\n\n\n<p><strong><\/strong><\/p>\n\n\n\n<p>By default, for certain algorithms,&nbsp;<strong>textdistance<\/strong>&nbsp;(when installed with extras) will try to find external libraries when a function is called. The purpose of this is to try to get the fastest available implementation. This can be turned off like below.<\/p>\n\n\n\n<pre class=\"wp-block-syntaxhighlighter-code\">textdistance.jaro_winkler.external = False\n \ntextdistance.jaro_winkler(\"second test\", \"2nd test\")<\/pre>\n\n\n\n<p>That\u2019s it for this post! In conclusion, it\u2019s important to assess your use case when doing fuzzy matching since there are quite a few algorithms out there. 
It can be useful to experiment with a few of them for your problem to test out which one works best.<\/p>\n\n\n\n<p>Hope you enjoyed reading a guide to fuzzy matching with Python!&nbsp;<\/p>\n\n\n\n<p><em>Originally posted on <a href=\"https:\/\/theautomatic.net\/2019\/11\/13\/guide-to-fuzzy-matching-with-python\/\">TheAutomatic.net<\/a> Blog.<\/em><\/p>\n","protected":false},"excerpt":{"rendered":"<p>This post is going to delve into the textdistance package in Python, which provides a large collection of algorithms to do fuzzy matching.<\/p>\n","protected":false},"author":388,"featured_media":185725,"comment_status":"closed","ping_status":"open","sticky":true,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[339,343,349,338,341,352,344],"tags":[14810,14805,14809,14808,14807,14812,14811,595,14806],"contributors-categories":[13695],"class_list":{"0":"post-185717","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-data-science","8":"category-programing-languages","9":"category-python-development","10":"category-ibkr-quant-news","11":"category-quant-development","12":"category-quant-north-america","13":"category-quant-regions","14":"tag-cosine-similarity","15":"tag-fuzzy-matching","16":"tag-jaccard-similarity","17":"tag-jaro-winkler","18":"tag-levenshtein-distance","19":"tag-mra-match-rating-approach","20":"tag-needleman-wunsch","21":"tag-python","22":"tag-textdistance-package","23":"contributors-categories-theautomatic-net"},"pp_statuses_selecting_workflow":false,"pp_workflow_action":"current","pp_status_selection":"publish","acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v26.9 (Yoast SEO v27.4) - https:\/\/yoast.com\/product\/yoast-seo-premium-wordpress\/ -->\n<title>Guide to Fuzzy Matching with Python | IBKR Quant<\/title>\n<meta name=\"description\" content=\"This post is going to delve into the textdistance package in Python, 
which provides a large collection of algorithms to do fuzzy matching.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.interactivebrokers.com\/campus\/wp-json\/wp\/v2\/posts\/185717\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Guide to Fuzzy Matching with Python | IBKR Campus US\" \/>\n<meta property=\"og:description\" content=\"This post is going to delve into the textdistance package in Python, which provides a large collection of algorithms to do fuzzy matching.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/guide-to-fuzzy-matching-with-python\/\" \/>\n<meta property=\"og:site_name\" content=\"IBKR Campus US\" \/>\n<meta property=\"article:published_time\" content=\"2023-02-28T14:46:26+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2023-02-28T14:46:47+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.interactivebrokers.com\/campus\/wp-content\/uploads\/sites\/2\/2023\/02\/python-yellow-background.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1000\" \/>\n\t<meta property=\"og:image:height\" content=\"563\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Andrew Treadway\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Andrew Treadway\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"5 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\n\t    \"@context\": \"https:\\\/\\\/schema.org\",\n\t    \"@graph\": [\n\t        {\n\t            \"@type\": \"NewsArticle\",\n\t            \"@id\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/ibkr-quant-news\\\/guide-to-fuzzy-matching-with-python\\\/#article\",\n\t            \"isPartOf\": {\n\t                \"@id\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/ibkr-quant-news\\\/guide-to-fuzzy-matching-with-python\\\/\"\n\t            },\n\t            \"author\": {\n\t                \"name\": \"Andrew Treadway\",\n\t                \"@id\": \"https:\\\/\\\/ibkrcampus.com\\\/campus\\\/#\\\/schema\\\/person\\\/d4018570a16fb867f1c08412fc9c64bc\"\n\t            },\n\t            \"headline\": \"Guide to Fuzzy Matching with Python\",\n\t            \"datePublished\": \"2023-02-28T14:46:26+00:00\",\n\t            \"dateModified\": \"2023-02-28T14:46:47+00:00\",\n\t            \"mainEntityOfPage\": {\n\t                \"@id\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/ibkr-quant-news\\\/guide-to-fuzzy-matching-with-python\\\/\"\n\t            },\n\t            \"wordCount\": 814,\n\t            \"publisher\": {\n\t                \"@id\": \"https:\\\/\\\/ibkrcampus.com\\\/campus\\\/#organization\"\n\t            },\n\t            \"image\": {\n\t                \"@id\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/ibkr-quant-news\\\/guide-to-fuzzy-matching-with-python\\\/#primaryimage\"\n\t            },\n\t            \"thumbnailUrl\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/wp-content\\\/uploads\\\/sites\\\/2\\\/2023\\\/02\\\/python-yellow-background.jpg\",\n\t            \"keywords\": [\n\t                \"Cosine similarity\",\n\t                \"Fuzzy Matching\",\n\t                \"Jaccard Similarity\",\n\t                \"Jaro-Winkler\",\n\t     
           \"Levenshtein distance\",\n\t                \"MRA (Match Rating Approach)\",\n\t                \"Needleman-Wunsch\",\n\t                \"Python\",\n\t                \"textdistance package\"\n\t            ],\n\t            \"articleSection\": [\n\t                \"Data Science\",\n\t                \"Programming Languages\",\n\t                \"Python Development\",\n\t                \"Quant\",\n\t                \"Quant Development\",\n\t                \"Quant North America\",\n\t                \"Quant Regions\"\n\t            ],\n\t            \"inLanguage\": \"en-US\"\n\t        },\n\t        {\n\t            \"@type\": \"WebPage\",\n\t            \"@id\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/ibkr-quant-news\\\/guide-to-fuzzy-matching-with-python\\\/\",\n\t            \"url\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/ibkr-quant-news\\\/guide-to-fuzzy-matching-with-python\\\/\",\n\t            \"name\": \"Guide to Fuzzy Matching with Python | IBKR Campus US\",\n\t            \"isPartOf\": {\n\t                \"@id\": \"https:\\\/\\\/ibkrcampus.com\\\/campus\\\/#website\"\n\t            },\n\t            \"primaryImageOfPage\": {\n\t                \"@id\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/ibkr-quant-news\\\/guide-to-fuzzy-matching-with-python\\\/#primaryimage\"\n\t            },\n\t            \"image\": {\n\t                \"@id\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/ibkr-quant-news\\\/guide-to-fuzzy-matching-with-python\\\/#primaryimage\"\n\t            },\n\t            \"thumbnailUrl\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/wp-content\\\/uploads\\\/sites\\\/2\\\/2023\\\/02\\\/python-yellow-background.jpg\",\n\t            \"datePublished\": \"2023-02-28T14:46:26+00:00\",\n\t            \"dateModified\": \"2023-02-28T14:46:47+00:00\",\n\t            \"description\": \"This post is going to delve into the textdistance package in Python, which 
provides a large collection of algorithms to do fuzzy matching.\",\n\t            \"inLanguage\": \"en-US\",\n\t            \"potentialAction\": [\n\t                {\n\t                    \"@type\": \"ReadAction\",\n\t                    \"target\": [\n\t                        \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/ibkr-quant-news\\\/guide-to-fuzzy-matching-with-python\\\/\"\n\t                    ]\n\t                }\n\t            ]\n\t        },\n\t        {\n\t            \"@type\": \"ImageObject\",\n\t            \"inLanguage\": \"en-US\",\n\t            \"@id\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/ibkr-quant-news\\\/guide-to-fuzzy-matching-with-python\\\/#primaryimage\",\n\t            \"url\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/wp-content\\\/uploads\\\/sites\\\/2\\\/2023\\\/02\\\/python-yellow-background.jpg\",\n\t            \"contentUrl\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/wp-content\\\/uploads\\\/sites\\\/2\\\/2023\\\/02\\\/python-yellow-background.jpg\",\n\t            \"width\": 1000,\n\t            \"height\": 563,\n\t            \"caption\": \"Python Quant\"\n\t        },\n\t        {\n\t            \"@type\": \"WebSite\",\n\t            \"@id\": \"https:\\\/\\\/ibkrcampus.com\\\/campus\\\/#website\",\n\t            \"url\": \"https:\\\/\\\/ibkrcampus.com\\\/campus\\\/\",\n\t            \"name\": \"IBKR Campus US\",\n\t            \"description\": \"Financial Education from Interactive Brokers\",\n\t            \"publisher\": {\n\t                \"@id\": \"https:\\\/\\\/ibkrcampus.com\\\/campus\\\/#organization\"\n\t            },\n\t            \"potentialAction\": [\n\t                {\n\t                    \"@type\": \"SearchAction\",\n\t                    \"target\": {\n\t                        \"@type\": \"EntryPoint\",\n\t                        \"urlTemplate\": \"https:\\\/\\\/ibkrcampus.com\\\/campus\\\/?s={search_term_string}\"\n\t                    
},\n\t                    \"query-input\": {\n\t                        \"@type\": \"PropertyValueSpecification\",\n\t                        \"valueRequired\": true,\n\t                        \"valueName\": \"search_term_string\"\n\t                    }\n\t                }\n\t            ],\n\t            \"inLanguage\": \"en-US\"\n\t        },\n\t        {\n\t            \"@type\": \"Organization\",\n\t            \"@id\": \"https:\\\/\\\/ibkrcampus.com\\\/campus\\\/#organization\",\n\t            \"name\": \"Interactive Brokers\",\n\t            \"alternateName\": \"IBKR\",\n\t            \"url\": \"https:\\\/\\\/ibkrcampus.com\\\/campus\\\/\",\n\t            \"logo\": {\n\t                \"@type\": \"ImageObject\",\n\t                \"inLanguage\": \"en-US\",\n\t                \"@id\": \"https:\\\/\\\/ibkrcampus.com\\\/campus\\\/#\\\/schema\\\/logo\\\/image\\\/\",\n\t                \"url\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/wp-content\\\/uploads\\\/sites\\\/2\\\/2024\\\/05\\\/ibkr-campus-logo.jpg\",\n\t                \"contentUrl\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/wp-content\\\/uploads\\\/sites\\\/2\\\/2024\\\/05\\\/ibkr-campus-logo.jpg\",\n\t                \"width\": 669,\n\t                \"height\": 669,\n\t                \"caption\": \"Interactive Brokers\"\n\t            },\n\t            \"image\": {\n\t                \"@id\": \"https:\\\/\\\/ibkrcampus.com\\\/campus\\\/#\\\/schema\\\/logo\\\/image\\\/\"\n\t            },\n\t            \"publishingPrinciples\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/about-ibkr-campus\\\/\",\n\t            \"ethicsPolicy\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/cyber-security-notice\\\/\"\n\t        },\n\t        {\n\t            \"@type\": \"Person\",\n\t            \"@id\": \"https:\\\/\\\/ibkrcampus.com\\\/campus\\\/#\\\/schema\\\/person\\\/d4018570a16fb867f1c08412fc9c64bc\",\n\t            \"name\": \"Andrew Treadway\",\n\t    
        \"description\": \"Andrew Treadway currently works as a Senior Data Scientist, and has experience doing analytics, software automation, and ETL. He completed a master\u2019s degree in computer science \\\/ machine learning, and an undergraduate degree in pure mathematics. Connect with him on LinkedIn: https:\\\/\\\/www.linkedin.com\\\/in\\\/andrew-treadway-a3b19b103\\\/In addition to TheAutomatic.net blog, he also teaches in-person courses on Python and R through my NYC meetup: more details.\",\n\t            \"sameAs\": [\n\t                \"https:\\\/\\\/theautomatic.net\\\/about-me\\\/\"\n\t            ],\n\t            \"url\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/author\\\/andrewtreadway\\\/\"\n\t        }\n\t    ]\n\t}<\/script>\n<!-- \/ Yoast SEO Premium plugin. -->","yoast_head_json":{"title":"Guide to Fuzzy Matching with Python | IBKR Quant","description":"This post is going to delve into the textdistance package in Python, which provides a large collection of algorithms to do fuzzy matching.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.interactivebrokers.com\/campus\/wp-json\/wp\/v2\/posts\/185717\/","og_locale":"en_US","og_type":"article","og_title":"Guide to Fuzzy Matching with Python | IBKR Campus US","og_description":"This post is going to delve into the textdistance package in Python, which provides a large collection of algorithms to do fuzzy matching.","og_url":"https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/guide-to-fuzzy-matching-with-python\/","og_site_name":"IBKR Campus 
US","article_published_time":"2023-02-28T14:46:26+00:00","article_modified_time":"2023-02-28T14:46:47+00:00","og_image":[{"width":1000,"height":563,"url":"https:\/\/www.interactivebrokers.com\/campus\/wp-content\/uploads\/sites\/2\/2023\/02\/python-yellow-background.jpg","type":"image\/jpeg"}],"author":"Andrew Treadway","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Andrew Treadway","Est. reading time":"5 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"NewsArticle","@id":"https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/guide-to-fuzzy-matching-with-python\/#article","isPartOf":{"@id":"https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/guide-to-fuzzy-matching-with-python\/"},"author":{"name":"Andrew Treadway","@id":"https:\/\/ibkrcampus.com\/campus\/#\/schema\/person\/d4018570a16fb867f1c08412fc9c64bc"},"headline":"Guide to Fuzzy Matching with Python","datePublished":"2023-02-28T14:46:26+00:00","dateModified":"2023-02-28T14:46:47+00:00","mainEntityOfPage":{"@id":"https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/guide-to-fuzzy-matching-with-python\/"},"wordCount":814,"publisher":{"@id":"https:\/\/ibkrcampus.com\/campus\/#organization"},"image":{"@id":"https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/guide-to-fuzzy-matching-with-python\/#primaryimage"},"thumbnailUrl":"https:\/\/www.interactivebrokers.com\/campus\/wp-content\/uploads\/sites\/2\/2023\/02\/python-yellow-background.jpg","keywords":["Cosine similarity","Fuzzy Matching","Jaccard Similarity","Jaro-Winkler","Levenshtein distance","MRA (Match Rating Approach)","Needleman-Wunsch","Python","textdistance package"],"articleSection":["Data Science","Programming Languages","Python Development","Quant","Quant Development","Quant North America","Quant 
Regions"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/guide-to-fuzzy-matching-with-python\/","url":"https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/guide-to-fuzzy-matching-with-python\/","name":"Guide to Fuzzy Matching with Python | IBKR Campus US","isPartOf":{"@id":"https:\/\/ibkrcampus.com\/campus\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/guide-to-fuzzy-matching-with-python\/#primaryimage"},"image":{"@id":"https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/guide-to-fuzzy-matching-with-python\/#primaryimage"},"thumbnailUrl":"https:\/\/www.interactivebrokers.com\/campus\/wp-content\/uploads\/sites\/2\/2023\/02\/python-yellow-background.jpg","datePublished":"2023-02-28T14:46:26+00:00","dateModified":"2023-02-28T14:46:47+00:00","description":"This post is going to delve into the textdistance package in Python, which provides a large collection of algorithms to do fuzzy matching.","inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/guide-to-fuzzy-matching-with-python\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/guide-to-fuzzy-matching-with-python\/#primaryimage","url":"https:\/\/www.interactivebrokers.com\/campus\/wp-content\/uploads\/sites\/2\/2023\/02\/python-yellow-background.jpg","contentUrl":"https:\/\/www.interactivebrokers.com\/campus\/wp-content\/uploads\/sites\/2\/2023\/02\/python-yellow-background.jpg","width":1000,"height":563,"caption":"Python Quant"},{"@type":"WebSite","@id":"https:\/\/ibkrcampus.com\/campus\/#website","url":"https:\/\/ibkrcampus.com\/campus\/","name":"IBKR Campus US","description":"Financial Education from Interactive 
Brokers","publisher":{"@id":"https:\/\/ibkrcampus.com\/campus\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/ibkrcampus.com\/campus\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/ibkrcampus.com\/campus\/#organization","name":"Interactive Brokers","alternateName":"IBKR","url":"https:\/\/ibkrcampus.com\/campus\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/ibkrcampus.com\/campus\/#\/schema\/logo\/image\/","url":"https:\/\/www.interactivebrokers.com\/campus\/wp-content\/uploads\/sites\/2\/2024\/05\/ibkr-campus-logo.jpg","contentUrl":"https:\/\/www.interactivebrokers.com\/campus\/wp-content\/uploads\/sites\/2\/2024\/05\/ibkr-campus-logo.jpg","width":669,"height":669,"caption":"Interactive Brokers"},"image":{"@id":"https:\/\/ibkrcampus.com\/campus\/#\/schema\/logo\/image\/"},"publishingPrinciples":"https:\/\/www.interactivebrokers.com\/campus\/about-ibkr-campus\/","ethicsPolicy":"https:\/\/www.interactivebrokers.com\/campus\/cyber-security-notice\/"},{"@type":"Person","@id":"https:\/\/ibkrcampus.com\/campus\/#\/schema\/person\/d4018570a16fb867f1c08412fc9c64bc","name":"Andrew Treadway","description":"Andrew Treadway currently works as a Senior Data Scientist, and has experience doing analytics, software automation, and ETL. He completed a master\u2019s degree in computer science \/ machine learning, and an undergraduate degree in pure mathematics. 
Connect with him on LinkedIn: https:\/\/www.linkedin.com\/in\/andrew-treadway-a3b19b103\/In addition to TheAutomatic.net blog, he also teaches in-person courses on Python and R through my NYC meetup: more details.","sameAs":["https:\/\/theautomatic.net\/about-me\/"],"url":"https:\/\/www.interactivebrokers.com\/campus\/author\/andrewtreadway\/"}]}},"jetpack_featured_media_url":"https:\/\/www.interactivebrokers.com\/campus\/wp-content\/uploads\/sites\/2\/2023\/02\/python-yellow-background.jpg","_links":{"self":[{"href":"https:\/\/ibkrcampus.com\/campus\/wp-json\/wp\/v2\/posts\/185717","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/ibkrcampus.com\/campus\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/ibkrcampus.com\/campus\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/ibkrcampus.com\/campus\/wp-json\/wp\/v2\/users\/388"}],"replies":[{"embeddable":true,"href":"https:\/\/ibkrcampus.com\/campus\/wp-json\/wp\/v2\/comments?post=185717"}],"version-history":[{"count":0,"href":"https:\/\/ibkrcampus.com\/campus\/wp-json\/wp\/v2\/posts\/185717\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/ibkrcampus.com\/campus\/wp-json\/wp\/v2\/media\/185725"}],"wp:attachment":[{"href":"https:\/\/ibkrcampus.com\/campus\/wp-json\/wp\/v2\/media?parent=185717"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/ibkrcampus.com\/campus\/wp-json\/wp\/v2\/categories?post=185717"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/ibkrcampus.com\/campus\/wp-json\/wp\/v2\/tags?post=185717"},{"taxonomy":"contributors-categories","embeddable":true,"href":"https:\/\/ibkrcampus.com\/campus\/wp-json\/wp\/v2\/contributors-categories?post=185717"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}