{"id":56110,"date":"2020-08-17T09:50:00","date_gmt":"2020-08-17T13:50:00","guid":{"rendered":"https:\/\/ibkrcampus.com\/?p=56110"},"modified":"2022-11-21T09:46:06","modified_gmt":"2022-11-21T14:46:06","slug":"bag-of-words-approach-python-code-limitations","status":"publish","type":"post","link":"https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/bag-of-words-approach-python-code-limitations\/","title":{"rendered":"Bag of Words: Approach, Python Code, Limitations"},"content":{"rendered":"\n<p>In this post, we study the Bag of Words method for creating vectorized representations of text data. These representations can then be used to perform&nbsp;<a href=\"https:\/\/blog.quantinsti.com\/natural-language-processing-webinar-14-january-2020\/\">Natural Language Processing<\/a>&nbsp;tasks such as&nbsp;<a href=\"https:\/\/blog.quantinsti.com\/quantitative-trading-using-sentiment-analysis-webinar\/\">Sentiment Analysis<\/a>. We&#8217;ll cover the relevant terms, discuss the method&#8217;s limitations, and highlight its advantages. The topics covered are:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Bag of Words Approach<\/li><li>Limitations of Bag of Words<\/li><li>Bag of Words vs Word2Vec<\/li><li>Advantages of Bag of Words<\/li><\/ul>\n\n\n\n<p>Bag of Words is a simplified feature extraction method for text data that is easy to implement. It involves maintaining a vocabulary and calculating the frequency of words, ignoring aspects of natural language such as grammar and word order.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><a>Bag of Words Approach<\/a><\/h2>\n\n\n\n<p>The Bag of Words approach takes a document as input and breaks it into words. These words are known as tokens, and the process is called tokenization.<\/p>\n\n\n\n<p>The unique tokens collected from all processed documents then form an ordered vocabulary. 
Finally, a vector whose length equals the size of the vocabulary is created for each document, with each value giving the frequency of the corresponding token in that document.<\/p>\n\n\n\n<p>Note that we ignore the order in which these words appear in the document, hence the name \u2018Bag of Words\u2019, which signifies an unordered collection of items in a bag. This approach is easy to implement in Python, as the example below demonstrates.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" data-src=\"https:\/\/d1rwhvwstyk9gu.cloudfront.net\/2020\/08\/Approach_bag_of_words-1.png\" alt=\"Approach_Bag_of_Words\" src=\"data:image\/svg+xml;base64,PHN2ZyB3aWR0aD0iMSIgaGVpZ2h0PSIxIiB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciPjwvc3ZnPg==\" class=\"lazyload\" \/><\/figure>\n\n\n\n<p style=\"background-color:#fcfcdb;font-size:11px\" class=\"has-background\">\n# corpus is a collection of documents, here sentences<br>\ncorpus = ['This is the first sentence in our corpus followed by one more sentence to demonstrate Bag of words',<br>\n         'This is the second sentence in our corpus with a FEW UPPER CASE WORDS and Few Title Case Words']<br><br>\n\nvocab = []         # empty list for vocabulary<br>\ntotal_words = 0    # to count total words in corpus<br><br>\n\nfor doc in corpus: # iterating through documents in corpus<br>\n    token_temp = doc.split() # create tokens by splitting on whitespace<br>\n    total_words = total_words + len(token_temp)<br>\n    for token in token_temp:<br>\n        if token not in vocab: # to check if word is already in vocab<br>\n            vocab.append(token)<br><br>\n\nvocab.sort()<br><br>\n\nprint(vocab) # Print all the words in vocabulary<br>\nprint('There are {} words in vocabulary.'.format(len(vocab))) <br>\nprint('A total of {} words is used in documents.'.format(total_words))<br>\n<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img 
decoding=\"async\" width=\"1100\" height=\"87\" data-src=\"\/campus\/wp-content\/uploads\/sites\/2\/2020\/08\/Bag_of_words_approach_output-1-1100x87.png\" alt=\"Bag of Words\" class=\"wp-image-56215 lazyload\" data-srcset=\"https:\/\/ibkrcampus.com\/campus\/wp-content\/uploads\/sites\/2\/2020\/08\/Bag_of_words_approach_output-1-1100x87.png 1100w, https:\/\/ibkrcampus.com\/campus\/wp-content\/uploads\/sites\/2\/2020\/08\/Bag_of_words_approach_output-1-700x56.png 700w, https:\/\/ibkrcampus.com\/campus\/wp-content\/uploads\/sites\/2\/2020\/08\/Bag_of_words_approach_output-1-300x24.png 300w, https:\/\/ibkrcampus.com\/campus\/wp-content\/uploads\/sites\/2\/2020\/08\/Bag_of_words_approach_output-1-768x61.png 768w, https:\/\/ibkrcampus.com\/campus\/wp-content\/uploads\/sites\/2\/2020\/08\/Bag_of_words_approach_output-1.png 1486w\" data-sizes=\"(max-width: 1100px) 100vw, 1100px\" src=\"data:image\/svg+xml;base64,PHN2ZyB3aWR0aD0iMSIgaGVpZ2h0PSIxIiB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciPjwvc3ZnPg==\" style=\"--smush-placeholder-width: 1100px; aspect-ratio: 1100\/87;\" \/><\/figure>\n\n\n\n<p>Note the difference in the number of total words and length of vocabulary. 
We&#8217;ll now calculate the frequency of each vocabulary word in each document and store the resulting vectors in a list.<\/p>\n\n\n\n<p style=\"background-color:#fcfcdb;font-size:11px\" class=\"has-background\">\nbow_vec = []      # list to store bag of words vectors<br><br>\n\nfor i in range(len(corpus)):<br>\n    doc_ = corpus[i].split() # tokens of the current document<br>\n    doc_vec = [] # empty list for each doc<br><br>\n    \n    for j in range(len(vocab)): # iterate over vocab <br>\n        if vocab[j] in doc_: <br>\n            doc_vec.append(doc_.count(vocab[j])) # append freq if present<br>\n        else:<br>\n            doc_vec.append(0) # else append zero<br>\n    bow_vec.append(doc_vec)<br><br>\n    \nimport pandas as pd<br>\npd.set_option('display.max_columns', None)<br>\ndf = pd.DataFrame(bow_vec, columns = vocab)<br>\ndf # bag of words vectorized representation\n<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" data-src=\"https:\/\/d1rwhvwstyk9gu.cloudfront.net\/2020\/08\/frequency_bag_of_words1-1.PNG\" alt=\"frequency_bag_of_words1\" src=\"data:image\/svg+xml;base64,PHN2ZyB3aWR0aD0iMSIgaGVpZ2h0PSIxIiB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciPjwvc3ZnPg==\" class=\"lazyload\" \/><\/figure>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" data-src=\"https:\/\/d1rwhvwstyk9gu.cloudfront.net\/2020\/08\/frequency_bag_of_words2-1.PNG\" alt=\"frequency_bag_of_words2\" src=\"data:image\/svg+xml;base64,PHN2ZyB3aWR0aD0iMSIgaGVpZ2h0PSIxIiB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciPjwvc3ZnPg==\" class=\"lazyload\" \/><\/figure>\n\n\n\n<p><em>Stay tuned for the next installment in this series, in which the author will discuss Limitations of Bag of Words.<\/em><\/p>\n\n\n\n<p>To download the complete Python code, visit QuantInsti: <a href=\"https:\/\/blog.quantinsti.com\/bag-of-words\/\">https:\/\/blog.quantinsti.com\/bag-of-words\/<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Join QuantInsti for a tutorial on how to use Bag of Words with 
Python, and learn how this concept applies to sentiment trading. Download ready-to-use code for vectorized representations of text data.<\/p>\n","protected":false},"author":431,"featured_media":56253,"comment_status":"closed","ping_status":"open","sticky":true,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[339,343,349,338,350,341,344],"tags":[851,8224,8229,4582,852,2859,2860,1224,595,7649,8228,8226,8225,8227],"contributors-categories":[13654],"class_list":{"0":"post-56110","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-data-science","8":"category-programing-languages","9":"category-python-development","10":"category-ibkr-quant-news","11":"category-quant-asia-pacific","12":"category-quant-development","13":"category-quant-regions","14":"tag-algo-trading","15":"tag-bag-of-words","16":"tag-corpus","17":"tag-dataframe","18":"tag-machine-learning","19":"tag-natural-language-processing","20":"tag-nlp","21":"tag-pandas","22":"tag-python","23":"tag-sentiment-trading","24":"tag-tokenization","25":"tag-vectorized-text-data","26":"tag-word-cloud","27":"tag-word2vec","28":"contributors-categories-quantinsti"},"pp_statuses_selecting_workflow":false,"pp_workflow_action":"current","pp_status_selection":"publish","acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v26.9 (Yoast SEO v27.3) - https:\/\/yoast.com\/product\/yoast-seo-premium-wordpress\/ -->\n<title>Bag of Words: Approach, Python Code, Limitations<\/title>\n<meta name=\"description\" content=\"Join QuantInsti for a tutorial on how to use Bag of Words with Python, and learn how this concept applies to sentiment trading.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.interactivebrokers.com\/campus\/wp-json\/wp\/v2\/posts\/56110\/\" \/>\n<meta property=\"og:locale\" 
content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Bag of Words: Approach, Python Code, Limitations | IBKR Quant Blog\" \/>\n<meta property=\"og:description\" content=\"Join QuantInsti for a tutorial on how to use Bag of Words with Python, and learn how this concept applies to sentiment trading.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/bag-of-words-approach-python-code-limitations\/\" \/>\n<meta property=\"og:site_name\" content=\"IBKR Campus US\" \/>\n<meta property=\"article:published_time\" content=\"2020-08-17T13:50:00+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2022-11-21T14:46:06+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.interactivebrokers.com\/campus\/wp-content\/uploads\/sites\/2\/2020\/08\/sentiment-analysis.png\" \/>\n\t<meta property=\"og:image:width\" content=\"900\" \/>\n\t<meta property=\"og:image:height\" content=\"550\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Naman Swarnkar\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Naman Swarnkar\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"3 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\n\t    \"@context\": \"https:\\\/\\\/schema.org\",\n\t    \"@graph\": [\n\t        {\n\t            \"@type\": \"NewsArticle\",\n\t            \"@id\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/ibkr-quant-news\\\/bag-of-words-approach-python-code-limitations\\\/#article\",\n\t            \"isPartOf\": {\n\t                \"@id\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/ibkr-quant-news\\\/bag-of-words-approach-python-code-limitations\\\/\"\n\t            },\n\t            \"author\": {\n\t                \"name\": \"Naman Swarnkar\",\n\t                \"@id\": \"https:\\\/\\\/ibkrcampus.com\\\/campus\\\/#\\\/schema\\\/person\\\/0711c8311f398d8eb95dd6f0eef86b50\"\n\t            },\n\t            \"headline\": \"Bag of Words: Approach, Python Code, Limitations\",\n\t            \"datePublished\": \"2020-08-17T13:50:00+00:00\",\n\t            \"dateModified\": \"2022-11-21T14:46:06+00:00\",\n\t            \"mainEntityOfPage\": {\n\t                \"@id\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/ibkr-quant-news\\\/bag-of-words-approach-python-code-limitations\\\/\"\n\t            },\n\t            \"wordCount\": 535,\n\t            \"publisher\": {\n\t                \"@id\": \"https:\\\/\\\/ibkrcampus.com\\\/campus\\\/#organization\"\n\t            },\n\t            \"image\": {\n\t                \"@id\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/ibkr-quant-news\\\/bag-of-words-approach-python-code-limitations\\\/#primaryimage\"\n\t            },\n\t            \"thumbnailUrl\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/wp-content\\\/uploads\\\/sites\\\/2\\\/2020\\\/08\\\/sentiment-analysis.png\",\n\t            \"keywords\": [\n\t                \"Algo Trading\",\n\t                \"Bag of Words\",\n\t                \"Corpus\",\n\t               
 \"Dataframe\",\n\t                \"Machine Learning\",\n\t                \"Natural Language Processing\",\n\t                \"NLP\",\n\t                \"Pandas\",\n\t                \"Python\",\n\t                \"Sentiment Trading\",\n\t                \"Tokenization\",\n\t                \"Vectorized Text Data\",\n\t                \"Word Cloud\",\n\t                \"Word2Vec\"\n\t            ],\n\t            \"articleSection\": [\n\t                \"Data Science\",\n\t                \"Programming Languages\",\n\t                \"Python Development\",\n\t                \"Quant\",\n\t                \"Quant Asia Pacific\",\n\t                \"Quant Development\",\n\t                \"Quant Regions\"\n\t            ],\n\t            \"inLanguage\": \"en-US\"\n\t        },\n\t        {\n\t            \"@type\": \"WebPage\",\n\t            \"@id\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/ibkr-quant-news\\\/bag-of-words-approach-python-code-limitations\\\/\",\n\t            \"url\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/ibkr-quant-news\\\/bag-of-words-approach-python-code-limitations\\\/\",\n\t            \"name\": \"Bag of Words: Approach, Python Code, Limitations | IBKR Quant Blog\",\n\t            \"isPartOf\": {\n\t                \"@id\": \"https:\\\/\\\/ibkrcampus.com\\\/campus\\\/#website\"\n\t            },\n\t            \"primaryImageOfPage\": {\n\t                \"@id\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/ibkr-quant-news\\\/bag-of-words-approach-python-code-limitations\\\/#primaryimage\"\n\t            },\n\t            \"image\": {\n\t                \"@id\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/ibkr-quant-news\\\/bag-of-words-approach-python-code-limitations\\\/#primaryimage\"\n\t            },\n\t            \"thumbnailUrl\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/wp-content\\\/uploads\\\/sites\\\/2\\\/2020\\\/08\\\/sentiment-analysis.png\",\n\t       
     \"datePublished\": \"2020-08-17T13:50:00+00:00\",\n\t            \"dateModified\": \"2022-11-21T14:46:06+00:00\",\n\t            \"description\": \"Join QuantInsti for a tutorial on how to use Bag of Words with Python, and learn how this concept applies to sentiment trading.\",\n\t            \"inLanguage\": \"en-US\",\n\t            \"potentialAction\": [\n\t                {\n\t                    \"@type\": \"ReadAction\",\n\t                    \"target\": [\n\t                        \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/ibkr-quant-news\\\/bag-of-words-approach-python-code-limitations\\\/\"\n\t                    ]\n\t                }\n\t            ]\n\t        },\n\t        {\n\t            \"@type\": \"ImageObject\",\n\t            \"inLanguage\": \"en-US\",\n\t            \"@id\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/ibkr-quant-news\\\/bag-of-words-approach-python-code-limitations\\\/#primaryimage\",\n\t            \"url\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/wp-content\\\/uploads\\\/sites\\\/2\\\/2020\\\/08\\\/sentiment-analysis.png\",\n\t            \"contentUrl\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/wp-content\\\/uploads\\\/sites\\\/2\\\/2020\\\/08\\\/sentiment-analysis.png\",\n\t            \"width\": 900,\n\t            \"height\": 550,\n\t            \"caption\": \"Sentiment Analysis\"\n\t        },\n\t        {\n\t            \"@type\": \"WebSite\",\n\t            \"@id\": \"https:\\\/\\\/ibkrcampus.com\\\/campus\\\/#website\",\n\t            \"url\": \"https:\\\/\\\/ibkrcampus.com\\\/campus\\\/\",\n\t            \"name\": \"IBKR Campus US\",\n\t            \"description\": \"Financial Education from Interactive Brokers\",\n\t            \"publisher\": {\n\t                \"@id\": \"https:\\\/\\\/ibkrcampus.com\\\/campus\\\/#organization\"\n\t            },\n\t            \"potentialAction\": [\n\t                {\n\t                    \"@type\": 
\"SearchAction\",\n\t                    \"target\": {\n\t                        \"@type\": \"EntryPoint\",\n\t                        \"urlTemplate\": \"https:\\\/\\\/ibkrcampus.com\\\/campus\\\/?s={search_term_string}\"\n\t                    },\n\t                    \"query-input\": {\n\t                        \"@type\": \"PropertyValueSpecification\",\n\t                        \"valueRequired\": true,\n\t                        \"valueName\": \"search_term_string\"\n\t                    }\n\t                }\n\t            ],\n\t            \"inLanguage\": \"en-US\"\n\t        },\n\t        {\n\t            \"@type\": \"Organization\",\n\t            \"@id\": \"https:\\\/\\\/ibkrcampus.com\\\/campus\\\/#organization\",\n\t            \"name\": \"Interactive Brokers\",\n\t            \"alternateName\": \"IBKR\",\n\t            \"url\": \"https:\\\/\\\/ibkrcampus.com\\\/campus\\\/\",\n\t            \"logo\": {\n\t                \"@type\": \"ImageObject\",\n\t                \"inLanguage\": \"en-US\",\n\t                \"@id\": \"https:\\\/\\\/ibkrcampus.com\\\/campus\\\/#\\\/schema\\\/logo\\\/image\\\/\",\n\t                \"url\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/wp-content\\\/uploads\\\/sites\\\/2\\\/2024\\\/05\\\/ibkr-campus-logo.jpg\",\n\t                \"contentUrl\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/wp-content\\\/uploads\\\/sites\\\/2\\\/2024\\\/05\\\/ibkr-campus-logo.jpg\",\n\t                \"width\": 669,\n\t                \"height\": 669,\n\t                \"caption\": \"Interactive Brokers\"\n\t            },\n\t            \"image\": {\n\t                \"@id\": \"https:\\\/\\\/ibkrcampus.com\\\/campus\\\/#\\\/schema\\\/logo\\\/image\\\/\"\n\t            },\n\t            \"publishingPrinciples\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/about-ibkr-campus\\\/\",\n\t            \"ethicsPolicy\": 
\"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/cyber-security-notice\\\/\"\n\t        },\n\t        {\n\t            \"@type\": \"Person\",\n\t            \"@id\": \"https:\\\/\\\/ibkrcampus.com\\\/campus\\\/#\\\/schema\\\/person\\\/0711c8311f398d8eb95dd6f0eef86b50\",\n\t            \"name\": \"Naman Swarnkar\",\n\t            \"url\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/author\\\/namanswarnkar\\\/\"\n\t        }\n\t    ]\n\t}<\/script>\n<!-- \/ Yoast SEO Premium plugin. -->","yoast_head_json":{"title":"Bag of Words: Approach, Python Code, Limitations","description":"Join QuantInsti for a tutorial on how to use Bag of Words with Python, and learn how this concept applies to sentiment trading.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.interactivebrokers.com\/campus\/wp-json\/wp\/v2\/posts\/56110\/","og_locale":"en_US","og_type":"article","og_title":"Bag of Words: Approach, Python Code, Limitations | IBKR Quant Blog","og_description":"Join QuantInsti for a tutorial on how to use Bag of Words with Python, and learn how this concept applies to sentiment trading.","og_url":"https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/bag-of-words-approach-python-code-limitations\/","og_site_name":"IBKR Campus US","article_published_time":"2020-08-17T13:50:00+00:00","article_modified_time":"2022-11-21T14:46:06+00:00","og_image":[{"width":900,"height":550,"url":"https:\/\/www.interactivebrokers.com\/campus\/wp-content\/uploads\/sites\/2\/2020\/08\/sentiment-analysis.png","type":"image\/png"}],"author":"Naman Swarnkar","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Naman Swarnkar","Est. 
reading time":"3 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"NewsArticle","@id":"https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/bag-of-words-approach-python-code-limitations\/#article","isPartOf":{"@id":"https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/bag-of-words-approach-python-code-limitations\/"},"author":{"name":"Naman Swarnkar","@id":"https:\/\/ibkrcampus.com\/campus\/#\/schema\/person\/0711c8311f398d8eb95dd6f0eef86b50"},"headline":"Bag of Words: Approach, Python Code, Limitations","datePublished":"2020-08-17T13:50:00+00:00","dateModified":"2022-11-21T14:46:06+00:00","mainEntityOfPage":{"@id":"https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/bag-of-words-approach-python-code-limitations\/"},"wordCount":535,"publisher":{"@id":"https:\/\/ibkrcampus.com\/campus\/#organization"},"image":{"@id":"https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/bag-of-words-approach-python-code-limitations\/#primaryimage"},"thumbnailUrl":"https:\/\/www.interactivebrokers.com\/campus\/wp-content\/uploads\/sites\/2\/2020\/08\/sentiment-analysis.png","keywords":["Algo Trading","Bag of Words","Corpus","Dataframe","Machine Learning","Natural Language Processing","NLP","Pandas","Python","Sentiment Trading","Tokenization","Vectorized Text Data","Word Cloud","Word2Vec"],"articleSection":["Data Science","Programming Languages","Python Development","Quant","Quant Asia Pacific","Quant Development","Quant Regions"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/bag-of-words-approach-python-code-limitations\/","url":"https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/bag-of-words-approach-python-code-limitations\/","name":"Bag of Words: Approach, Python Code, Limitations | IBKR Quant 
Blog","isPartOf":{"@id":"https:\/\/ibkrcampus.com\/campus\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/bag-of-words-approach-python-code-limitations\/#primaryimage"},"image":{"@id":"https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/bag-of-words-approach-python-code-limitations\/#primaryimage"},"thumbnailUrl":"https:\/\/www.interactivebrokers.com\/campus\/wp-content\/uploads\/sites\/2\/2020\/08\/sentiment-analysis.png","datePublished":"2020-08-17T13:50:00+00:00","dateModified":"2022-11-21T14:46:06+00:00","description":"Join QuantInsti for a tutorial on how to use Bag of Words with Python, and learn how this concept applies to sentiment trading.","inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/bag-of-words-approach-python-code-limitations\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/bag-of-words-approach-python-code-limitations\/#primaryimage","url":"https:\/\/www.interactivebrokers.com\/campus\/wp-content\/uploads\/sites\/2\/2020\/08\/sentiment-analysis.png","contentUrl":"https:\/\/www.interactivebrokers.com\/campus\/wp-content\/uploads\/sites\/2\/2020\/08\/sentiment-analysis.png","width":900,"height":550,"caption":"Sentiment Analysis"},{"@type":"WebSite","@id":"https:\/\/ibkrcampus.com\/campus\/#website","url":"https:\/\/ibkrcampus.com\/campus\/","name":"IBKR Campus US","description":"Financial Education from Interactive 
Brokers","publisher":{"@id":"https:\/\/ibkrcampus.com\/campus\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/ibkrcampus.com\/campus\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/ibkrcampus.com\/campus\/#organization","name":"Interactive Brokers","alternateName":"IBKR","url":"https:\/\/ibkrcampus.com\/campus\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/ibkrcampus.com\/campus\/#\/schema\/logo\/image\/","url":"https:\/\/www.interactivebrokers.com\/campus\/wp-content\/uploads\/sites\/2\/2024\/05\/ibkr-campus-logo.jpg","contentUrl":"https:\/\/www.interactivebrokers.com\/campus\/wp-content\/uploads\/sites\/2\/2024\/05\/ibkr-campus-logo.jpg","width":669,"height":669,"caption":"Interactive Brokers"},"image":{"@id":"https:\/\/ibkrcampus.com\/campus\/#\/schema\/logo\/image\/"},"publishingPrinciples":"https:\/\/www.interactivebrokers.com\/campus\/about-ibkr-campus\/","ethicsPolicy":"https:\/\/www.interactivebrokers.com\/campus\/cyber-security-notice\/"},{"@type":"Person","@id":"https:\/\/ibkrcampus.com\/campus\/#\/schema\/person\/0711c8311f398d8eb95dd6f0eef86b50","name":"Naman 
Swarnkar","url":"https:\/\/www.interactivebrokers.com\/campus\/author\/namanswarnkar\/"}]}},"jetpack_featured_media_url":"https:\/\/www.interactivebrokers.com\/campus\/wp-content\/uploads\/sites\/2\/2020\/08\/sentiment-analysis.png","_links":{"self":[{"href":"https:\/\/ibkrcampus.com\/campus\/wp-json\/wp\/v2\/posts\/56110","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/ibkrcampus.com\/campus\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/ibkrcampus.com\/campus\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/ibkrcampus.com\/campus\/wp-json\/wp\/v2\/users\/431"}],"replies":[{"embeddable":true,"href":"https:\/\/ibkrcampus.com\/campus\/wp-json\/wp\/v2\/comments?post=56110"}],"version-history":[{"count":0,"href":"https:\/\/ibkrcampus.com\/campus\/wp-json\/wp\/v2\/posts\/56110\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/ibkrcampus.com\/campus\/wp-json\/wp\/v2\/media\/56253"}],"wp:attachment":[{"href":"https:\/\/ibkrcampus.com\/campus\/wp-json\/wp\/v2\/media?parent=56110"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/ibkrcampus.com\/campus\/wp-json\/wp\/v2\/categories?post=56110"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/ibkrcampus.com\/campus\/wp-json\/wp\/v2\/tags?post=56110"},{"taxonomy":"contributors-categories","embeddable":true,"href":"https:\/\/ibkrcampus.com\/campus\/wp-json\/wp\/v2\/contributors-categories?post=56110"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}