{"id":57223,"date":"2020-08-24T11:53:08","date_gmt":"2020-08-24T15:53:08","guid":{"rendered":"https:\/\/ibkrcampus.com\/?p=57223"},"modified":"2022-11-21T09:46:09","modified_gmt":"2022-11-21T14:46:09","slug":"how-to-scrape-news-articles-with-python","status":"publish","type":"post","link":"https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/how-to-scrape-news-articles-with-python\/","title":{"rendered":"How to Scrape News Articles with Python"},"content":{"rendered":"\n<p>In this post we\u2019re going to discuss how to scrape news articles with Python. This can be done using the handy&nbsp;<strong>newspaper<\/strong>&nbsp;package.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Introduction to Python\u2019s newspaper package<\/strong><\/h2>\n\n\n\n<p>The newspaper package can be installed using pip. On Python 3, it is published on PyPI under the name&nbsp;<strong>newspaper3k<\/strong>:<\/p>\n\n\n\n<p style=\"background-color:#fcfcdb;font-size:11px\" class=\"has-background\">\npip install newspaper3k\n<\/p>\n\n\n\n<p>Once it\u2019s installed, we can get started.&nbsp;<strong>newspaper<\/strong>&nbsp;can work either by scraping a single article from a given URL or by finding the links on a webpage to other news articles. Let\u2019s start with handling a single article. First, we need to import the&nbsp;<em>Article<\/em>&nbsp;class. Next, we use this class to download the content from the URL of our news article. Then, we use the&nbsp;<em>parse<\/em>&nbsp;method to parse the HTML. 
Lastly, we can print out the text of the article using the&nbsp;<em>.text<\/em>&nbsp;attribute.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Scraping a single article<\/strong><\/h3>\n\n\n\n<p style=\"background-color:#fcfcdb;font-size:11px\" class=\"has-background\">\nfrom newspaper import Article<br><br>\n \nurl = &quot;https:\/\/www.bloomberg.com\/news\/articles\/2020-08-01\/apple-buys-startup-to-turn-iphones-into-payment-terminals?srnd=premium&quot;<br><br>\n \n# download and parse article<br>\narticle = Article(url)<br>\narticle.download()<br>\narticle.parse()<br><br>\n \n# print article text<br>\nprint(article.text)\n<\/p>\n\n\n\n<p>It\u2019s also possible to get other information about the article, such as links to images or videos embedded in the post.<\/p>\n\n\n\n<p style=\"background-color:#fcfcdb;font-size:11px\" class=\"has-background\">\n# get list of image links<br>\narticle.images<br><br>\n \n# get list of videos &#8211; empty in this case<br>\narticle.movies\n<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Downloading all the articles linked on a webpage<\/strong><\/h3>\n\n\n\n<p>Now, let\u2019s look at how we can get all the news articles linked on a webpage. We\u2019ll do that using the&nbsp;<em>newspaper.build<\/em>&nbsp;function, as shown below. Then, we can extract the article URLs using the&nbsp;<em>article_urls<\/em>&nbsp;method.<\/p>\n\n\n\n<p style=\"background-color:#fcfcdb;font-size:11px\" class=\"has-background\">\nimport newspaper<br><br>\n \nsite = newspaper.build(&quot;https:\/\/news.ycombinator.com\/&quot;)  <br><br>\n \n# get list of article URLs<br>\nsite.article_urls()\n<\/p>\n\n\n\n<p>Using our object above, we can also get the contents of each of those articles. Here, all of the article objects are stored in the list&nbsp;<em>site.articles<\/em>. 
For example, let\u2019s get the first article\u2019s contents.<\/p>\n\n\n\n<p style=\"background-color:#fcfcdb;font-size:11px\" class=\"has-background\">\nsite_article = site.articles[0]<br><br>\n \nsite_article.download()<br>\nsite_article.parse()<br><br>\n \nprint(site_article.text)\n<\/p>\n\n\n\n<p>Now, let\u2019s modify our code to get the top ten articles:<\/p>\n\n\n\n<p style=\"background-color:#fcfcdb;font-size:11px\" class=\"has-background\">\ntop_articles = []<br>\nfor index in range(10):<br>\n    article = site.articles[index]<br>\n    article.download()<br>\n    article.parse()<br>\n    top_articles.append(article)\n<\/p>\n\n\n\n<p>Now, we can look at the text of any of these articles.<\/p>\n\n\n\n<p style=\"background-color:#fcfcdb;font-size:11px\" class=\"has-background\">\nprint(top_articles[0].text)<br><br>\n \nprint(top_articles[3].text)\n<\/p>\n\n\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Warning!<\/strong><\/h3>\n\n\n\n<p>One important note when using&nbsp;<strong>newspaper<\/strong>&nbsp;is that if you run&nbsp;<em>newspaper.build<\/em>&nbsp;multiple times with the same URL, the package caches the articles it has already scraped and leaves them out of later results. For example, in the code below, we run&nbsp;<em>newspaper.build<\/em>&nbsp;two consecutive times and get different results. 
The second time we run it, the code just returns the newly added links.<\/p>\n\n\n\n<p style=\"background-color:#fcfcdb;font-size:11px\" class=\"has-background\">\nsite = newspaper.build(&quot;https:\/\/news.ycombinator.com\/&quot;)    <br><br>\n \nprint(len(site.articles))<br><br>\n \nsite = newspaper.build(&quot;https:\/\/news.ycombinator.com\/&quot;)    <br><br>\n \nprint(len(site.articles))\n<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" data-src=\"https:\/\/i0.wp.com\/theautomatic.net\/wp-content\/uploads\/sites\/2\/2020\/08\/python-newspaper-build-function.png?w=640\" alt=\"scrape news articles with python\" src=\"data:image\/svg+xml;base64,PHN2ZyB3aWR0aD0iMSIgaGVpZ2h0PSIxIiB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciPjwvc3ZnPg==\" class=\"lazyload\" \/><\/figure>\n\n\n\n<p>This behavior can be turned off by adding an extra parameter to our function call, as shown below:<\/p>\n\n\n\n<p style=\"background-color:#fcfcdb;font-size:11px\" class=\"has-background\">\nsite = newspaper.build(&quot;https:\/\/news.ycombinator.com\/&quot;, memoize_articles=False)\n<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>How to get article summaries<\/strong><\/h3>\n\n\n\n<p>The&nbsp;<strong>newspaper<\/strong>&nbsp;package also supports some NLP functionality. You can check this out by calling the&nbsp;<em>nlp<\/em>&nbsp;method.<\/p>\n\n\n\n<p style=\"background-color:#fcfcdb;font-size:11px\" class=\"has-background\">\narticle = top_articles[3]<br><br>\n \narticle.nlp()\n<\/p>\n\n\n\n<p>Now, let\u2019s use the&nbsp;<em>summary<\/em>&nbsp;attribute, which is populated by the&nbsp;<em>nlp<\/em>&nbsp;call. 
This holds an automatically generated summary of the article.<\/p>\n\n\n\n<p style=\"background-color:#fcfcdb;font-size:11px\" class=\"has-background\">\narticle.summary\n<\/p>\n\n\n\n<p>You can also get a list of keywords from the article.<\/p>\n\n\n\n<p style=\"background-color:#fcfcdb;font-size:11px\" class=\"has-background\">\narticle.keywords\n<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>How to get top trending Google keywords<\/strong><\/h3>\n\n\n\n<p><strong>newspaper<\/strong>&nbsp;has a couple of other cool features. For example, we can use it to easily pull the top trending searches on Google using the&nbsp;<em>hot<\/em>&nbsp;function.<\/p>\n\n\n\n<p style=\"background-color:#fcfcdb;font-size:11px\" class=\"has-background\">\nnewspaper.hot()\n<\/p>\n\n\n\n<p>The package can also return a list of popular URLs, as shown below.<\/p>\n\n\n\n<p style=\"background-color:#fcfcdb;font-size:11px\" class=\"has-background\">\nnewspaper.popular_urls()\n<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Conclusion<\/strong><\/h2>\n\n\n\n<p>That\u2019s all for now. In this post, we learned how to scrape news articles with Python. If you want to learn more about web scraping, check out the extensive&nbsp;<a href=\"https:\/\/www.udemy.com\/course\/web-scraping-and-api-fundamentals-in-python\/?referralCode=5F1A89DC81D5A8B7D589\">web scraping fundamentals course<\/a>&nbsp;that I co-created with 365 Data Science, now available on Udemy. 
Also, make sure to check out their full&nbsp;<a href=\"https:\/\/365datascience.teachable.com\/courses\/data-scientist-career-track?affcode=130400_ubcdm-g4\">program of courses<\/a>&nbsp;(which includes mine) available by&nbsp;<a href=\"https:\/\/365datascience.teachable.com\/courses\/data-scientist-career-track?affcode=130400_ubcdm-g4\">clicking here<\/a>.<\/p>\n\n\n\n<p>Visit TheAutomatic.net Blog to download additional code: <a href=\"https:\/\/theautomatic.net\/2020\/08\/05\/how-to-scrape-news-articles-with-python\/\">https:\/\/theautomatic.net\/2020\/08\/05\/how-to-scrape-news-articles-with-python\/<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Learn how to scrape news articles with Python. This can be done using the handy &#8220;newspaper&#8221; package.<\/p>\n","protected":false},"author":388,"featured_media":57299,"comment_status":"closed","ping_status":"open","sticky":true,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[339,343,349,338,341,351,344],"tags":[8281,595,494,1038,7649,8282],"contributors-categories":[13695],"class_list":{"0":"post-57223","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-data-science","8":"category-programing-languages","9":"category-python-development","10":"category-ibkr-quant-news","11":"category-quant-development","12":"category-quant-europe","13":"category-quant-regions","14":"tag-newspaper-python-package","15":"tag-python","16":"tag-quant","17":"tag-sentiment-analysis","18":"tag-sentiment-trading","19":"tag-ycombinator","20":"contributors-categories-theautomatic-net"},"pp_statuses_selecting_workflow":false,"pp_workflow_action":"current","pp_status_selection":"publish","acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v26.9 (Yoast SEO v27.3) - https:\/\/yoast.com\/product\/yoast-seo-premium-wordpress\/ -->\n<title>How to Scrape News Articles with Python | IBKR 
Quant<\/title>\n<meta name=\"description\" content=\"Learn how to scrape news articles with Python. This can be done using the handy &quot;newspaper&quot; package.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.interactivebrokers.com\/campus\/wp-json\/wp\/v2\/posts\/57223\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"How to Scrape News Articles with Python | IBKR Quant Blog\" \/>\n<meta property=\"og:description\" content=\"Learn how to scrape news articles with Python. This can be done using the handy &quot;newspaper&quot; package.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/how-to-scrape-news-articles-with-python\/\" \/>\n<meta property=\"og:site_name\" content=\"IBKR Campus US\" \/>\n<meta property=\"article:published_time\" content=\"2020-08-24T15:53:08+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2022-11-21T14:46:09+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.interactivebrokers.com\/campus\/wp-content\/uploads\/sites\/2\/2020\/08\/mobile-app-news.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"900\" \/>\n\t<meta property=\"og:image:height\" content=\"550\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Andrew Treadway\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Andrew Treadway\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"3 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\n\t    \"@context\": \"https:\\\/\\\/schema.org\",\n\t    \"@graph\": [\n\t        {\n\t            \"@type\": \"NewsArticle\",\n\t            \"@id\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/ibkr-quant-news\\\/how-to-scrape-news-articles-with-python\\\/#article\",\n\t            \"isPartOf\": {\n\t                \"@id\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/ibkr-quant-news\\\/how-to-scrape-news-articles-with-python\\\/\"\n\t            },\n\t            \"author\": {\n\t                \"name\": \"Andrew Treadway\",\n\t                \"@id\": \"https:\\\/\\\/ibkrcampus.com\\\/campus\\\/#\\\/schema\\\/person\\\/d4018570a16fb867f1c08412fc9c64bc\"\n\t            },\n\t            \"headline\": \"How to Scrape News Articles with Python\",\n\t            \"datePublished\": \"2020-08-24T15:53:08+00:00\",\n\t            \"dateModified\": \"2022-11-21T14:46:09+00:00\",\n\t            \"mainEntityOfPage\": {\n\t                \"@id\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/ibkr-quant-news\\\/how-to-scrape-news-articles-with-python\\\/\"\n\t            },\n\t            \"wordCount\": 703,\n\t            \"publisher\": {\n\t                \"@id\": \"https:\\\/\\\/ibkrcampus.com\\\/campus\\\/#organization\"\n\t            },\n\t            \"image\": {\n\t                \"@id\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/ibkr-quant-news\\\/how-to-scrape-news-articles-with-python\\\/#primaryimage\"\n\t            },\n\t            \"thumbnailUrl\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/wp-content\\\/uploads\\\/sites\\\/2\\\/2020\\\/08\\\/mobile-app-news.jpg\",\n\t            \"keywords\": [\n\t                \"Newspaper Python package\",\n\t                \"Python\",\n\t                \"Quant\",\n\t                \"Sentiment Analysis\",\n\t  
              \"Sentiment Trading\",\n\t                \"ycombinator\"\n\t            ],\n\t            \"articleSection\": [\n\t                \"Data Science\",\n\t                \"Programming Languages\",\n\t                \"Python Development\",\n\t                \"Quant\",\n\t                \"Quant Development\",\n\t                \"Quant Europe\",\n\t                \"Quant Regions\"\n\t            ],\n\t            \"inLanguage\": \"en-US\"\n\t        },\n\t        {\n\t            \"@type\": \"WebPage\",\n\t            \"@id\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/ibkr-quant-news\\\/how-to-scrape-news-articles-with-python\\\/\",\n\t            \"url\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/ibkr-quant-news\\\/how-to-scrape-news-articles-with-python\\\/\",\n\t            \"name\": \"How to Scrape News Articles with Python | IBKR Quant Blog\",\n\t            \"isPartOf\": {\n\t                \"@id\": \"https:\\\/\\\/ibkrcampus.com\\\/campus\\\/#website\"\n\t            },\n\t            \"primaryImageOfPage\": {\n\t                \"@id\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/ibkr-quant-news\\\/how-to-scrape-news-articles-with-python\\\/#primaryimage\"\n\t            },\n\t            \"image\": {\n\t                \"@id\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/ibkr-quant-news\\\/how-to-scrape-news-articles-with-python\\\/#primaryimage\"\n\t            },\n\t            \"thumbnailUrl\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/wp-content\\\/uploads\\\/sites\\\/2\\\/2020\\\/08\\\/mobile-app-news.jpg\",\n\t            \"datePublished\": \"2020-08-24T15:53:08+00:00\",\n\t            \"dateModified\": \"2022-11-21T14:46:09+00:00\",\n\t            \"description\": \"Learn how to scrape news articles with Python. 
This can be done using the handy \\\"newspaper\\\" package.\",\n\t            \"inLanguage\": \"en-US\",\n\t            \"potentialAction\": [\n\t                {\n\t                    \"@type\": \"ReadAction\",\n\t                    \"target\": [\n\t                        \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/ibkr-quant-news\\\/how-to-scrape-news-articles-with-python\\\/\"\n\t                    ]\n\t                }\n\t            ]\n\t        },\n\t        {\n\t            \"@type\": \"ImageObject\",\n\t            \"inLanguage\": \"en-US\",\n\t            \"@id\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/ibkr-quant-news\\\/how-to-scrape-news-articles-with-python\\\/#primaryimage\",\n\t            \"url\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/wp-content\\\/uploads\\\/sites\\\/2\\\/2020\\\/08\\\/mobile-app-news.jpg\",\n\t            \"contentUrl\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/wp-content\\\/uploads\\\/sites\\\/2\\\/2020\\\/08\\\/mobile-app-news.jpg\",\n\t            \"width\": 900,\n\t            \"height\": 550,\n\t            \"caption\": \"Quant News\"\n\t        },\n\t        {\n\t            \"@type\": \"WebSite\",\n\t            \"@id\": \"https:\\\/\\\/ibkrcampus.com\\\/campus\\\/#website\",\n\t            \"url\": \"https:\\\/\\\/ibkrcampus.com\\\/campus\\\/\",\n\t            \"name\": \"IBKR Campus US\",\n\t            \"description\": \"Financial Education from Interactive Brokers\",\n\t            \"publisher\": {\n\t                \"@id\": \"https:\\\/\\\/ibkrcampus.com\\\/campus\\\/#organization\"\n\t            },\n\t            \"potentialAction\": [\n\t                {\n\t                    \"@type\": \"SearchAction\",\n\t                    \"target\": {\n\t                        \"@type\": \"EntryPoint\",\n\t                        \"urlTemplate\": \"https:\\\/\\\/ibkrcampus.com\\\/campus\\\/?s={search_term_string}\"\n\t                    },\n\t              
      \"query-input\": {\n\t                        \"@type\": \"PropertyValueSpecification\",\n\t                        \"valueRequired\": true,\n\t                        \"valueName\": \"search_term_string\"\n\t                    }\n\t                }\n\t            ],\n\t            \"inLanguage\": \"en-US\"\n\t        },\n\t        {\n\t            \"@type\": \"Organization\",\n\t            \"@id\": \"https:\\\/\\\/ibkrcampus.com\\\/campus\\\/#organization\",\n\t            \"name\": \"Interactive Brokers\",\n\t            \"alternateName\": \"IBKR\",\n\t            \"url\": \"https:\\\/\\\/ibkrcampus.com\\\/campus\\\/\",\n\t            \"logo\": {\n\t                \"@type\": \"ImageObject\",\n\t                \"inLanguage\": \"en-US\",\n\t                \"@id\": \"https:\\\/\\\/ibkrcampus.com\\\/campus\\\/#\\\/schema\\\/logo\\\/image\\\/\",\n\t                \"url\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/wp-content\\\/uploads\\\/sites\\\/2\\\/2024\\\/05\\\/ibkr-campus-logo.jpg\",\n\t                \"contentUrl\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/wp-content\\\/uploads\\\/sites\\\/2\\\/2024\\\/05\\\/ibkr-campus-logo.jpg\",\n\t                \"width\": 669,\n\t                \"height\": 669,\n\t                \"caption\": \"Interactive Brokers\"\n\t            },\n\t            \"image\": {\n\t                \"@id\": \"https:\\\/\\\/ibkrcampus.com\\\/campus\\\/#\\\/schema\\\/logo\\\/image\\\/\"\n\t            },\n\t            \"publishingPrinciples\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/about-ibkr-campus\\\/\",\n\t            \"ethicsPolicy\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/cyber-security-notice\\\/\"\n\t        },\n\t        {\n\t            \"@type\": \"Person\",\n\t            \"@id\": \"https:\\\/\\\/ibkrcampus.com\\\/campus\\\/#\\\/schema\\\/person\\\/d4018570a16fb867f1c08412fc9c64bc\",\n\t            \"name\": \"Andrew Treadway\",\n\t            
\"description\": \"Andrew Treadway currently works as a Senior Data Scientist, and has experience doing analytics, software automation, and ETL. He completed a master\u2019s degree in computer science \\\/ machine learning, and an undergraduate degree in pure mathematics. Connect with him on LinkedIn: https:\\\/\\\/www.linkedin.com\\\/in\\\/andrew-treadway-a3b19b103\\\/In addition to TheAutomatic.net blog, he also teaches in-person courses on Python and R through my NYC meetup: more details.\",\n\t            \"sameAs\": [\n\t                \"https:\\\/\\\/theautomatic.net\\\/about-me\\\/\"\n\t            ],\n\t            \"url\": \"https:\\\/\\\/www.interactivebrokers.com\\\/campus\\\/author\\\/andrewtreadway\\\/\"\n\t        }\n\t    ]\n\t}<\/script>\n<!-- \/ Yoast SEO Premium plugin. -->","yoast_head_json":{"title":"How to Scrape News Articles with Python | IBKR Quant","description":"Learn how to scrape news articles with Python. This can be done using the handy \"newspaper\" package.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.interactivebrokers.com\/campus\/wp-json\/wp\/v2\/posts\/57223\/","og_locale":"en_US","og_type":"article","og_title":"How to Scrape News Articles with Python | IBKR Quant Blog","og_description":"Learn how to scrape news articles with Python. 
This can be done using the handy \"newspaper\" package.","og_url":"https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/how-to-scrape-news-articles-with-python\/","og_site_name":"IBKR Campus US","article_published_time":"2020-08-24T15:53:08+00:00","article_modified_time":"2022-11-21T14:46:09+00:00","og_image":[{"width":900,"height":550,"url":"https:\/\/www.interactivebrokers.com\/campus\/wp-content\/uploads\/sites\/2\/2020\/08\/mobile-app-news.jpg","type":"image\/jpeg"}],"author":"Andrew Treadway","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Andrew Treadway","Est. reading time":"3 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"NewsArticle","@id":"https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/how-to-scrape-news-articles-with-python\/#article","isPartOf":{"@id":"https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/how-to-scrape-news-articles-with-python\/"},"author":{"name":"Andrew Treadway","@id":"https:\/\/ibkrcampus.com\/campus\/#\/schema\/person\/d4018570a16fb867f1c08412fc9c64bc"},"headline":"How to Scrape News Articles with Python","datePublished":"2020-08-24T15:53:08+00:00","dateModified":"2022-11-21T14:46:09+00:00","mainEntityOfPage":{"@id":"https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/how-to-scrape-news-articles-with-python\/"},"wordCount":703,"publisher":{"@id":"https:\/\/ibkrcampus.com\/campus\/#organization"},"image":{"@id":"https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/how-to-scrape-news-articles-with-python\/#primaryimage"},"thumbnailUrl":"https:\/\/www.interactivebrokers.com\/campus\/wp-content\/uploads\/sites\/2\/2020\/08\/mobile-app-news.jpg","keywords":["Newspaper Python package","Python","Quant","Sentiment Analysis","Sentiment Trading","ycombinator"],"articleSection":["Data Science","Programming Languages","Python Development","Quant","Quant Development","Quant Europe","Quant 
Regions"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/how-to-scrape-news-articles-with-python\/","url":"https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/how-to-scrape-news-articles-with-python\/","name":"How to Scrape News Articles with Python | IBKR Quant Blog","isPartOf":{"@id":"https:\/\/ibkrcampus.com\/campus\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/how-to-scrape-news-articles-with-python\/#primaryimage"},"image":{"@id":"https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/how-to-scrape-news-articles-with-python\/#primaryimage"},"thumbnailUrl":"https:\/\/www.interactivebrokers.com\/campus\/wp-content\/uploads\/sites\/2\/2020\/08\/mobile-app-news.jpg","datePublished":"2020-08-24T15:53:08+00:00","dateModified":"2022-11-21T14:46:09+00:00","description":"Learn how to scrape news articles with Python. This can be done using the handy \"newspaper\" package.","inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/how-to-scrape-news-articles-with-python\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.interactivebrokers.com\/campus\/ibkr-quant-news\/how-to-scrape-news-articles-with-python\/#primaryimage","url":"https:\/\/www.interactivebrokers.com\/campus\/wp-content\/uploads\/sites\/2\/2020\/08\/mobile-app-news.jpg","contentUrl":"https:\/\/www.interactivebrokers.com\/campus\/wp-content\/uploads\/sites\/2\/2020\/08\/mobile-app-news.jpg","width":900,"height":550,"caption":"Quant News"},{"@type":"WebSite","@id":"https:\/\/ibkrcampus.com\/campus\/#website","url":"https:\/\/ibkrcampus.com\/campus\/","name":"IBKR Campus US","description":"Financial Education from Interactive 
Brokers","publisher":{"@id":"https:\/\/ibkrcampus.com\/campus\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/ibkrcampus.com\/campus\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/ibkrcampus.com\/campus\/#organization","name":"Interactive Brokers","alternateName":"IBKR","url":"https:\/\/ibkrcampus.com\/campus\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/ibkrcampus.com\/campus\/#\/schema\/logo\/image\/","url":"https:\/\/www.interactivebrokers.com\/campus\/wp-content\/uploads\/sites\/2\/2024\/05\/ibkr-campus-logo.jpg","contentUrl":"https:\/\/www.interactivebrokers.com\/campus\/wp-content\/uploads\/sites\/2\/2024\/05\/ibkr-campus-logo.jpg","width":669,"height":669,"caption":"Interactive Brokers"},"image":{"@id":"https:\/\/ibkrcampus.com\/campus\/#\/schema\/logo\/image\/"},"publishingPrinciples":"https:\/\/www.interactivebrokers.com\/campus\/about-ibkr-campus\/","ethicsPolicy":"https:\/\/www.interactivebrokers.com\/campus\/cyber-security-notice\/"},{"@type":"Person","@id":"https:\/\/ibkrcampus.com\/campus\/#\/schema\/person\/d4018570a16fb867f1c08412fc9c64bc","name":"Andrew Treadway","description":"Andrew Treadway currently works as a Senior Data Scientist, and has experience doing analytics, software automation, and ETL. He completed a master\u2019s degree in computer science \/ machine learning, and an undergraduate degree in pure mathematics. 
Connect with him on LinkedIn: https:\/\/www.linkedin.com\/in\/andrew-treadway-a3b19b103\/In addition to TheAutomatic.net blog, he also teaches in-person courses on Python and R through my NYC meetup: more details.","sameAs":["https:\/\/theautomatic.net\/about-me\/"],"url":"https:\/\/www.interactivebrokers.com\/campus\/author\/andrewtreadway\/"}]}},"jetpack_featured_media_url":"https:\/\/www.interactivebrokers.com\/campus\/wp-content\/uploads\/sites\/2\/2020\/08\/mobile-app-news.jpg","_links":{"self":[{"href":"https:\/\/ibkrcampus.com\/campus\/wp-json\/wp\/v2\/posts\/57223","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/ibkrcampus.com\/campus\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/ibkrcampus.com\/campus\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/ibkrcampus.com\/campus\/wp-json\/wp\/v2\/users\/388"}],"replies":[{"embeddable":true,"href":"https:\/\/ibkrcampus.com\/campus\/wp-json\/wp\/v2\/comments?post=57223"}],"version-history":[{"count":0,"href":"https:\/\/ibkrcampus.com\/campus\/wp-json\/wp\/v2\/posts\/57223\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/ibkrcampus.com\/campus\/wp-json\/wp\/v2\/media\/57299"}],"wp:attachment":[{"href":"https:\/\/ibkrcampus.com\/campus\/wp-json\/wp\/v2\/media?parent=57223"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/ibkrcampus.com\/campus\/wp-json\/wp\/v2\/categories?post=57223"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/ibkrcampus.com\/campus\/wp-json\/wp\/v2\/tags?post=57223"},{"taxonomy":"contributors-categories","embeddable":true,"href":"https:\/\/ibkrcampus.com\/campus\/wp-json\/wp\/v2\/contributors-categories?post=57223"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}