How to Scrape News Articles with Python

In this post we’re going to discuss how to scrape news articles with Python. This can be done using the handy newspaper package.

Introduction to Python’s newspaper package

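The newspaper package can be installed using pip. On Python 3 it is published on PyPI as newspaper3k (the plain newspaper name refers to the older Python 2 release), but it is still imported as newspaper:

pip install newspaper3k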

Once it’s installed, we can get started. newspaper can work either by scraping a single article from a given URL or by finding the links on a webpage to other news articles. Let’s start with a single article. First, we import the Article class. Next, we create an Article object from the URL of our news article and call its download method to fetch the content. Then, we use the parse method to parse the HTML. Lastly, we can print out the text of the article using .text.

Scraping a single article

from newspaper import Article

url = "https://www.bloomberg.com/news/articles/2020-08-01/apple-buys-startup-to-turn-iphones-into-payment-terminals?srnd=premium"

# download and parse article
article = Article(url)
article.download()
article.parse()

# print article text
print(article.text)
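In practice a download can fail (a dead link, a paywall, a timeout), and in newspaper3k that usually surfaces as an exception when parse is called. The sketch below wraps the same three steps in a small helper that returns None instead of crashing. The helper name fetch_article_text and its article_cls parameter are assumptions made for this example and are not part of the newspaper API; article_cls only exists so the helper can be tried with a stand-in class.

```python
def fetch_article_text(url, article_cls=None):
    """Return the parsed text for url, or None if downloading or parsing fails."""
    if article_cls is None:
        # Imported lazily so the helper can also be exercised with a stand-in class.
        from newspaper import Article as article_cls

    article = article_cls(url)
    try:
        article.download()
        article.parse()
    except Exception as exc:  # broad on purpose; newspaper raises ArticleException
        print(f"Could not scrape {url}: {exc}")
        return None
    return article.text
```

Calling fetch_article_text(url) with the Bloomberg URL above should behave like the original snippet when the download succeeds.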

It’s also possible to get other information about the article, such as links to images or videos embedded in the post.

# get list of image links
article.images

# get list of videos – empty in this case
article.movies
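A parsed Article object also exposes a few other commonly used attributes, such as title, authors, publish_date and top_image. As a minimal sketch, the hypothetical helper below (article_metadata is a name invented for this example) gathers them into a dictionary. It assumes the article has already been downloaded and parsed; publish_date may be None when no date could be detected.

```python
def article_metadata(article):
    """Collect basic metadata from an already parsed newspaper Article."""
    return {
        "title": article.title,
        "authors": list(article.authors),
        "publish_date": article.publish_date,  # may be None if no date was found
        "top_image": article.top_image,        # URL of the main image
        "image_count": len(article.images),
        "video_count": len(article.movies),
    }
```

For the article above, article_metadata(article) would return one dictionary that is easy to store or turn into a table row.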

Downloading all the articles linked on a webpage

Now, let’s look at how we can get all the news articles linked on a webpage. We’ll do that using the newspaper.build function, as shown below. Then, we can extract the article URLs using the article_urls method.

import newspaper

site = newspaper.build("https://news.ycombinator.com/")

# get list of article URLs
site.article_urls()

Using our object above, we can also get the contents of each of those articles. Here, all of the article objects are stored in the list, site.articles. For example, let’s get the first article’s contents.

site_article = site.articles[0]

site_article.download()
site_article.parse()

print(site_article.text)

Now, let’s modify our code to get the top ten articles:

top_articles = []
for index in range(10):
    article = site.articles[index]
    article.download()
    article.parse()
    top_articles.append(article)

Now, we can look at the text of any of these articles.

print(top_articles[0].text)

print(top_articles[3].text)
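The loop above assumes the site has at least ten articles and that every download succeeds. A slightly more defensive variant, sketched below, slices the list instead of indexing it and skips any article that raises an error. The function name download_first_n is an assumption made for this example, not something newspaper provides.

```python
def download_first_n(articles, n=10):
    """Download and parse up to n articles, skipping any that fail."""
    parsed = []
    for article in articles[:n]:  # slicing is safe even with fewer than n items
        try:
            article.download()
            article.parse()
        except Exception as exc:  # broad on purpose; newspaper raises ArticleException
            print(f"Skipping {getattr(article, 'url', '?')}: {exc}")
            continue
        parsed.append(article)
    return parsed
```

With the site object built earlier, download_first_n(site.articles) would play the role of the loop above.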

Warning!

One important note when using newspaper is that if you run newspaper.build multiple times with the same URL, the package caches the articles it has already seen and leaves them out of later results. For example, in the code below, we run newspaper.build two consecutive times and get different results. The second time we run it, the code returns only the newly added links.

site = newspaper.build("https://news.ycombinator.com/")

print(len(site.articles))

site = newspaper.build("https://news.ycombinator.com/")

print(len(site.articles))

This can be adjusted by adding an extra parameter, memoize_articles=False, to our function call, like below:

site = newspaper.build("https://news.ycombinator.com/", memoize_articles=False)
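To make the caching behaviour more concrete, here is a rough plain-Python analogy: URLs seen on an earlier run are filtered out of later results. This only illustrates the idea and is not how newspaper implements its cache; the helper name filter_new_urls and the example.com URLs are made up for the example.

```python
def filter_new_urls(found_urls, seen_urls):
    """Return the URLs not seen before and record them in seen_urls."""
    fresh = [url for url in found_urls if url not in seen_urls]
    seen_urls.update(fresh)
    return fresh


seen = set()

# First "run": everything is new.
first_run = filter_new_urls(
    ["https://example.com/a", "https://example.com/b"], seen
)

# Second "run": only the link added since the first run is returned.
second_run = filter_new_urls(
    ["https://example.com/a", "https://example.com/b", "https://example.com/c"],
    seen,
)

print(len(first_run))   # 2
print(len(second_run))  # 1
```

Passing memoize_articles=False corresponds, in this analogy, to starting every run with an empty seen set.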

How to get article summaries

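The newspaper package also supports some NLP functionality. You can use it by calling the nlp method on an article that has already been downloaded and parsed. Note that nlp relies on NLTK’s punkt tokenizer data, so you may need to run nltk.download("punkt") once beforehand.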

article = top_articles[3]

article.nlp()

Now, let’s look at the summary attribute, which holds the summary that nlp generated for the article.

article.summary

You can also get a list of keywords from the article.

article.keywords
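The NLP steps can be bundled into one small helper, sketched below. It assumes the article has already been downloaded and parsed and that the NLTK punkt data is installed; the name summarize_article is invented for this example. Both summary and keywords are read here as plain attributes after nlp has run.

```python
def summarize_article(article):
    """Run newspaper's NLP step and return the summary and keywords."""
    # One-time setup, if the tokenizer data is missing:
    #   import nltk; nltk.download("punkt")
    article.nlp()
    return {
        "summary": article.summary,
        "keywords": list(article.keywords),
    }
```

For example, summarize_article(top_articles[3]) would return a dictionary with both fields for the fourth article.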

How to get top trending Google keywords

newspaper has a couple of other handy features. For example, we can use it to pull the top trending searches on Google using the hot function.

newspaper.hot()

The package can also return a list of popular URLs, like below.

newspaper.popular_urls()

Conclusion

That’s all for now. In this post, we learned how to scrape news articles with Python. If you want to learn more about web scraping, check out the web scraping fundamentals course I co-created with 365 Data Science, now available on Udemy. Also, make sure to check out their full program of courses (which includes mine).

Visit TheAutomatic.net Blog to download additional code: https://theautomatic.net/2020/08/05/how-to-scrape-news-articles-with-python/.


Disclosure: Interactive Brokers Third Party

Information posted on IBKR Campus that is provided by third-parties does NOT constitute a recommendation that you should contract for the services of that third party. Third-party participants who contribute to IBKR Campus are independent of Interactive Brokers and Interactive Brokers does not make any representations or warranties concerning the services offered, their past or future performance, or the accuracy of the information provided by the third party. Past performance is no guarantee of future results.

This material is from TheAutomatic.net and is being posted with its permission. The views expressed in this material are solely those of the author and/or TheAutomatic.net and Interactive Brokers is not endorsing or recommending any investment or trading discussed in the material. This material is not and should not be construed as an offer to buy or sell any security. It should not be construed as research or investment advice or a recommendation to buy, sell or hold any security or commodity. This material does not and is not intended to take into account the particular financial conditions, investment objectives or requirements of individual customers. Before acting on this material, you should consider whether it is suitable for your particular circumstances and, as necessary, seek professional advice.
