How to retrieve links from a web page using Python and BeautifulSoup?

Sometimes, we want to retrieve links from a web page using Python and BeautifulSoup.

In this article, we'll look at how to retrieve links from a web page using Python and BeautifulSoup.

To retrieve links from a web page using Python and BeautifulSoup, we can use the SoupStrainer class so that only the anchor elements are parsed.

For instance, we write

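import httplib2
from bs4 import BeautifulSoup, SoupStrainer

http = httplib2.Http()
# request() returns a (response headers, body) tuple
status, response = http.request('http://www.example.com')

# Name the parser explicitly and parse only the <a> tags
soup = BeautifulSoup(response, 'html.parser', parse_only=SoupStrainer('a'))

# find_all('a') yields only tags, so has_attr is safe to call on each item
for link in soup.find_all('a'):
    if link.has_attr('href'):
        print(link['href'])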

to make a GET request to example.com with

http = httplib2.Http()
status, response = http.request('http://www.example.com')

Then we parse the response body by passing it into BeautifulSoup.

We restrict parsing to the anchor elements by setting the parse_only argument to SoupStrainer('a').

In the loop, we go through each link, check that it has an href attribute with has_attr, and read the attribute's value with link['href'].
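To try the same parsing steps without making a network request, here is a minimal sketch that runs them on an inline HTML string. The extract_links helper and the sample_html markup are hypothetical names made up for illustration; only BeautifulSoup, SoupStrainer, find_all and has_attr come from the bs4 library.

```python
from bs4 import BeautifulSoup, SoupStrainer


def extract_links(html):
    """Return the href value of every <a> tag in html that has one.

    extract_links is a hypothetical helper, not part of bs4.
    """
    # Parse only the <a> tags; the rest of the document is skipped.
    soup = BeautifulSoup(html, 'html.parser', parse_only=SoupStrainer('a'))
    # Keep only the anchors that actually carry an href attribute.
    return [link['href'] for link in soup.find_all('a') if link.has_attr('href')]


# Hypothetical sample markup, used in place of a live HTTP response.
sample_html = """
<html><body>
  <a href="https://www.example.com/about">About</a>
  <a name="top">No href here</a>
  <p>Some text <a href="/contact">Contact</a></p>
</body></html>
"""

links = extract_links(sample_html)
print(links)
```

The anchor without an href is skipped, and the relative link /contact is returned exactly as written in the markup. If we need absolute URLs, we can resolve each value against the page URL with urllib.parse.urljoin.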

By John Au-Yeung

Web developer specializing in React, Vue, and front end development.
