Sometimes, we want to retrieve links from a web page using Python and BeautifulSoup.
In this article, we’ll look at how to retrieve links from a web page using Python and BeautifulSoup.
How to retrieve links from a web page using Python and BeautifulSoup?
To retrieve links from a web page using Python and BeautifulSoup, we can use the SoupStrainer class.
For instance, we write
import httplib2
from bs4 import BeautifulSoup, SoupStrainer

http = httplib2.Http()
status, response = http.request('http://www.example.com')

for link in BeautifulSoup(response, 'html.parser', parse_only=SoupStrainer('a')):
    if link.has_attr('href'):
        print(link['href'])
to make a GET request to example.com with
http = httplib2.Http()
status, response = http.request('http://www.example.com')
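Then we parse the response by passing it into BeautifulSoup. Note that http.request returns a tuple of the response object and the body, so the response variable here holds the page content.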
And we get only the anchor elements by setting the parse_only argument to SoupStrainer('a').
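To see what parse_only does without making a network request, here is a minimal sketch that runs SoupStrainer on an inline HTML string. The sample markup and the variable names (html, only_anchors, hrefs) are made up for illustration, and it uses find_all with href=True instead of the loop above, which is an alternative way to keep only anchors that have an href.

```python
from bs4 import BeautifulSoup, SoupStrainer

# Hypothetical sample page, used instead of a live HTTP response.
html = """
<html><body>
  <p>Intro <a href="/docs">Docs</a></p>
  <a name="top">Anchor without href</a>
  <div><a href="https://example.com/about">About</a></div>
</body></html>
"""

# Only <a> elements are kept while parsing; other tags are skipped.
only_anchors = SoupStrainer("a")
soup = BeautifulSoup(html, "html.parser", parse_only=only_anchors)

# href=True matches only anchors that actually have an href attribute.
hrefs = [a["href"] for a in soup.find_all("a", href=True)]
print(hrefs)
```

Because the strainer is applied during parsing, the rest of the page is never built into the tree, which can save time and memory on large documents.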
In the loop, we go through all the links, check that each one has an href attribute with has_attr, and print its value with link['href'].
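The has_attr check matters because indexing a tag with a missing attribute raises a KeyError. The sketch below shows the difference on two made-up anchor tags; extract_href is a hypothetical helper name, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

# Two hypothetical anchors: one with an href, one without.
tag_with = BeautifulSoup('<a href="/home">Home</a>', "html.parser").a
tag_without = BeautifulSoup('<a name="top">Top</a>', "html.parser").a


def extract_href(tag):
    """Return the tag's href, or None if the attribute is missing."""
    if tag.has_attr("href"):
        return tag["href"]
    return None


print(extract_href(tag_with))
print(extract_href(tag_without))

# tag.get("href") is a shorter equivalent that also returns None when missing.
print(tag_without.get("href"))

# Indexing directly without the check raises KeyError for a missing attribute.
try:
    tag_without["href"]
except KeyError:
    print("no href on this tag")
```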