Sometimes, we want to parse HTML using Python.
In this article, we’ll look at how to parse HTML using Python.
How to parse HTML using Python?
To parse HTML using Python, we can use Beautiful Soup, a third-party library installed with pip install beautifulsoup4.
For instance, we write
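```python
# Beautiful Soup 4 is imported from the bs4 package
# (from BeautifulSoup import BeautifulSoup is the old Python 2-only Beautiful Soup 3).
from bs4 import BeautifulSoup

# ...
# html holds the HTML string to parse

# Name the parser explicitly; "html.parser" ships with Python.
parsed_html = BeautifulSoup(html, "html.parser")

# Find the first <div> whose class list includes "container" and print its text.
print(parsed_html.body.find("div", attrs={"class": "container"}).text)
```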
to create a BeautifulSoup object from the html string, which parses the HTML into a tree of objects that we can search.
Then we call parsed_html.body.find with 'div' and the attrs dict to find the first div with the container class inside the body.
And we print its text content with the text property.
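Here is a minimal, self-contained version of the example, assuming Beautiful Soup 4 is installed. The sample html string and the helper names container_text and container_text_css are hypothetical, made up for illustration. The sketch also guards against two cases the short snippet does not handle: find returns None when nothing matches, and with the built-in html.parser a fragment that has no <body> tag gives a soup whose body is None.

```python
from bs4 import BeautifulSoup

# Hypothetical sample input standing in for the article's html string.
html = """
<html>
  <body>
    <div class="header">Site title</div>
    <div class="container">Hello, <b>world</b>!</div>
  </body>
</html>
"""


def container_text(html):
    """Return the text of the first <div> with class "container", or None."""
    soup = BeautifulSoup(html, "html.parser")
    # html.parser does not add a <body> to fragments, so fall back to the whole tree.
    root = soup.body or soup
    div = root.find("div", attrs={"class": "container"})
    if div is None:  # find() returns None when nothing matches
        return None
    return div.get_text().strip()


def container_text_css(html):
    """The same lookup written as a CSS selector with select_one()."""
    soup = BeautifulSoup(html, "html.parser")
    div = soup.select_one("div.container")
    return None if div is None else div.get_text().strip()


print(container_text(html))      # Hello, world!
print(container_text_css(html))  # Hello, world!
```

Because class is a multi-valued attribute, both lookups also match an element such as <div class="container wide">. Choosing between find with attrs and a CSS selector is mostly a matter of taste.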
Conclusion
To parse HTML using Python, we can use Beautiful Soup.