Sometimes, we want to extract text from an HTML file using Python.
In this article, we’ll look at how to extract text from an HTML file using Python.
How to extract text from an HTML file using Python?
To extract text from an HTML file using Python, we can use BeautifulSoup from the bs4 package (installed with pip install beautifulsoup4).
For instance, we write
```python
from bs4 import BeautifulSoup

some_html_string = "<p>Hello <b>world</b></p>"  # example input
clean_text = ' '.join(BeautifulSoup(some_html_string, "html.parser").stripped_strings)
```
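to create a BeautifulSoup object from some_html_string using Python's built-in "html.parser" parser.
Then we read the object's stripped_strings property, a generator that yields each piece of text in some_html_string with leading and trailing whitespace removed, skipping strings that contain only whitespace.
Next, we call ' '.join to combine those strings into a single string separated by spaces.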
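The snippet above works on an HTML string. To read an actual file, the same idea can be wrapped in a small function, sketched below. The function name extract_text_from_file is hypothetical, the file is assumed to be UTF-8 encoded, and removing <script> and <style> elements is an extra step not in the original snippet, added so that JavaScript and CSS are not returned as text. Note that text inside <title> is still included.

```python
import os
import tempfile
from pathlib import Path

from bs4 import BeautifulSoup


def extract_text_from_file(path, encoding="utf-8"):
    """Return the text of an HTML file as a single space-separated string."""
    # Assumption: the file's encoding is known (UTF-8 by default).
    html = Path(path).read_text(encoding=encoding)
    soup = BeautifulSoup(html, "html.parser")
    # Remove <script> and <style> elements so JavaScript/CSS is not returned as text.
    for tag in soup(["script", "style"]):
        tag.decompose()
    # stripped_strings yields each text fragment with leading/trailing whitespace
    # removed and skips whitespace-only fragments.
    return " ".join(soup.stripped_strings)


# Usage: write a small sample page to a temporary file, then extract its text.
sample_html = """<html><head><title>Demo</title>
<style>p { color: red; }</style></head>
<body><h1>Hello</h1><p>Some <b>bold</b> text.</p>
<script>console.log("hidden");</script></body></html>"""

with tempfile.NamedTemporaryFile("w", suffix=".html", delete=False, encoding="utf-8") as f:
    f.write(sample_html)
    sample_path = f.name

try:
    demo_text = extract_text_from_file(sample_path)
    print(demo_text)  # Demo Hello Some bold text.
finally:
    os.remove(sample_path)
```

Because every fragment is joined with a space, markup that splits a single word (for example <b>bo</b>ld) comes out as two words; for most prose this trade-off is acceptable.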
Conclusion
To extract text from an HTML file using Python, we can parse it with BeautifulSoup and join the strings from its stripped_strings property with spaces.