How to extract text from an HTML file using Python?

Sometimes, we want to extract the text from an HTML file using Python, leaving out the tags.

In this article, we’ll look at how to extract text from an HTML file using Python.

How to extract text from an HTML file using Python?

To extract text from an HTML file using Python, we can use the BeautifulSoup class from the bs4 package.

For instance, we write

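from bs4 import BeautifulSoup

# Example input (an assumption for illustration); replace with your own HTML
some_html_string = "<p>Hello <b>world</b></p>"
# Parse the HTML and join its whitespace-stripped text pieces with spaces
clean_text = ' '.join(BeautifulSoup(some_html_string, "html.parser").stripped_strings)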

to create a BeautifulSoup object from some_html_string, with 'html.parser' telling it to use Python's built-in HTML parser.

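Then we read the object's stripped_strings property, which is a generator (not a list) that yields each text string in some_html_string with its leading and trailing whitespace removed and skips strings that contain only whitespace.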
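To make the behavior of stripped_strings concrete, here is a minimal sketch that collects what it yields into a list. The html value is a made-up sample, not something from a real page.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup with extra whitespace and nested tags.
html = "<div>\n  <h1> Title </h1>\n  <p>Hello <b>world</b>!</p>\n</div>"
soup = BeautifulSoup(html, "html.parser")

# stripped_strings is a generator, so wrap it in list() to inspect it.
pieces = list(soup.stripped_strings)
print(pieces)  # ['Title', 'Hello', 'world', '!']
```

Each text node becomes its own piece, so "world" and "!" are separate items even though they sit next to each other in the page. Joining the pieces with a space therefore gives "Title Hello world !", with a space before the exclamation mark.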

Next, we call ' '.join on those strings to combine them into one string, with a single space between each piece.
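The example above starts from a string that is already in memory. When the HTML lives in a file on disk, one possible approach is to read the file first and then apply the same steps. The sketch below assumes the file is UTF-8 encoded; the function name extract_text_from_file and its encoding parameter are hypothetical helpers for illustration, not part of BeautifulSoup.

```python
from pathlib import Path

from bs4 import BeautifulSoup


def extract_text_from_file(path, encoding="utf-8"):
    """Return the text of the HTML file at `path` as a single string.

    Hypothetical helper; assumes the file can be decoded with `encoding`.
    """
    html = Path(path).read_text(encoding=encoding)
    soup = BeautifulSoup(html, "html.parser")
    # Explicitly drop <script> and <style> elements so their contents
    # cannot end up in the extracted text.
    for tag in soup(["script", "style"]):
        tag.decompose()
    # Same technique as before: join the stripped strings with spaces.
    return " ".join(soup.stripped_strings)
```

We could then call it with something like clean_text = extract_text_from_file("page.html"), where page.html is a placeholder file name.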

Conclusion

To extract text from an HTML file using Python, we can parse the HTML with BeautifulSoup and join the strings from its stripped_strings generator.

By John Au-Yeung

Web developer specializing in React, Vue, and front end development.
