How to strip HTML from strings in Python?

Sometimes, we want to strip the HTML tags from a string in Python and keep only the text.

In this article, we’ll look at how to strip HTML from strings in Python.

How to strip HTML from strings in Python?

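To strip HTML from strings in Python, we can create a subclass of the HTMLParser class from the standard library's html.parser module and collect the text that the parser passes to its handle_data method.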

For instance, we write

from io import StringIO
from html.parser import HTMLParser

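class MLStripper(HTMLParser):
    """Collects the text content of an HTML string and drops the tags."""

    def __init__(self):
        # convert_charrefs=True converts references such as &amp; into characters.
        super().__init__(convert_charrefs=True)
        self.text = StringIO()

    def handle_data(self, d):
        # Called by the parser for each run of text between tags.
        self.text.write(d)

    def get_data(self):
        return self.text.getvalue()

def strip_tags(html):
    s = MLStripper()
    s.feed(html)
    s.close()  # flush any text the parser is still buffering
    return s.get_data()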

to create the MLStripper class.

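In it, the parser calls handle_data with each piece of text it finds between tags, and we write that text to the self.text buffer. Then we call self.text.getvalue in the get_data method to return the accumulated text, which is the HTML content without the tags.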
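To see what handle_data actually receives, it can help to record each call. The following is a minimal sketch that uses a hypothetical ChunkLogger subclass (it is not part of the code above) to keep every piece of text in a list instead of a StringIO buffer.

```python
from html.parser import HTMLParser

class ChunkLogger(HTMLParser):
    """Hypothetical helper that records each text chunk passed to handle_data."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        # The parser calls this for the text between tags; tags themselves
        # go to other handlers, which we leave as no-ops.
        self.chunks.append(data)

logger = ChunkLogger()
logger.feed("<p>Hello <b>world</b>!</p>")
logger.close()
print(logger.chunks)  # ['Hello ', 'world', '!']
```

Joining the recorded chunks gives the same result as writing them to a StringIO object one by one.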

And then in the strip_tags function, we create an MLStripper object.

We pass the html string to the object's feed method to parse it.

And then we call s.get_data to return the text content of the HTML without the tags.
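As a quick check, here is a sketch of strip_tags in use. It repeats the definitions so the snippet runs on its own, and it assumes a call to close after feed so that any text the parser is still buffering gets flushed. The sample string is made up for illustration.

```python
from io import StringIO
from html.parser import HTMLParser

class MLStripper(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.text = StringIO()

    def handle_data(self, d):
        self.text.write(d)

    def get_data(self):
        return self.text.getvalue()

def strip_tags(html):
    s = MLStripper()
    s.feed(html)
    s.close()  # assumption: flush buffered text before reading the result
    return s.get_data()

sample = "<p>Fish &amp; <em>chips</em></p>"  # made-up input
print(strip_tags(sample))  # Fish & chips
```

Because convert_charrefs is True, the &amp; reference comes back as a plain & character in the result.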

Conclusion

To strip HTML from strings in Python, we can create a subclass of the HTMLParser class.

By John Au-Yeung

Web developer specializing in React, Vue, and front end development.
