
How to remove accents (normalize) in a Python Unicode string?


Sometimes, we want to remove accents from a Python Unicode string, for example to turn 'À' into 'A'.

In this article, we’ll look at how to do this with Unicode normalization.

How to remove accents (normalize) in a Python Unicode string?

To remove accents from a Python Unicode string, we can decompose the string with the unicodedata.normalize function and then drop the combining marks.

For instance, we write:

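import unicodedata


def strip_accents(s):
    # NFD splits each accented character into a base character
    # followed by its combining marks. Dropping category 'Mn'
    # (nonspacing marks) keeps only the base characters.
    return ''.join(c for c in unicodedata.normalize('NFD', s)
                   if unicodedata.category(c) != 'Mn')


no_accent = strip_accents(u"A \u00c0 \u0394 \u038E")
print(no_accent)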

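We call unicodedata.normalize with the 'NFD' form on the s string, which decomposes each accented character into its base character followed by one or more combining marks.

The generator expression skips every character whose unicodedata.category is 'Mn' (nonspacing mark), and join concatenates the remaining characters into a new string.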
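To make these two steps concrete, here is a small sketch that lists what NFD produces for a single accented character. The helper name describe is hypothetical and only used for illustration; it is not part of unicodedata.

```python
import unicodedata


def describe(s):
    """Return (code point, category, name) for each character in the NFD form of s."""
    return [(f"U+{ord(c):04X}", unicodedata.category(c), unicodedata.name(c))
            for c in unicodedata.normalize("NFD", s)]


for row in describe("\u00c0"):  # U+00C0 is 'À'
    print(row)
# ('U+0041', 'Lu', 'LATIN CAPITAL LETTER A')
# ('U+0300', 'Mn', 'COMBINING GRAVE ACCENT')
```

The second entry has category Mn, so it is the character that the filter in strip_accents discards.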

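Therefore, no_accent is 'A A Δ Υ': À (U+00C0) loses its grave accent, Ύ (U+038E) loses its tonos, and Δ (U+0394) has no accent, so it is unchanged.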
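One caveat is that this approach only removes marks that NFD actually splits off. The sketch below repeats strip_accents so it runs on its own; the sample strings are illustrative assumptions. Letters such as ø and Ł have no canonical decomposition, so they pass through unchanged.

```python
import unicodedata


def strip_accents(s):
    # Same approach as in the article: decompose, then drop nonspacing marks.
    return ''.join(c for c in unicodedata.normalize('NFD', s)
                   if unicodedata.category(c) != 'Mn')


# 'é' decomposes into 'e' plus a combining acute accent, so the accent is removed.
print(strip_accents("caf\u00e9"))

# 'ø' (U+00F8) does not decompose under NFD, so it is kept as-is.
print(strip_accents("sm\u00f8rrebr\u00f8d"))

# Mixed case: 'ó' and 'ź' lose their accents, while 'Ł' (U+0141) stays.
print(strip_accents("\u0141\u00f3d\u017a"))
```

If such letters also need to be mapped to ASCII, that has to be handled separately, for example with an explicit replacement table.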

Conclusion

To remove accents from a Python Unicode string, we can decompose it with unicodedata.normalize('NFD', s) and keep only the characters whose unicodedata.category is not 'Mn'.

By John Au-Yeung

Web developer specializing in React, Vue, and front end development.
