Sometimes, we want to remove accents from a Python Unicode string by normalizing it.
In this article, we’ll look at how to remove accents from a Python Unicode string.
How to remove accents (normalize) in a Python Unicode string?
To remove accents from a Python Unicode string, we can use the unicodedata.normalize function to decompose accented characters and then filter out the resulting combining marks.
For instance, we write:
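import unicodedata

def strip_accents(s):
    # NFD splits each accented character into a base character plus combining marks
    return ''.join(c for c in unicodedata.normalize('NFD', s)
                   if unicodedata.category(c) != 'Mn')  # drop nonspacing marks

no_accent = strip_accents(u"A \u00c0 \u0394 \u038E")
print(no_accent)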
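We call unicodedata.normalize with the 'NFD' form on the string s. NFD (canonical decomposition) splits each accented character into its base character followed by one or more combining marks.
Then we keep only the characters whose Unicode category is not 'Mn' (Mark, nonspacing), which drops the accent marks, and pass the resulting generator expression to ''.join to build the new string.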
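To see what the filter works on, here is a small sketch that decomposes a single accented character with NFD and prints each resulting code point with its Unicode name and category. The choice of À (U+00C0) as input is just for illustration.

```python
import unicodedata

# Decompose "À" (U+00C0) with NFD and show each resulting code point
# along with its Unicode name and general category.
decomposed = unicodedata.normalize('NFD', '\u00c0')
for c in decomposed:
    print(f"U+{ord(c):04X} {unicodedata.name(c)} {unicodedata.category(c)}")
# U+0041 LATIN CAPITAL LETTER A Lu
# U+0300 COMBINING GRAVE ACCENT Mn
```

Only the combining grave accent has category 'Mn', so it is the only part the filter drops.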
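Therefore, no_accent is 'A A Δ Υ': the grave accent on \u00c0 (À) and the tonos on \u038E (Ύ) are removed, while A and Δ, which have no accents, are unchanged.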
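Note that this approach only removes marks that come from a character's canonical decomposition. The sketch below reuses the same strip_accents function on a few sample words (the words themselves are just illustrative inputs): letters such as ø and ß have no canonical decomposition in Unicode, so NFD leaves them as single characters and the 'Mn' filter has nothing to remove.

```python
import unicodedata

def strip_accents(s):
    # Same approach as above: decompose with NFD, then drop nonspacing marks (Mn)
    return ''.join(c for c in unicodedata.normalize('NFD', s)
                   if unicodedata.category(c) != 'Mn')

# Accents that come from a canonical decomposition are removed.
print(strip_accents("Crème brûlée"))  # Creme brulee
print(strip_accents("naïve"))         # naive

# Letters with no canonical decomposition are left unchanged.
print(strip_accents("Ørsted"))        # Ørsted
print(strip_accents("Straße"))        # Straße
```

If letters like these also need to be mapped to plain ASCII, that would require an extra mapping step beyond Unicode normalization.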
Conclusion
To remove accents from a Python Unicode string, we can use unicodedata.normalize with the 'NFD' form and then filter out the nonspacing marks (category 'Mn').