How can I remove all special characters except alphanumerics and accented letters?
I tried something like:
text = 'abcdeáéí.@# '
re.sub(r'[^a-zA-Z0-9áéíóúÁÉÍÓÚâêîôÂÊÎÔãõÃÕçÇ: ]', ' ', text)
But I had no success. The following expression keeps only alphanumeric characters, but it does not keep accented ones:
tmp = re.sub(r'[^a-zA-Z0-9: ]', '', x)
Could someone help me?
Make your text a unicode string:
text = u'abcdeáéí.@# '
and make sure your pattern is able to accept unicode characters as well:
re.sub(u'[^a-zA-Z0-9áéíóúÁÉÍÓÚâêîôÂÊÎÔãõÃÕçÇ: ]', ' ', text)
With this combination, I get
u'abcde\xe1\xe9\xed    '
as a result (where \xe1 etc. are escape codes for the accented characters in text, and each of the three removed symbols has become a space).
There's no need for r in front of the pattern if you aren't escaping any characters. It's there so you can write things like r'\d\w' instead of '\\d\\w'.
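The u'...' output shown in this answer suggests Python 2. As a minimal sketch, assuming Python 3 (where every str is already Unicode and the u prefix is optional), the same idea might look like the following. The names ALLOWED and strip_special are hypothetical helpers introduced only for illustration, not part of the original question or answer.

```python
import re

# Assumption: Python 3, so plain string literals are Unicode strings.
# Anything NOT in this character class is treated as "special".
ALLOWED = re.compile('[^a-zA-Z0-9áéíóúÁÉÍÓÚâêîôÂÊÎÔãõÃÕçÇ: ]')


def strip_special(text, replacement=' '):
    """Replace every character outside the allowed set (hypothetical helper)."""
    return ALLOWED.sub(replacement, text)


# '.', '@' and '#' are each replaced; the accented letters are kept.
result = strip_special('abcdeáéí.@# ')

# Passing '' deletes the special characters instead of replacing them.
deleted = strip_special('abcdeáéí.@# ', '')

# The r prefix only changes how backslashes are written in source code;
# both literals below produce the same four-character string.
same_pattern = r'\d\w' == '\\d\\w'
```

Whether to substitute a space or an empty string depends on whether word boundaries should survive: the question uses ' ' in one attempt and '' in the other, so the sketch leaves that as a parameter.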