How to replace Unicode values using re in Python?

Asked: 2026-05-23T13:59:26+00:00

How can I replace Unicode values using re in Python?
I’m looking for something like this:

line.replace('Ã','')
line.replace('¢','')
line.replace('Ã¢','')

Or is there a way to remove all the non-ASCII characters from a file? I converted a PDF file to ASCII, and the result contains some non-ASCII characters (e.g. bullets from the PDF).

Please help me.

1 Answer

Editorial Team · Answer 1 · 2026-05-23T13:59:27+00:00

Edit after feedback in comments.

Another solution would be to check the numeric value of each character and keep only those under 128, since ASCII covers 0 to 127. Like so:

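# coding=utf-8
# Python 2: a plain string literal is a byte string (str).

def removeUnicode():
    text = "hejsanäöåbadasd wodqpwdk"
    asciiText = ""
    for char in text:
        # Keep only characters whose numeric value is in the ASCII range 0-127.
        if ord(char) < 128:
            asciiText = asciiText + char

    return asciiText

import timeit
# timeit() runs the statement 1,000,000 times by default.
start = timeit.Timer("removeUnicode()", "from __main__ import removeUnicode")
print "Time taken: " + str(start.timeit())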
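The listing above is Python 2. For readers on Python 3, the same code-point check can be sketched as follows. The function name `remove_non_ascii` is an illustrative choice, not something from the original answer, and the input is assumed to be a `str` that has already been decoded.

```python
def remove_non_ascii(text: str) -> str:
    """Return text with every character outside the ASCII range removed."""
    # ord() gives the code point of a character; ASCII covers 0 through 127.
    return "".join(ch for ch in text if ord(ch) < 128)


print(remove_non_ascii("hejsanäöåbadasd wodqpwdk"))  # hejsanbadasd wodqpwdk
```

Building the result with `"".join(...)` avoids the repeated string concatenation used in the loop version.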

Here’s an altered version of jd’s answer with benchmarks:

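# coding=utf-8
# Python 2: str is a byte string, unicode is a text string.

def removeUnicode():
    text = u"hejsanäöåbadasd wodqpwdk"
    if isinstance(text, str):
        # Byte string: decode from UTF-8 first, then drop non-ASCII characters.
        return text.decode('utf-8').encode("ascii", "ignore")
    else:
        # Unicode string: encode directly, ignoring characters outside ASCII.
        return text.encode("ascii", "ignore")

import timeit
# timeit() runs the statement 1,000,000 times by default.
start = timeit.Timer("removeUnicode()", "from __main__ import removeUnicode")
print "Time taken: " + str(start.timeit())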
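In Python 3 the `str`/`unicode` split becomes `bytes`/`str`, so the encoding approach needs a small adjustment. The following is a minimal sketch under that assumption; the name `strip_non_ascii` is hypothetical, and byte input is assumed to be UTF-8, as in the listing above.

```python
def strip_non_ascii(data) -> str:
    """Drop non-ASCII characters from a str or from UTF-8 encoded bytes."""
    if isinstance(data, bytes):
        # Assumption: byte input is UTF-8; undecodable bytes are skipped.
        data = data.decode("utf-8", errors="ignore")
    # Encoding with errors="ignore" silently discards characters that have
    # no ASCII representation; decode again to get a str back.
    return data.encode("ascii", errors="ignore").decode("ascii")


print(strip_non_ascii("hejsanäöåbadasd wodqpwdk"))  # hejsanbadasd wodqpwdk
```

The final `.decode("ascii")` is only there to return text rather than bytes; it can be dropped if a `bytes` result is acceptable.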

Output of the first solution with a str string as input:

computer:~ Ancide$ python test1.py
Time taken: 5.88719677925

Output of the first solution with a unicode string as input:

computer:~ Ancide$ python test1.py
Time taken: 7.21077990532

Output of the second solution with a str string as input:

computer:~ Ancide$ python test1.py
Time taken: 2.67580914497

Output of the second solution with a unicode string as input:

computer:~ Ancide$ python test1.py
Time taken: 1.740680933

Conclusion

Encoding was roughly two to four times faster in the runs above and takes less code, so it is the better solution.
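Since the question asks specifically about re, here is a minimal Python 3 sketch of the same clean-up with a regular expression. It was not part of the benchmarks above, so no timing claim is made for it. The names `NON_ASCII` and `remove_non_ascii_re` are illustrative.

```python
import re

# Matches one or more consecutive characters outside the ASCII range 0-127.
NON_ASCII = re.compile(r"[^\x00-\x7F]+")


def remove_non_ascii_re(text: str) -> str:
    """Delete every run of non-ASCII characters from text."""
    return NON_ASCII.sub("", text)


print(remove_non_ascii_re("hejsanäöåbadasd wodqpwdk"))  # hejsanbadasd wodqpwdk
```

Passing a different replacement string to `sub`, for example `"*"`, substitutes each run of non-ASCII characters instead of deleting it, which may suit bullets extracted from a PDF.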
