I often work with UTF-8 text containing characters like \xc2\x99, \xc2\x95, \xc2\x85, etc.


Editorial Team

Asked: May 23, 2026


I often work with UTF-8 text containing characters like:

\xc2\x99

\xc2\x95

\xc2\x85

etc.

These characters confuse other libraries I work with, so they need to be replaced.

What is an efficient way to do this, rather than:

text.replace('\xc2\x99', ' ').replace('\xc2\x85', '...')
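For context, each of these two-byte sequences is the UTF-8 encoding of a single code point. For example, \xc2\x99 encodes U+0099 and \xc2\x85 encodes U+0085, both C1 control characters. The following is a minimal Python 3 sketch that assumes the input is available as raw bytes. It shows that decoding collapses each pair into one character:

```python
# Raw UTF-8 bytes, as they might be read from a file opened in binary mode
raw = b'Hello\xc2\x99World\xc2\x85'

# Decoding turns each two-byte sequence into a single code point
text = raw.decode('utf-8')

print(len(raw))   # 14 (bytes)
print(len(text))  # 12 (characters)
print([hex(ord(ch)) for ch in text if ord(ch) > 0x7f])  # ['0x99', '0x85']
```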


Editorial Team · Answer 1 · 2026-05-23T15:41:24+00:00

There are always regular expressions: just list all of the offending characters inside square brackets, like so:

import re
# every character listed inside [...] is replaced individually by a space
print(re.sub(r'[\xc2\x99]', ' ', 'Hello\xc2There\x99'))

This prints 'Hello There ', with the unwanted characters replaced by spaces.
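One caveat is worth illustrating. A character class matches one character at a time, so it works most cleanly once the text is decoded and each offending sequence is a single code point. The following is a minimal Python 3 sketch that assumes the text is already a decoded str:

```python
import re

# Decoded text: U+0099 and U+0085 are single characters here
text = 'Hello\x99World\x85'

# Each listed code point is replaced individually by a space
cleaned = re.sub('[\x85\x95\x99]', ' ', text)
print(repr(cleaned))  # 'Hello World '

# On undecoded pairs, the class matches each half separately,
# so the two-character sequence '\xc2\x99' becomes two spaces
print(repr(re.sub('[\xc2\x99]', ' ', 'Hello\xc2\x99World')))  # 'Hello  World'
```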

Alternatively, if you want a different replacement for each character, map each one to its replacement and look it up from a callback:

import re

# remove annoying characters
chars = {
    '\xc2\x82' : ',',        # High code comma
    '\xc2\x84' : ',,',       # High code double comma
    '\xc2\x85' : '...',      # Triple dot
    '\xc2\x88' : '^',        # High caret
    '\xc2\x91' : '\x27',     # Forward single quote
    '\xc2\x92' : '\x27',     # Reverse single quote
    '\xc2\x93' : '\x22',     # Forward double quote
    '\xc2\x94' : '\x22',     # Reverse double quote
    '\xc2\x95' : ' ',
    '\xc2\x96' : '-',        # High hyphen
    '\xc2\x97' : '--',       # Double hyphen
    '\xc2\x99' : ' ',
    '\xc2\xa0' : ' ',
    '\xc2\xa6' : '|',        # Split vertical bar
    '\xc2\xab' : '<<',       # Double less than
    '\xc2\xbb' : '>>',       # Double greater than
    '\xc2\xbc' : '1/4',      # one quarter
    '\xc2\xbd' : '1/2',      # one half
    '\xc2\xbe' : '3/4',      # three quarters
    '\xca\xbf' : '\x27',     # c-single quote
    '\xcc\xa8' : '',         # modifier - under curve
    '\xcc\xb1' : ''          # modifier - under line
}

# Compile one alternation of all keys up front; re.escape keeps the
# pattern safe if a key ever contains a regex metacharacter
pattern = re.compile('|'.join(re.escape(key) for key in chars))

def replace_chars(match):
    # Look up the replacement for the exact sequence that matched
    return chars[match.group(0)]

def clean_text(text):
    # Replace every offending sequence in a single pass over the text
    return pattern.sub(replace_chars, text)
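If the text is decoded first (a Python 3 str), str.translate is a swapped-in alternative to the regex. It maps single code points directly and needs no pattern. The following is a minimal sketch that assumes decoded input. The table covers only a subset of the entries above, keyed by code point rather than by byte pair, and `clean` is a hypothetical helper name:

```python
# Map single code points to replacements (a subset of the table above)
table = str.maketrans({
    '\x85': '...',    # U+0085, encoded in UTF-8 as \xc2\x85
    '\x91': "'",      # U+0091, encoded as \xc2\x91
    '\x92': "'",      # U+0092, encoded as \xc2\x92
    '\x95': ' ',      # U+0095, encoded as \xc2\x95
    '\x99': ' ',      # U+0099, encoded as \xc2\x99
    '\xa0': ' ',      # U+00A0 no-break space, encoded as \xc2\xa0
    '\u0328': '',     # U+0328 combining ogonek, encoded as \xcc\xa8
})

def clean(text):
    # Hypothetical helper: translate() maps every character in one pass
    return text.translate(table)

print(clean('Wait\x85 \x91quoted\x92\xa0text'))  # Wait... 'quoted' text
```

Characters not in the table pass through unchanged, so the table only needs the code points you actually want to replace.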
