I’m trying to use string.replace('’','') to replace the dreaded weird single-quote character: ’ (aka

Question

Asked: May 25, 2026

I’m trying to use string.replace('’','') to replace the dreaded weird single-quote character: ’ (aka \xe2 aka #8217). But when I run that line of code, I get this error:

SyntaxError: Non-ASCII character '\xe2' in file

EDIT: I get this error when trying to replace characters in a CSV file obtained remotely.

# encoding: utf-8

import urllib2

# read raw CSV data from URL
url = urllib2.urlopen('http://www.aaphoenix.org/meetings/aa_meetings.csv')
raw = url.read()

# replace bad characters
raw = raw.replace('’', "")

print(raw)

Even after the above code is executed, the unwanted character still exists in the print result. I tried the suggestions in the below answers as well. Pretty sure it’s an encoding issue, but I just don’t know how to fix it, so of course any help is much appreciated.

1 Answer

Editorial Team · Answer 1 · 2026-05-25T12:22:58+00:00

The problem here is the encoding of the file you downloaded (aa_meetings.csv). The server doesn’t declare an encoding in its HTTP headers, but the only non-ASCII¹ octet in the file has the value 0x92. You say that this is supposed to be “the dreaded weird single-quote character”, so the file’s encoding must be windows-1252, where 0x92 maps to U+2019. Your code, however, searches for the UTF-8 encoding of U+2019, i.e. '\xe2\x80\x99', and that byte sequence never appears in the file.
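To make the mismatch concrete, here is a minimal sketch that compares the two byte representations. It does not download anything; the `sample` row is a made-up stand-in for a line of the CSV, not actual data from the file.

```python
# -*- coding: utf-8 -*-
# The single non-ASCII byte the answer reports finding in the file.
raw_byte = b'\x92'

# Interpreted as windows-1252, 0x92 is U+2019 (RIGHT SINGLE QUOTATION MARK).
as_cp1252 = raw_byte.decode('windows-1252')
print(as_cp1252 == u'\u2019')

# The UTF-8 encoding of the same character is a different, three-byte sequence.
utf8_bytes = u'\u2019'.encode('utf-8')
print(utf8_bytes == b'\xe2\x80\x99')

# A byte-level search for the UTF-8 sequence therefore never matches
# windows-1252 data. 'sample' is a hypothetical row, for illustration only.
sample = b"Women\x92s Group,Monday"
print(sample.replace(utf8_bytes, b"") == sample)
```

All three comparisons print True: the character is the same, but the bytes being searched for are not the bytes in the data.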

Fixing this is as simple as adding appropriate calls to encode and decode:

# encoding: utf-8
import urllib2

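# read raw CSV data from URL and decode the windows-1252 bytes to unicode
url = urllib2.urlopen('http://www.aaphoenix.org/meetings/aa_meetings.csv')
raw = url.read().decode('windows-1252')

# replace the curly apostrophe (U+2019) with a plain ASCII apostrophe
raw = raw.replace(u'’', u"'")

# encode back to bytes for output; this raises UnicodeEncodeError if any
# other non-ASCII character remains in the text
print(raw.encode("ascii"))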

¹ by “ASCII” I mean “the character encoding which maps single octets with values 0x00 through 0x7F directly to U+0000 through U+007F, and does not define the meaning of octets with values 0x80 through 0xFF”.
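For readers on Python 3, the same decode-then-replace idea might look like the sketch below. The helper name `replace_curly_apostrophe` and the sample bytes are hypothetical, introduced only for illustration; the download step is left out so the snippet runs offline, and the windows-1252 default is the assumption made in the answer above, not something the server declares.

```python
def replace_curly_apostrophe(raw_bytes, source_encoding="windows-1252"):
    """Decode raw bytes, then swap U+2019 for a plain ASCII apostrophe.

    Hypothetical helper: source_encoding defaults to the encoding the
    answer infers for the file, since the server declares none.
    """
    text = raw_bytes.decode(source_encoding)
    return text.replace("\u2019", "'")


# Made-up sample standing in for one row of the downloaded CSV.
sample = b"Women\x92s Group,Monday,7:00 PM"
cleaned = replace_curly_apostrophe(sample)
print(cleaned)

# If U+2019 was the only non-ASCII character, this now succeeds.
ascii_bytes = cleaned.encode("ascii")
```

Doing the replacement on decoded text rather than on raw bytes means the code names the character it cares about (U+2019) instead of depending on how a particular encoding happens to store it.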
