I’m trying to use string.replace('’','') to replace the dreaded weird single-quote character: ’ (aka \xe2 aka #8217). But when I run that line of code, I get this error:
SyntaxError: Non-ASCII character '\xe2' in file
EDIT: I get this error when trying to replace characters in a CSV file obtained remotely.
# encoding: utf-8
import urllib2
# read raw CSV data from URL
url = urllib2.urlopen('http://www.aaphoenix.org/meetings/aa_meetings.csv')
raw = url.read()
# replace bad characters
raw = raw.replace('’', "")
print(raw)
Even after the above code is executed, the unwanted character still exists in the print result. I tried the suggestions in the below answers as well. Pretty sure it’s an encoding issue, but I just don’t know how to fix it, so of course any help is much appreciated.
The problem here is with the encoding of the file you downloaded (aa_meetings.csv). The server doesn’t declare an encoding in its HTTP headers, but the only non-ASCII[1] octet in the file has the value 0x92. You say that this is supposed to be “the dreaded weird single-quote character”, therefore the file’s encoding is windows-1252. But you’re trying to search and replace for the UTF-8 encoding of U+2019, i.e. '\xe2\x80\x99', which is not what is in the file.
Fixing this is as simple as adding appropriate calls to encode and decode.
[1] By “ASCII” I mean “the character encoding which maps single octets with values 0x00 through 0x7F directly to U+0000 through U+007F, and does not define the meaning of octets with values 0x80 through 0xFF”.
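The approach described above (decode the downloaded bytes as windows-1252, then do the replacement on Unicode text) might look roughly like the sketch below. The sample bytes and the helper name strip_curly_quote are made up for illustration and are not from the original script, where raw would come from url.read(). The sketch avoids the network so it can be run on its own.

```python
# -*- coding: utf-8 -*-

# Hypothetical stand-in for the bytes returned by url.read().
# 0x92 is how windows-1252 encodes U+2019 (right single quotation mark).
raw = b"Women\x92s Meeting,Monday"

# The same character has different byte sequences in the two encodings,
# which is why searching for the UTF-8 form finds nothing in this data.
utf8_form = u"\u2019".encode("utf-8")           # three bytes: e2 80 99
cp1252_form = u"\u2019".encode("windows-1252")  # one byte: 92


def strip_curly_quote(data, encoding="windows-1252"):
    """Decode bytes to text, then remove every U+2019 character.

    The default encoding is an assumption based on the 0x92 octet
    observed in the file; the server does not declare one.
    """
    text = data.decode(encoding)
    return text.replace(u"\u2019", u"")


cleaned = strip_curly_quote(raw)
print(cleaned)
```

If the result has to be written back out as bytes, an explicit call such as cleaned.encode("utf-8") at the very end keeps all of the text handling in Unicode and makes the output encoding a deliberate choice.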