I’m using the following regular expression basically to search for and delete these characters.
invalid_unicode = re.compile(ur'(Û|²|°|±|É|¹|Í)')
My source code is ASCII encoded, and whenever I try to run the script it spits out:
SyntaxError: Non-ASCII character ‘\xdb’ in file ./release.py on line 273, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details
If I follow the instructions at the given website and place a utf-8 encoding declaration on the second line, my script still doesn’t run. Instead it gives me this error:
SyntaxError: (unicode error) ‘utf8’ codec can’t decode byte 0xdb in position 0: unexpected end of data
How do I get this one regular expression running in an ASCII-encoded script?
You need to find out what encoding your editor is using, and declare that encoding per PEP 263; or, make things more stable and portable (though alas perhaps a bit less readable) and use escape sequences in your string literal, i.e., use
u'(\xdb|\xb2|\xb0|\xb1|\xc9|\xb9|\xcd)' as the parameter to the re.compile call.
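For illustration, here is a minimal sketch of the escape-sequence approach, assuming the goal is simply to delete those seven characters from a unicode string. The helper name strip_invalid and the sample input are made up for this example; only the pattern comes from the answer above.

```python
import re

# The \xNN escapes keep the source file pure ASCII. In a unicode literal each
# one names a code point (U+00DB, U+00B2, ...), so the pattern no longer
# depends on whatever encoding the editor saved the file in.
invalid_unicode = re.compile(u'(\xdb|\xb2|\xb0|\xb1|\xc9|\xb9|\xcd)')


def strip_invalid(text):
    """Return text with every character matched by invalid_unicode removed.

    Hypothetical helper, not part of the original script.
    """
    return invalid_unicode.sub(u'', text)


# Hypothetical input: U+00DB sits between A and B, U+00B0 between B and C.
sample = u'A' + u'\xdb' + u'B' + u'\xb0' + u'C'
cleaned = strip_invalid(sample)
```

Because the file now contains only ASCII bytes, no encoding declaration is needed for this line. The u'' prefix is accepted by Python 2 and by Python 3.3 and later, whereas the ur'' prefix from the question is Python 2 only.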