I’m aware that Python 3 fixes a lot of Unicode issues, but I can’t use Python 3; I’m stuck on 2.5.1.
I’m trying to run a regex over a document, but the document contains Unicode dashes (–, an en dash) rather than plain ASCII hyphens (-). Python can’t match these, and if I put them in the regex it throws a wobbly.
How can I make Python use a Unicode string, or otherwise match a character like that?
Thanks for your help
After a quick test and a look at PEP 263 (Defining Python Source Code Encodings), I see you may need to tell Python that the whole source file is UTF-8 encoded by adding a comment such as `# -*- coding: utf-8 -*-` to the first line (or the second, if the first is a shebang).
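A minimal sketch of that approach, assuming the text has already been decoded into a unicode string (the sample text and the names `text` and `range_re` are hypothetical). The declaration lets the literal en dash appear in the source; the `u''` prefix on both the pattern and the text matters too, because without it Python 2 would treat the en dash as three separate UTF-8 bytes.

```python
# -*- coding: utf-8 -*-
# PEP 263 declaration: must sit on the first or second line of the file.
# It lets Python 2 accept the literal en dash used below.
import re

# Hypothetical sample text. The u'' prefix makes it a unicode string,
# so the en dash is one character rather than three UTF-8 bytes.
text = u'pages 10–12'

# Unicode pattern containing a literal en dash (U+2013) between two numbers.
range_re = re.compile(u'([0-9]+)–([0-9]+)')

match = range_re.search(text)
print(match.group(1) + u' to ' + match.group(2))  # 10 to 12
```

Note that this pattern matches only the en dash; a plain ASCII hyphen in the text would not match it.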
Here’s the test file I created and ran on Python 2.5.1 / OS X 10.5.6
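If you would rather keep the source file pure ASCII, a different technique is to write the dash as a `\u` escape inside a `u''` literal, so no coding declaration is needed, and to decode the document as UTF-8 when reading it. This is a sketch, not the test file mentioned above. It assumes the document is UTF-8 on disk, and the helper names `read_utf8` and `normalize_dashes` are hypothetical. It sticks to calls available in Python 2.5 (`codecs.open`, try/finally instead of `with`).

```python
import codecs
import os
import re
import tempfile

# Matches an ASCII hyphen-minus, an en dash (U+2013) or an em dash (U+2014).
# The u'' literal expands the \u escapes, so this file stays pure ASCII.
DASH_RE = re.compile(u'[-\u2013\u2014]')


def read_utf8(path):
    """Read a UTF-8 encoded file and return its contents as unicode."""
    f = codecs.open(path, 'r', encoding='utf-8')
    try:
        return f.read()
    finally:
        f.close()


def normalize_dashes(text):
    """Replace every dash variant with a plain ASCII hyphen."""
    return DASH_RE.sub(u'-', text)


# Usage: write a small UTF-8 sample file, then read and normalize it.
fd, path = tempfile.mkstemp()
os.write(fd, u'pages 10\u201312 \u2014 see note'.encode('utf-8'))
os.close(fd)
try:
    cleaned = normalize_dashes(read_utf8(path))
finally:
    os.remove(path)

print(cleaned)  # pages 10-12 - see note
```

Normalizing the dashes first means the rest of your regexes can keep using a plain `-`.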