Got a lovely script that prints out a bunch of text as raw Unicode escapes to handle all the different languages.
The script works fine for ASCII characters and for non-Latin-based languages (Hindi, Chinese, etc.).
However, it fails to print out the raw escape values for characters such as “é”, “è”…
Instead of writing the raw Unicode value \u00E9, it writes “é” itself to the file, which in turn shows up as a diamond with a question mark on the webpage.
```python
import codecs

f = codecs.open(newFilePathAndName(path, filename, language), encoding='raw_unicode_escape', mode='w')
...
f.write(outputString)
```
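A minimal reproduction of the symptom, assuming Python 3 (the code above looks like Python 2, where `str` and `unicode` behave differently) and a made-up sample string. `raw_unicode_escape` writes code points above U+00FF as `\uXXXX` escapes but writes anything in the Latin-1 range as a single raw byte, so “é” lands in the file as byte `0xE9`. A page served as UTF-8 cannot decode that lone byte, hence the replacement glyph.

```python
# Hypothetical sample text: ASCII, a Latin-1 character (é) and Devanagari (न).
text = "caf\u00e9 \u0928"

raw = text.encode("raw_unicode_escape")
print(raw)  # b'caf\xe9 \\u0928'

# é became the single byte 0xE9 instead of an escape sequence...
assert b"\xe9" in raw
# ...while U+0928 was escaped as text.
assert b"\\u0928" in raw

# A lone 0xE9 byte is not valid UTF-8, which is why a UTF-8 page shows a diamond.
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    print("not valid UTF-8:", exc.reason)
```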
When I do a “print” in my script, it displays the character é as \xe9.
Any ideas?
The only thing that comes to mind is a regex that replaces \x with \u00.
The `raw_unicode_escape` encoding indeed does not provide escapes for code points at or below 0xFF; these values are not normally escaped in a raw Python unicode literal. Use the `unicode_escape` encoding instead, i.e. pass `encoding='unicode_escape'` to `codecs.open()`.
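A sketch of that fix, again assuming Python 3. Note that `unicode_escape` writes Latin-1 characters as `\xe9`, not `\u00e9`; if whatever consumes the file on the webpage needs the `\u00XX` form, a regex like the one proposed in the question can rewrite them afterwards. The helper names `escape_text` and `to_u_escapes` are hypothetical, and the regex deliberately skips escaped backslashes so that a literal `\xe9` typed in the source text is left alone.

```python
import codecs
import os
import re
import tempfile

# Matches either an escaped backslash (kept as-is) or a \xNN escape (rewritten).
_ESCAPE = re.compile(r"\\\\|\\x([0-9a-f]{2})")

def escape_text(text):
    """Escape all non-ASCII characters; Latin-1 ones come out as \\xNN."""
    return text.encode("unicode_escape").decode("ascii")

def to_u_escapes(escaped):
    """Hypothetical post-processing step: turn every \\xNN escape into \\u00NN."""
    def repl(match):
        if match.group(1) is None:
            return match.group(0)  # an escaped backslash pair stays untouched
        return "\\u00" + match.group(1)
    return _ESCAPE.sub(repl, escaped)

text = "caf\u00e9 \u0928"
print(escape_text(text))                # caf\xe9 \u0928
print(to_u_escapes(escape_text(text)))  # caf\u00e9 \u0928

# Writing through codecs.open with the unicode_escape codec, mirroring the question.
path = os.path.join(tempfile.mkdtemp(), "out.txt")
with codecs.open(path, encoding="unicode_escape", mode="w") as f:
    f.write(text)
with open(path, "rb") as f:
    print(f.read())                     # b'caf\\xe9 \\u0928'
```

Because the regex scans left to right, a doubled backslash is consumed as a unit before the `\xNN` alternative can match its second half.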