At some point our Python script receives a string like this:
In [1]: ab = 'asd\xeffe\ctve \\ \\\\ \\\\\\k\\\\\\'
In [2]: print ab
asd�fe\ctve \ \\ \\\k\\\
The data is damaged. We need the \x escape to be rendered as a literal \x sequence, but \c has no special meaning in a string and must therefore be left intact.
So far the closest solution I have found is something like this:
In [1]: ab = 'asd\xeffe\ctve \\ \\\\ \\\\\\k\\\\\\'
In [2]: print ab.encode('string-escape').replace('\\\\', '\\').replace("\\'", "'")
asd\xeffe\ctve \ \\ \\\k\\\
The output is taken from IPython. I assumed that ab is a str, not a unicode string; in the latter case we would have to do something like this:
def escape_string(s):
    if isinstance(s, str):
        s = s.encode('string-escape').replace('\\\\', '\\').replace("\\'", "'")
    elif isinstance(s, unicode):
        s = s.encode('unicode-escape').replace('\\\\', '\\').replace("\\'", "'")
    return s
'\\' is the same as '\x5c'. It is just two different ways to write the backslash character as a Python string literal.
These literal strings: r'\c', '\\c', '\x5cc', '\x5c\x63' are identical str objects in memory.
'\xef' is a single byte (239 as an integer), but r'\xef' (same as '\\xef') is a 4-byte string: '\x5c\x78\x65\x66'.
If s[0] returns '\xef' then it is what the s object actually contains. If it is wrong then fix the source of the data.
Note: string-escape also escapes \n and the like. backslashreplace is used only on characters that cause UnicodeEncodeError.
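These claims can be checked directly. The following is a minimal sketch, not part of the original session. It assumes bytes literals (the b prefix) and u'' literals so that it runs unchanged on Python 2.7 and Python 3. The string-escape codec exists only in Python 2, so the newline point is illustrated with unicode_escape instead, which is a substitution on my part. The variable names are made up for illustration.

```python
# Four spellings of the same two-byte string: a backslash followed by 'c'.
spellings = [br'\c', b'\\c', b'\x5cc', b'\x5c\x63']
assert all(s == spellings[0] for s in spellings)
assert len(spellings[0]) == 2

# '\xef' is one byte; the raw literal is four separate characters.
single_byte = b'\xef'
four_bytes = br'\xef'
assert len(single_byte) == 1
assert bytearray(single_byte)[0] == 239
assert len(four_bytes) == 4
assert four_bytes == b'\\xef' == b'\x5c\x78\x65\x66'

# A Unicode string holding U+00EF, a literal backslash + 'c', and a newline.
text = u'\xef \\c \n'

# backslashreplace only rewrites what the ascii codec cannot encode:
# U+00EF becomes the four characters \xef, while the existing backslash
# and the newline pass through untouched.
replaced = text.encode('ascii', 'backslashreplace')
assert replaced == b'\\xef \\c \n'

# unicode_escape (standing in for string-escape) additionally doubles the
# existing backslash and turns the newline into the two characters \n.
escaped = text.encode('unicode_escape')
assert escaped == b'\\xef \\\\c \\n'
```

If the only goal is to make non-ASCII characters visible while leaving existing backslashes alone, the backslashreplace error handler is closer to that than an escape codec followed by replace() calls, at least for Unicode input.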