Here is how I reproduce the problem:
Create a log file called 'temp.log' and paste this line into it:
DEBUG: packetReceived '\x61\x62\x63'
I want a script that reads the line from the log file and decodes the binary string part ('\x61\x62\x63'). For the decoding I am using struct, so:
struct.unpack('BBB', '\x61\x62\x63')
Should give me
(97, 98, 99)
Here is the script I am using:
import re
import struct
import sys
f = open(sys.argv[1], 'r')
for line in f:
    print line
    packet = re.compile(r"packetReceived \'(.*)\'").search(line).group(1)
    # packet is the string r'\x61\x62\x63'
    assert len(packet) == 12
    # this works ok (returns (97, 98, 99))
    struct.unpack('BBB', '\x61\x62\x63')
    # this fails because packet is really '\\x61\\x62\\x63' (12 characters, not 3)
    struct.unpack('BBB', packet)
I run the script using temp.log as the argument to the script.
Hopefully the comments highlight my problem. How can I get the variable packet to be interpreted as '\x61\x62\x63'?
ASIDE: On the first edit of this question, I assumed that reading the line from the file was the same as this:
line = "DEBUG: packetReceived '\x61\x62\x63'"
which made packet == 'abc'.
However, it is actually the same as this (using a raw string):
line = r"DEBUG: packetReceived '\x61\x62\x63'"
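The difference between the two literals can be checked directly. This is only a small sketch; the names plain and raw are illustrative and do not appear in the script above.

```python
# Ordinary literal: Python interprets each \xNN escape as a single character.
plain = "DEBUG: packetReceived '\x61\x62\x63'"
# Raw literal: backslashes are kept as-is, which matches text read from a file.
raw = r"DEBUG: packetReceived '\x61\x62\x63'"

# Take the part between the single quotes in each case.
plain_packet = plain.split("'")[1]
raw_packet = raw.split("'")[1]

print(len(plain_packet))  # 3
print(len(raw_packet))    # 12
```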
As described in your question, packet is equal to r'\x61\x62\x63'. Its len is 12 bytes, neither 15 nor 3 bytes.
What confuses you is that ipython (which I understand you are using) and the python interpreter display values using the repr() call, which tries to format values as they would appear in your code. Since backslashes are special in Python string constants, repr() displays them doubled, as they would be written in Python code.
This might be of help:
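A loop along these lines would produce such a listing. It is a sketch, not necessarily the code originally posted, and the packet literal below is an assumed stand-in for the value extracted from the log line.

```python
# Assumed stand-in for the value the regex extracts from temp.log.
packet = r'\x61\x62\x63'

# One row per character: ordinal value, the character, and its repr.
rows = [(ord(ch), ch, repr(ch)) for ch in packet]
for ordinal, ch, representation in rows:
    print("%3d %s %s" % (ordinal, ch, representation))
```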
Count your characters and see how they are printed. The first column displays the ordinal value of the character, the second column has the character itself, and the third column has the repr of the character.
EDIT
Change the last line:
to:
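One way to make that kind of change, offered as a sketch rather than a definitive fix, is to decode the backslash escapes in packet before handing it to struct.unpack. On Python 2, packet.decode('string_escape') does this directly. The version below uses the 'unicode_escape' codec instead so that it also runs on Python 3, and the hard-coded line is an assumed stand-in for a line read from temp.log.

```python
import re
import struct

# Assumed stand-in for one line read from temp.log (raw, so backslashes are literal).
line = r"DEBUG: packetReceived '\x61\x62\x63'"
packet = re.search(r"packetReceived '(.*)'", line).group(1)
assert len(packet) == 12  # twelve literal characters, not three bytes

# Interpret the \xNN escapes, then map each code point (0-255) back to one byte.
decoded = packet.encode('ascii').decode('unicode_escape').encode('latin-1')

values = struct.unpack('BBB', decoded)
print(values)  # (97, 98, 99)
```

Going through 'latin-1' at the end keeps every value from \x00 to \xff as a single byte, which is what the 'B' format expects.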