I’m trying to do a comparison of some byte values – source A comes from a file that is being ‘read’:
with open(fname, "rb") as f:
    f_data = f.read()
These files can be anything from a few KB to a few MB in size.
Source B is a dictionary of known patterns:
eof_markers = {
    'jpg': b'\xff\xd9',
    'pdf': b'\x25\x25\x45\x4f\x46',
}
(This list will be extended once the basic process works)
Essentially, I’m trying to ‘read’ the file (source A) and then incrementally inspect its trailing bytes for matches against the pattern list:
testString = f_data[-counter:]
If no match is found, counter should increase by 1 and the match against the list should be tried again.
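To make the loop described above concrete, here is a minimal sketch, assuming Python 3, where slices of the file data and the dictionary values are both bytes objects and can be compared directly without any decoding. The function name find_eof_marker and the max_tail cap are hypothetical, introduced only for illustration; eof_markers mirrors the dictionary above.

```python
# Assumes Python 3. find_eof_marker and max_tail are illustrative names,
# not part of the original code.
eof_markers = {
    'jpg': b'\xff\xd9',
    'pdf': b'\x25\x25\x45\x4f\x46',
}

def find_eof_marker(f_data, markers, max_tail=1024):
    """Grow the tail one byte at a time; return (format, counter) or None."""
    limit = min(len(f_data), max_tail)
    for counter in range(1, limit + 1):
        test_string = f_data[-counter:]
        for format_type, marker in markers.items():
            # bytes compared with bytes: no encoding or decoding involved
            if test_string.startswith(marker):
                return format_type, counter
    return None

# Hardcoded sample: a JPEG end marker followed by two padding bytes
sample = b'\xff\xd8 image data \xff\xd9\x00\x00'
print(find_eof_marker(sample, eof_markers))  # ('jpg', 4)
```

The key point is that the marker looked up in the dictionary (the value), not the format name (the key), is what gets compared against the tail. The cap on the tail length is an assumption to avoid scanning a whole multi-megabyte file one byte at a time.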
I’ve tried a number of different ways to get this working. I can get testString to grow correctly, but I keep running into encoding issues where various approaches want to ASCIIify the bytes in order to do the comparison.
I’m a bit lost, and not for the first time: I’ve been wandering around the code changing int to u to b without getting past issues like d9 being a reserved value, and therefore not being able to use the ASCII-type comparison tools, e.g. if format_type in testString: (which results in a UnicodeDecodeError: 'ascii' codec can't decode byte a9).
I tried to convert everything to an integer, but that threw one of these errors: ValueError: invalid literal for int() with base 2: '.' or ValueError: invalid literal for int() with base 10: '.'
I also tried to convert testString to hex bytes, but kept getting TypeError: hex() argument can't be converted to hex (this is more my lack of understanding than anything else, I’m sure!)
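For reference, a short sketch of how such conversions can look, assuming Python 3 (in Python 2 the same bytes literal is a str and behaves differently). int() and hex() expect a numeric string and an integer respectively, which is one plausible reason they reject raw byte strings; the variable names here are illustrative only.

```python
import binascii

tail = b'\xff\xd9'

# Indexing a bytes object in Python 3 yields an int, which hex() accepts
last_byte = tail[-1]
as_hex = hex(last_byte)

# Hex text for the whole byte string (returned as bytes)
hex_text = binascii.hexlify(tail)

# Interpret the whole byte string as one big-endian integer
value = int.from_bytes(tail, 'big')

print(last_byte, as_hex, hex_text, value)
```

None of these conversions is needed just to compare two byte strings; they are mainly useful for printing or debugging the values.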
There are a number of resources I’ve found that talk about encoding / hex comparisons (e.g. stackoverflow.com/questions/10561923/unicodedecodeerror-ascii-codec-cant-decode-byte-0xef-in-position-1), but I’ve just not found something that I can either fully understand or that points me down the right path.
I’ve been stuck on this for a while, so any pointers are gratefully received.
I’m not sure exactly what you’re trying to do, but I ran this code in Python 3.2.3.
I’m using a hardcoded f_data, but you can undo that by uncommenting lines 1-3 and commenting out line 4.
Here’s the output:
Is there something this isn’t doing that you need to do?