Further to this question: Handling and working with binary data HEX with python (and

Asked: June 11, 2026 (2026-06-11T23:52:38+00:00)

Further to this question: Handling and working with binary data HEX with python (and thanks to the awesome pointers I received) I’m stuck on one last aspect of the tool.

I am basically writing a cleaner for files that have data past the EOF marker. This extra data means they fail some validation tools. I need to strip the extra data so the files can be presented to the validator, but I don’t want to throw this data away (in fact I have to keep it…)

I’ve written an XML container to hold the data, plus a few other provenance/audit type values, but I’m (still) stuck on elegantly moving between raw binary and something I can “bake” into a file.

example:

A jpg file ends with (hex editor view)
96 1a 9c fd ab 4f 9e 69 27 ad fd da 0a db 76 bb ee d2 6a fd ff 00 ff d9 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff

The EOF marker for jpg is ff d9, so the cleaner works backwards through the file until it finds a match for the EOF marker. In this case it would create a new jpg file ending at the ff d9 and then attempt to write the stripped data to the XML (via the ElementTree lib): changeString.text = str(excessData)
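The backwards search described above can be sketched roughly as follows. This is only an illustration, not the actual cleaner: the function name split_at_eof and the sample bytes are made up for the example, and it assumes the excess data itself never contains ff d9 (otherwise the last match would be the wrong one).

```python
JPEG_EOI = b"\xff\xd9"  # JPEG end-of-image marker (ff d9)


def split_at_eof(data, marker=JPEG_EOI):
    """Split data after the LAST occurrence of marker.

    Returns (clean, excess): clean ends with the marker,
    excess is whatever follows it (possibly empty).
    Hypothetical helper, named for this example only.
    """
    pos = data.rfind(marker)  # search from the end of the buffer
    if pos == -1:
        raise ValueError("EOF marker not found")
    end = pos + len(marker)
    return data[:end], data[end:]


# Short stand-in for the tail of the file shown in the hex dump
tail = bytes(bytearray.fromhex("fd ff 00 ff d9 ff ff ff ff"))
clean, excess = split_at_eof(tail)
```

Here clean would be written out as the new jpg, and excess is the part that still needs a text-safe representation before it can go into the XML.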

Of course this won’t work, as the XML writer is looking to write ASCII, not binary dumps.

In the above case, the error is UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 0: ordinal not in range(128), which I can see is because 0xff is not a valid ASCII character.

My question, therefore, is how do I elegantly deal with this raw data, in a way that it can be stored and used in the future? (I plan to write an ‘uncleaner’ next that can take the clean file and the XML and reconstruct the original file…)

______EDIT_______

Using the suggestions from below, this is the traceback:

Traceback (most recent call last):
  File "C:\...\EOF_cleaner\scripts\test6.py", line 87, in <module>
    main()
  File "C:\...\EOF_cleaner\scripts\test6.py", line 73, in main
    splitFile(f_data, offset)
  File "C:\...EOF_cleaner\scripts\test6.py", line 60, in splitFile
    makeXML(excessData)
  File "C:\...\EOF_cleaner\scripts\test6.py", line 53, in makeXML
    ET.ElementTree(root).write(noteFile)
  File "c:\python27\lib\xml\etree\ElementTree.py", line 815, in write
    serialize(write, self._root, encoding, qnames, namespaces)
  File "c:\python27\lib\xml\etree\ElementTree.py", line 934, in _serialize_xml
    _serialize_xml(write, e, encoding, qnames, None)
  File "c:\python27\lib\xml\etree\ElementTree.py", line 934, in _serialize_xml
    _serialize_xml(write, e, encoding, qnames, None)
  File "c:\python27\lib\xml\etree\ElementTree.py", line 934, in _serialize_xml
    _serialize_xml(write, e, encoding, qnames, None)
  File "c:\python27\lib\xml\etree\ElementTree.py", line 932, in _serialize_xml
    write(_escape_cdata(text, encoding))
  File "c:\python27\lib\xml\etree\ElementTree.py", line 1068, in _escape_cdata
    return text.encode(encoding, "xmlcharrefreplace")
UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 0: ordinal not in range(128)

The lines involved are changeString.text = excessData.encode('base64') (line 45) and ET.ElementTree(root).write(noteFile) (line 53).

1 Answer

Editorial Team · Answer 1 · 2026-06-11T23:52:40+00:00

Use Base64:

excessData.encode('base64')

It’ll be easy to turn that back to binary data later on with a simple .decode('base64') call.
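Note that the 'base64' string codec used above is a Python 2 feature. As a rough sketch of the same round trip that should also run on Python 3, the standard base64 module can be used instead; the variable names here are placeholders, and the eight ff bytes only stand in for the real stripped data.

```python
import base64

excess_data = b"\xff" * 8  # stand-in for the bytes stripped from the file

# Binary -> ASCII text that is safe to store in an XML text node
encoded = base64.b64encode(excess_data).decode("ascii")

# ASCII text -> the original binary, e.g. in a later 'uncleaner'
restored = base64.b64decode(encoded)

assert restored == excess_data
```

Unlike the 'base64' codec, b64encode does not insert line breaks into its output, which may or may not matter for how the XML is laid out.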

Base64 encodes to ASCII data safe for inclusion in XML, in a reasonably compact format; every 3 bytes of binary information become 4 Base64 characters.
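To show how this might fit the XML container and the planned 'uncleaner', here is a minimal sketch. The element names cleanerNote and excessData, the encoding attribute, and the helper names make_xml and read_excess are all assumptions made for this example, not the asker's actual schema or code.

```python
import base64
import xml.etree.ElementTree as ET


def make_xml(excess_data):
    """Wrap the stripped bytes in a small XML document (hypothetical layout)."""
    root = ET.Element("cleanerNote")
    node = ET.SubElement(root, "excessData", encoding="base64")
    # Element text must be text, not raw bytes, so store Base64 instead
    node.text = base64.b64encode(excess_data).decode("ascii")
    return ET.tostring(root)


def read_excess(xml_bytes):
    """Recover the original bytes from the XML produced by make_xml."""
    root = ET.fromstring(xml_bytes)
    return base64.b64decode(root.find("excessData").text)


clean = b"\xfd\xff\x00\xff\xd9"  # stand-in for the cleaned file contents
excess = b"\xff" * 63            # stand-in for the stripped trailing bytes

xml_bytes = make_xml(excess)
rebuilt = clean + read_excess(xml_bytes)  # what an 'uncleaner' would write out

# 3 bytes in -> 4 characters out, so 63 bytes become 84 characters
encoded_length = len(base64.b64encode(excess))
```

Concatenating the clean file with the decoded data gives back the original byte stream, provided the split point was exactly the end of the clean file.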
