The Archive Base

Editorial Team
Asked: May 18, 2026

I have an XML file that contains a set of textual element tags (each holds the decimal offset and data length of the corresponding binary element), followed by the binary data of all the elements at the end. An example is as follows.

<?xml version="1.0" encoding="UTF-8"?>
<Package>
  <element>
        <offset>0</offset>
        <length>2961181</length>
        <checksum>4238515972</checksum>
        <format>gzip</format>
  </element>
  <element>
        <offset>2961181</offset>
        <length>5442</length>
        <checksum>4238515972</checksum>
        <format>bin</format>
  </element>
</Package>
BINARY_DATA

Please note that the offset is decimal and counts from the first byte after the header.
How can I parse this file in Python, grab the corresponding element based on its offset, decompress it (if its format is gzip), and store it as a file?

Based on the replies from OmnipotentEntity and Jakob_B, I wrote the following short script, just to see if it works for the first element:

import zlib

f = open("file.xml", "r")
text = f.read()
position = text.find("</Package>\n")
headerSize=position+ len("</Package>\n") + 1 
offset=0
f.seek(headerSize + offset) 
length = 2961181
bin_data = f.read(length)
zipped=1
if (zipped):
  ungziped_str = zlib.decompressobj().decompress('x\x9c' + bin_data)
  print(ungziped_str)
f.close()

However, I got the following error:

Traceback (most recent call last):
  File "file_parse.py", line 11, in ?
    ungziped_str = zlib.decompressobj().decompress('x\x9c' + bin_data)
zlib.error: Error -3 while decompressing: invalid block type

What is the problem? Is the input file incorrect, or is the code incorrect?

1 Answer
  1. Editorial Team
    Added an answer on May 18, 2026 at 6:12 am

    The trick is stopping the XML parser from choking on the binary data. lxml lets you feed a parser one line at a time, so you can watch for the closing XML tag and stop there:

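    from lxml import etree

    def process(filename):
        """Parse only the XML header and return the root <Package> element."""
        parser = etree.XMLParser()
        # Binary mode keeps the trailing binary data from being decoded as text.
        with open(filename, "rb") as f:
            for line in f:
                parser.feed(line)
                # Stop feeding once the closing tag has been seen.
                if line == b"</Package>\n":
                    break
        return parser.close()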
    

    That returns an element object:

    >>> r = process("junk.xml")
    >>> r
    <Element Package at 9f14eb4>
    

    This is an lxml element that you can get the data out of. The second element's offset is here:

    >>> r[1][0].text
    '2961181'
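
To go from a single lookup like this to the metadata of every element, the children of the root can be looped over and their text converted to integers. This is only a sketch: it uses the standard library's `xml.etree.ElementTree` in place of lxml so it runs without extra installs (the `findall` and `findtext` calls used here exist in lxml as well), the header bytes are copied from the question's example, and the helper name `element_records` is made up for illustration.

```python
import xml.etree.ElementTree as ET

# XML header from the question's example, without the binary tail.
HEADER = b"""<?xml version="1.0" encoding="UTF-8"?>
<Package>
  <element>
        <offset>0</offset>
        <length>2961181</length>
        <checksum>4238515972</checksum>
        <format>gzip</format>
  </element>
  <element>
        <offset>2961181</offset>
        <length>5442</length>
        <checksum>4238515972</checksum>
        <format>bin</format>
  </element>
</Package>
"""


def element_records(root):
    """Return one dict per <element>, with the numeric fields as ints."""
    records = []
    for el in root.findall("element"):
        records.append({
            "offset": int(el.findtext("offset")),
            "length": int(el.findtext("length")),
            "checksum": int(el.findtext("checksum")),
            "format": el.findtext("format"),
        })
    return records


records = element_records(ET.fromstring(HEADER))
for rec in records:
    print(rec)
```

Looking fields up by tag name rather than by position (`r[1][0]`) keeps the code working if the order of the child tags ever changes.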
    

    The other fields can be read the same way. That should be enough for you to build a workable solution. Beware the line ending on the Package tag, though: the comparison above will not match if the file uses a different line ending (for example \r\n), so there may be a better way to detect the end of the header.
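
Putting the pieces together, one possible end-to-end sketch is shown below. It rests on several assumptions that the question does not confirm: exactly one line ending (`\n` or `\r\n`) follows `</Package>` before the binary data starts, the whole file fits in memory, and `gzip` means a complete gzip stream. The `checksum` field is ignored because the question does not say how it is computed. The names `split_package` and `extract_elements` are hypothetical, and the standard library's `xml.etree.ElementTree` stands in for lxml. The error in the question is consistent with prepending a zlib header (`'x\x9c'`) to data that already carries its own gzip header; passing `16 + zlib.MAX_WBITS` as `wbits` tells zlib to expect gzip framing instead.

```python
import gzip
import os
import tempfile
import xml.etree.ElementTree as ET
import zlib

END_TAG = b"</Package>"


def split_package(raw):
    """Split the raw file bytes into (xml_header, binary_payload)."""
    end = raw.index(END_TAG) + len(END_TAG)
    header = raw[:end]
    # Assumption: exactly one line ending separates the header from the data.
    if raw[end:end + 2] == b"\r\n":
        end += 2
    elif raw[end:end + 1] == b"\n":
        end += 1
    return header, raw[end:]


def extract_elements(raw):
    """Return (format, data) tuples, with gzip elements decompressed."""
    header, payload = split_package(raw)
    results = []
    for el in ET.fromstring(header).findall("element"):
        offset = int(el.findtext("offset"))
        length = int(el.findtext("length"))
        fmt = el.findtext("format")
        data = payload[offset:offset + length]
        if fmt == "gzip":
            # 16 + MAX_WBITS makes zlib expect a gzip header and trailer.
            data = zlib.decompress(data, 16 + zlib.MAX_WBITS)
        results.append((fmt, data))
    return results


# Build a small stand-in file in memory (made-up contents, not the real data).
blob = gzip.compress(b"hello world")
tail = b"\x00\x01\x02"
sample = (
    b'<?xml version="1.0" encoding="UTF-8"?>\n<Package>\n'
    b"<element><offset>0</offset><length>%d</length>"
    b"<format>gzip</format></element>\n"
    b"<element><offset>%d</offset><length>%d</length>"
    b"<format>bin</format></element>\n"
    b"</Package>\n" % (len(blob), len(blob), len(tail))
) + blob + tail

results = extract_elements(sample)

# Store each element as its own file in a temporary directory.
out_dir = tempfile.mkdtemp()
for i, (fmt, data) in enumerate(results):
    with open(os.path.join(out_dir, "element_%d.dat" % i), "wb") as out_file:
        out_file.write(data)

# Reproduce the approach from the question: a zlib header in front of gzip data.
prepend_error = None
try:
    zlib.decompressobj().decompress(b"x\x9c" + blob)
except zlib.error as exc:
    prepend_error = str(exc)
print(prepend_error)
```

For a real file, `raw` would come from `open(filename, "rb").read()`; reading in binary mode matters because the payload is not text.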

