The Archive Base


Editorial Team
Asked: June 7, 2026

I have data stored in either a collection of files or in a single

I have data stored in either a collection of files or in a single compound file. The compound file is formed by concatenating all the separate files, and then preceding everything with a header that gives the offsets and sizes of the constituent parts. I’d like to have a file-like object that presents a view of the compound file, where the view represents just one of the member files. (That way, I can have functions for reading the data that accept either a real file object or a “view” object, and they needn’t worry about how any particular dataset is stored.) What library will do this for me?

The mmap class looked promising since it’s constructed from a file, a length, and an offset, which is exactly what I have, but the offset needs to be aligned with the underlying file system’s allocation granularity, and the files I’m reading don’t meet that requirement. The name of the MultiFile class fits the bill, but it’s tailored for attachments in e-mail messages, and my files don’t have that structure.
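To make the mmap alignment constraint concrete, here is a minimal sketch of the usual workaround: map from the nearest aligned offset at or below the member and skip the leading bytes. The helper name `read_member` and the temporary demo file are assumptions made for illustration. It returns a bytes slice rather than a file-like object, so it only shows the alignment arithmetic, not a full solution.

```python
import mmap
import os
import tempfile

def read_member(path, offset, length):
    """Return `length` bytes starting at `offset`, read through mmap."""
    gran = mmap.ALLOCATIONGRANULARITY
    start = (offset // gran) * gran  # nearest aligned offset at or below `offset`
    delta = offset - start           # where the member begins inside the mapping
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), delta + length,
                       access=mmap.ACCESS_READ, offset=start) as m:
            return m[delta:delta + length]

# Demo on a throwaway file that is two allocation units long.
gran = mmap.ALLOCATIONGRANULARITY
payload = bytes(range(256)) * (2 * gran // 256)
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)

unaligned = gran + 13  # deliberately not a multiple of the granularity
result = read_member(tmp.name, unaligned, 5)
os.remove(tmp.name)
```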

The file operations I’m most interested in are read, seek, and tell. The files I’m reading are binary, so the text-oriented functions like readline and next aren’t so crucial. I might eventually also need write, but I’m willing to forgo that feature for now since I’m not sure how appending should behave.


1 Answer

  1. Editorial Team
    Added an answer on June 7, 2026 at 1:16 am

    I know you were searching for a library, but as soon as I read this question I thought I’d write my own. So here it is:

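    import os
    
    class View:
        """Read-only view of `length` bytes of file `f`, starting at `offset`."""
    
        def __init__(self, f, offset, length):
            self.f = f
            self.f_offset = offset  # where the view starts in the underlying file
            self.offset = 0         # current position, relative to the view
            self.length = length
    
        def seek(self, offset, whence=os.SEEK_SET):
            if whence == os.SEEK_SET:
                new_offset = offset
            elif whence == os.SEEK_CUR:
                new_offset = self.offset + offset
            elif whence == os.SEEK_END:
                new_offset = self.length + offset
            else:
                raise ValueError("invalid whence: %r" % (whence,))
            if new_offset < 0:
                raise ValueError("negative seek position: %r" % (new_offset,))
            self.offset = new_offset
            self.f.seek(self.offset + self.f_offset, os.SEEK_SET)
            # Report the position relative to the view, as tell() does.
            return self.offset
    
        def tell(self):
            return self.offset
    
        def read(self, size=-1):
            # Reposition first: other views may have moved the shared file.
            self.seek(self.offset)
            remaining = max(0, self.length - self.offset)
            if size is None or size < 0:
                size = remaining
            data = self.f.read(min(size, remaining))
            # Advance by what was actually read, which may be less than asked.
            self.offset += len(data)
            return data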
    
    if __name__ == "__main__":
        import collections
    
        # Binary mode keeps every offset in bytes, which the views rely on.
        with open('test.txt', 'rb') as f:
            views = []
            offsets = [i*11 for i in range(10)]
    
            # Each record is '.', a one-digit length, then the payload.
            for o in offsets:
                f.seek(o+1)
                length = int(f.read(1))
                views.append(View(f, o+2, length))
    
            f.seek(0)
    
            # Read every view in a single call...
            completes = {}
            for v in views:
                completes[v.f_offset] = v.read()
                v.seek(0)
    
            # ...then again in interleaved 3-byte chunks.
            strs = collections.defaultdict(bytes)
            for i in range(3):
                for v in views:
                    strs[v.f_offset] += v.read(3)
            strs = dict(strs) # We want it to raise KeyErrors after that.
    
            for offset, s in completes.items():
                print(offset, strs[offset], s)
                assert strs[offset] == s, "Something went wrong!"
    

    And I wrote another script to generate the “test.txt” file:

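    import random
    import string
    
    # The with block closes the file, so all ten records are flushed to disk.
    with open('test.txt', 'w') as f:
        for i in range(10):
            rand_list = list(string.ascii_letters)
            random.shuffle(rand_list)
            rand_str = "".join(rand_list[:9])
            # Record layout: '.', a one-digit length, then the payload.
            f.write(".%d%s" % (len(rand_str), rand_str))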
    

    It worked for me. The files I tested on are not binary files like yours, and they’re not as big as yours, but this might be useful, I hope. If not, then thank you, that was a good challenge 😀
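Since the question is about binary data, here is a minimal sketch of the same idea exercised against bytes. It is a variation, not the code above: it subclasses `io.RawIOBase`, so `read` and `readall` come from the standard library once `readinto` is defined. The header layout (a uint32 count followed by uint64 offset/size pairs, little-endian) and the names `RawView`, `build_compound`, and `open_members` are assumptions for illustration only; the real compound file's header is not described in the question.

```python
import io
import struct

class RawView(io.RawIOBase):
    """Read-only window onto `length` bytes of `base`, starting at `start`."""

    def __init__(self, base, start, length):
        super().__init__()
        self._base = base
        self._start = start
        self._length = length
        self._pos = 0  # position relative to the start of the view

    def readable(self):
        return True

    def seekable(self):
        return True

    def tell(self):
        return self._pos

    def seek(self, offset, whence=io.SEEK_SET):
        if whence == io.SEEK_SET:
            pos = offset
        elif whence == io.SEEK_CUR:
            pos = self._pos + offset
        elif whence == io.SEEK_END:
            pos = self._length + offset
        else:
            raise ValueError("invalid whence: %r" % (whence,))
        if pos < 0:
            raise ValueError("negative seek position")
        self._pos = pos
        return pos

    def readinto(self, buf):
        # Never read past the end of the view.
        n = min(len(buf), max(0, self._length - self._pos))
        if n == 0:
            return 0
        # Reposition every time: other views share the same base file.
        self._base.seek(self._start + self._pos)
        data = self._base.read(n)
        buf[:len(data)] = data
        self._pos += len(data)
        return len(data)

def build_compound(members):
    """Pack members behind the hypothetical header described above."""
    offset = 4 + 16 * len(members)  # the first member starts after the header
    header = struct.pack("<I", len(members))
    for m in members:
        header += struct.pack("<QQ", offset, len(m))
        offset += len(m)
    return header + b"".join(members)

def open_members(f):
    """Parse the hypothetical header and return one view per member."""
    (count,) = struct.unpack("<I", f.read(4))
    table = [struct.unpack("<QQ", f.read(16)) for _ in range(count)]
    return [RawView(f, off, size) for off, size in table]

compound = io.BytesIO(build_compound([b"hello", b"\x00\x01\x02\x03", b""]))
first, second, empty = open_members(compound)
greeting = first.read()  # reads the whole first member
```

Because the view is an `io.RawIOBase`, it can also be wrapped in `io.BufferedReader` if buffered reads are wanted.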

    Also, I was wondering: if these are actually multiple files, why not use some kind of archive file format, and use its library to read them?
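As one possible illustration of the archive route, the standard library's `tarfile` module returns a file-like object for each member that supports `read`, `seek`, and `tell`. This is a sketch under assumptions: it presumes the data could be repacked as an uncompressed tar archive, and the member names and the helper `make_archive` are made up for the demo.

```python
import io
import tarfile

def make_archive(members):
    """Build an uncompressed tar archive in memory from {name: bytes}."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in members.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf

archive = make_archive({"a.bin": b"\x00\x01\x02\x03", "b.bin": b"hello"})

with tarfile.open(fileobj=archive, mode="r") as tar:
    member = tar.extractfile("b.bin")  # file-like view of one member
    member.seek(1)
    chunk = member.read(3)
    position = member.tell()
```

The member object reads from the archive's underlying file, so it should be used while the archive is still open.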

    Hope it helps.
