
The Archive Base


Editorial Team
Asked: May 10, 2026


Previously, I asked the question.

The problem is that the demands on our file structure are very high.

For instance, we're trying to create a container with up to 4,500 files and 500 MB of data.

The file structure of this container consists of

  • SQLite DB (under 1 MB)
  • Text-based, XML-like file
  • Images inside a dynamic folder structure, which make up the rest of the roughly 4,500 files

  • After the initial creation, the image files are read-only, with the exception of deletion.

  • The small DB is used regularly when the container is accessed.

Tar, Zip, and the like are all too slow (even with no compression). Slow is subjective, I know, but untarring a container of this size takes over 20 seconds.

Any thoughts?


1 Answer

  1. Added an answer on May 10, 2026 at 7:18 pm

    Three things.

    1) What Timothy Walters said is right on; I'll go into more detail.

    2) 4,500 files and 500 MB of data is simply a lot of data and a lot of disk writes. If you're operating on the entire dataset, it's going to be slow. That's just I/O truth.

    3) As others have mentioned, there’s no detail on the use case.

    If we assume a read only, random access scenario, then what Timothy says is pretty much dead on, and implementation is straightforward.

    In a nutshell, here is what you do.

    You concatenate all of the files into a single blob. While you are concatenating them, you track each file's name, its length, and the offset at which it starts within the blob. You write that information out into a block of data, sorted by name. We'll call this the Table of Contents, or TOC block.

    Next, you concatenate the two blocks together. In the simple case, you have the TOC block first, then the data block.

    When you wish to get data from this format, search the TOC for the file name, grab the file's offset from the beginning of the data block, add the TOC block size to get its position in the container, and read FILE_LENGTH bytes of data. Simple.
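As a rough illustration of the TOC-first layout described above, here is a minimal Python sketch. The entry layout (2-byte name length, name, 8-byte offset, 8-byte length) and the 8-byte TOC-size header are assumptions made for this example, not part of the original answer, and `build_container` / `read_file` are hypothetical names. It scans the TOC linearly to keep things short.

```python
import io
import struct

def build_container(files):
    """Pack {name: bytes} into one blob: TOC size, TOC block, data block."""
    toc_entries = []
    data = io.BytesIO()
    for name in sorted(files):  # TOC entries are kept sorted by name
        payload = files[name]
        # Record where this file starts inside the data block, and its length.
        toc_entries.append((name.encode("utf-8"), data.tell(), len(payload)))
        data.write(payload)

    toc = io.BytesIO()
    for raw_name, offset, length in toc_entries:
        # Assumed entry layout: name length, name, offset, file length.
        toc.write(struct.pack("!H", len(raw_name)))
        toc.write(raw_name)
        toc.write(struct.pack("!QQ", offset, length))
    toc_bytes = toc.getvalue()

    # Assumed header: the TOC size, so a reader knows where the data starts.
    return struct.pack("!Q", len(toc_bytes)) + toc_bytes + data.getvalue()

def read_file(container, wanted):
    """Find `wanted` in the TOC and slice its bytes out of the data block."""
    (toc_size,) = struct.unpack_from("!Q", container, 0)
    pos = 8
    toc_end = 8 + toc_size  # the data block starts right after the TOC
    while pos < toc_end:
        (name_len,) = struct.unpack_from("!H", container, pos)
        pos += 2
        name = container[pos:pos + name_len].decode("utf-8")
        pos += name_len
        offset, length = struct.unpack_from("!QQ", container, pos)
        pos += 16
        if name == wanted:
            start = toc_end + offset  # data-block offset plus TOC size
            return container[start:start + length]
    raise KeyError(wanted)
```

With a real 500 MB container you would seek and read on an open file instead of slicing an in-memory `bytes` object, but the offset arithmetic stays the same.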

    If you want to be clever, you can put the TOC at the END of the blob file. Then, at the very end, append the offset of the start of the TOC. To read, you lseek to the end of the file, back up 4 or 8 bytes (depending on your number size), read THAT value, and lseek back to the start of your TOC. Then you're back to square one. You do this so you don't have to make two passes when building the archive: with the TOC at the front, you need the complete TOC before you can write the first byte of data; with it at the end, you write the data in one pass and append the TOC once you know it.
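The TOC-at-the-end variant might look something like the sketch below. To keep it short, the TOC is serialized as JSON, which is purely an assumption for illustration (a binary TOC works the same way), and `write_container` / `read_member` are hypothetical names. The trailer is an 8-byte offset in network byte order.

```python
import json
import os
import struct

def write_container(path, files):
    """Write the data first, then the TOC, then an 8-byte pointer to the TOC."""
    toc = {}
    with open(path, "wb") as out:
        for name in sorted(files):
            # Offsets are absolute here because the data starts at byte 0.
            toc[name] = (out.tell(), len(files[name]))
            out.write(files[name])
        toc_start = out.tell()
        out.write(json.dumps(toc).encode("utf-8"))
        out.write(struct.pack("!Q", toc_start))  # trailer: where the TOC begins

def read_member(path, wanted):
    """Seek to the trailer, follow it to the TOC, then read one member."""
    with open(path, "rb") as f:
        f.seek(-8, os.SEEK_END)  # back up 8 bytes from the end
        trailer_pos = f.tell()
        (toc_start,) = struct.unpack("!Q", f.read(8))
        f.seek(toc_start)        # jump back to the start of the TOC
        toc = json.loads(f.read(trailer_pos - toc_start).decode("utf-8"))
        offset, length = toc[wanted]
        f.seek(offset)
        return f.read(length)
```

Note that the writer never revisits bytes it has already written, which is the point of putting the TOC last.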

    If you lay out your TOC in fixed-size blocks (say 1 KB each), then you can easily perform a binary search on the TOC. Simply fill each block with file information entries, and when you run out of room, write a marker, pad with zeroes, and advance to the next block. To do the binary search, you already know the size of the TOC, so you start at the middle block, read the first file name in it, and go from there. Soon you'll find the right block, and then you read in that block and scan it for the file. This makes reading efficient without having the entire TOC in RAM. The other benefit is that the blocking requires less disk activity than a chained scheme like TAR (where you have to crawl the archive to find something).
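A blocked TOC with binary search over the blocks could be sketched as follows. The entry layout and the use of a zero name length as the end-of-block marker are assumptions for this example, and the helper names are hypothetical. It also assumes every entry fits in one block. For brevity the TOC is held in a `bytes` object; on disk, each `first_name` call would be one seek plus a small read.

```python
import struct

BLOCK = 1024  # TOC block size in bytes (the 1K figure from the text)

def pack_entry(name, offset, length):
    raw = name.encode("utf-8")
    return struct.pack("!H", len(raw)) + raw + struct.pack("!QQ", offset, length)

def build_toc(entries):
    """entries: list of (name, offset, length). Returns zero-padded TOC blocks."""
    blocks, current = [], b""
    # Sort by encoded name so byte comparisons during lookup are consistent.
    for name, offset, length in sorted(entries, key=lambda e: e[0].encode("utf-8")):
        packed = pack_entry(name, offset, length)
        # Keep 2 bytes free for the end marker (a zero name length).
        if len(current) + len(packed) > BLOCK - 2:
            blocks.append(current.ljust(BLOCK, b"\0"))
            current = b""
        current += packed
    if current:
        blocks.append(current.ljust(BLOCK, b"\0"))
    return b"".join(blocks)

def parse_block(block):
    pos, out = 0, []
    while pos + 2 <= len(block):
        (n,) = struct.unpack_from("!H", block, pos)
        if n == 0:  # end marker, i.e. the start of the zero padding
            break
        name = block[pos + 2:pos + 2 + n]
        offset, length = struct.unpack_from("!QQ", block, pos + 2 + n)
        out.append((name, offset, length))
        pos += 2 + n + 16
    return out

def first_name(toc, i):
    """Read only the first file name of block i."""
    (n,) = struct.unpack_from("!H", toc, i * BLOCK)
    return toc[i * BLOCK + 2:i * BLOCK + 2 + n]

def lookup(toc, wanted):
    key = wanted.encode("utf-8")
    lo, hi = 0, len(toc) // BLOCK - 1
    # Binary search for the last block whose first name is <= key.
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if first_name(toc, mid) <= key:
            lo = mid
        else:
            hi = mid - 1
    block = toc[lo * BLOCK:(lo + 1) * BLOCK]  # only this block is scanned
    for name, offset, length in parse_block(block):
        if name == key:
            return offset, length
    raise KeyError(wanted)
```

For 4,500 entries of a few dozen bytes each, this touches only a handful of blocks per lookup instead of reading the whole TOC.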

    I suggest you pad the files to block sizes as well, since disks like to work with regular-sized blocks of data. This isn't difficult either.
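Padding each file to a block boundary is a small piece of arithmetic. The sketch below assumes a 4096-byte block, which is only an example value; `padded_size` and `pad` are hypothetical helper names. The TOC still stores the true file length, so the padding is never returned to the reader.

```python
BLOCK = 4096  # assumed block size; pick whatever suits your storage

def padded_size(length, block=BLOCK):
    """Round `length` up to the next multiple of `block`."""
    return -(-length // block) * block  # ceiling division, then scale back up

def pad(payload, block=BLOCK):
    """Append zero bytes so the payload ends on a block boundary."""
    return payload + b"\0" * (padded_size(len(payload), block) - len(payload))
```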

    Updating this without rebuilding the entire thing is difficult. If you want an updatable container system, then you may as well look into some of the simpler file system designs, because that's what you're really looking for in that case.

    As for portability, I suggest you store your binary numbers in network order, as most standard libraries have routines to handle those details for you.
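In Python, for instance, the standard `struct` module handles network (big-endian) order through the `!` prefix in the format string. The helper names below are hypothetical, and the 8-byte width is an assumption.

```python
import struct

def encode_offset(value):
    """Encode an unsigned 64-bit integer in network (big-endian) order."""
    return struct.pack("!Q", value)

def decode_offset(raw):
    """Decode 8 bytes written by encode_offset, on any host byte order."""
    return struct.unpack("!Q", raw)[0]
```

In C, the equivalent routines for 16- and 32-bit values are `htons`/`htonl` and `ntohs`/`ntohl`.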
