The Archive Base

Editorial Team
Asked: June 10, 2026

I’m downloading zipped files containing XMLs, and I’d like to avoid writing the zip

I’m downloading zipped files containing XMLs, and because of latency requirements I’d like to avoid writing the zip files to disk before manipulating them. However, java.util.zip doesn’t suffice for me. There’s no way to say “here’s a byte array of a zip file, use it” without turning it into a stream, and ZipInputStream is not reliable, since it scans for entry headers (see the discussion under EDIT below for why that is unreliable).

I do not yet have access to the zip files I’ll be handling, so I don’t know whether I’ll be able to handle them through the ZipInputStream, and I need to find a solution that will work with any valid ZIP files, as the penalty for a failure once I go into production will be high.

Assuming ZipInputStream won’t work, what can I do to solve this problem in cases where there are no entry headers? I’m using Wikipedia’s definition, which includes a comment on how to correctly uncompress zip files (quoted below), as the standard.

EDIT

The Apache Commons Compress documentation on ZIP has a good write-up on some of the problems with stream-based reading (both in their implementation and in Java’s). I’ll further add, from Wikipedia and personal experience, that the size and CRC fields in entry headers may not be filled in (I have files with -1 in these fields). Thanks to centic for providing this link.
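To illustrate the unfilled size fields, here is a minimal sketch that writes a one-entry archive with the JDK's own ZipOutputStream and then peeks at the first local header. The class and method names are made up for the example. As far as I know, when an entry is streamed without its sizes being set in advance, the writer sets general-purpose bit 3 and defers the CRC and sizes to a data descriptor after the data, so the local header alone does not tell a streaming reader how long the entry is.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, for illustration only.
final class LocalHeaderPeek {

    /** Builds a one-entry archive in memory without pre-setting size or CRC. */
    static byte[] buildStreamedZip() throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(out)) {
            zos.putNextEntry(new ZipEntry("a.xml"));
            zos.write("<a/>".getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
        }
        return out.toByteArray();
    }

    /** True if general-purpose bit 3 is set in the local header at offset 0. */
    static boolean usesDataDescriptor(byte[] zip) {
        ByteBuffer buf = ByteBuffer.wrap(zip).order(ByteOrder.LITTLE_ENDIAN);
        return (buf.getShort(6) & 0x0008) != 0;   // flags live at offset 6
    }

    /** Compressed-size field of the local header at offset 0. */
    static long localCompressedSize(byte[] zip) {
        ByteBuffer buf = ByteBuffer.wrap(zip).order(ByteOrder.LITTLE_ENDIAN);
        return buf.getInt(18) & 0xFFFFFFFFL;      // compressed size at offset 18
    }

    public static void main(String[] args) throws IOException {
        byte[] zip = buildStreamedZip();
        System.out.println("data descriptor flag: " + usesDataDescriptor(zip));
        System.out.println("local compressed size: " + localCompressedSize(zip));
    }
}
```

The central directory at the end of the same archive carries the real sizes, which is one more reason to read it rather than the local headers.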

Also, let me quote Wikipedia on the subject:

Tools that correctly read zip archives must scan for the signatures of
the various fields, the zip central directory. They must not scan for
entries because only the directory specifies where a file chunk
starts. Scanning could lead to false positives, as the format doesn’t
forbid other data to be between chunks, or uncompressed stream
containing such signatures.

Note that ZipInputStream scans for entries, not the central directory, which is the problem with it.
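For contrast, this is roughly what "start from the central directory" looks like on a byte array: scan backwards from the end for the end-of-central-directory record and read the entry count and directory offset from it. This is only a sketch with made-up names; it assumes a single-disk archive without ZIP64 extensions, and it does nothing about a comment that happens to contain the signature beyond a basic length check.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper, for illustration only.
final class EndOfCentralDirectory {
    static final int EOCD_SIGNATURE = 0x06054b50;
    static final int EOCD_MIN_SIZE = 22;       // fixed part, without the comment

    final int entryCount;
    final long centralDirectorySize;
    final long centralDirectoryOffset;

    private EndOfCentralDirectory(int entryCount, long size, long offset) {
        this.entryCount = entryCount;
        this.centralDirectorySize = size;
        this.centralDirectoryOffset = offset;
    }

    /** Scans backwards from the end of the array for the EOCD record. */
    static EndOfCentralDirectory locate(byte[] zip) {
        ByteBuffer buf = ByteBuffer.wrap(zip).order(ByteOrder.LITTLE_ENDIAN);
        // The trailing comment is at most 65535 bytes, so the scan is bounded.
        int lowest = Math.max(0, zip.length - EOCD_MIN_SIZE - 0xFFFF);
        for (int pos = zip.length - EOCD_MIN_SIZE; pos >= lowest; pos--) {
            if (buf.getInt(pos) != EOCD_SIGNATURE) {
                continue;
            }
            int commentLength = buf.getShort(pos + 20) & 0xFFFF;
            if (pos + EOCD_MIN_SIZE + commentLength > zip.length) {
                continue;                       // comment would run past the end
            }
            int entries = buf.getShort(pos + 10) & 0xFFFF;
            long size = buf.getInt(pos + 12) & 0xFFFFFFFFL;
            long offset = buf.getInt(pos + 16) & 0xFFFFFFFFL;
            return new EndOfCentralDirectory(entries, size, offset);
        }
        throw new IllegalArgumentException("no end-of-central-directory record found");
    }
}
```

Given the record, `centralDirectoryOffset` points at the first central directory header, and each of those headers says where its entry's local header starts.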

Final Edit

If anyone is interested, this script can be used to produce a valid ZIP file that cannot be read by ZipInputStream from an existing ZIP file. So, as a final edit to this closed question, I needed a library that can read files such as the ones produced by this script.

1 Answer

  1. Editorial Team
    Added an answer on June 10, 2026 at 9:22 pm

    EDIT: Another suggestion…

    Looking at ZipFile from the Apache Commons implementation, it looks like it wouldn’t be too hard to effectively fork that for your project. Create a wrapper around your byte array which has all the pieces of the RandomAccessFile API which are required (I don’t think there are very many). You’ve already indicated that you prefer the interface to ZipFile, so why not go with that?

    We don’t know enough about your project to know whether this opens up any legal questions – and even if you gave details, I doubt that anyone here would be able to give good legal advice – but I suspect it wouldn’t take more than an hour or two to get this solution up and working, and I suspect you’d have reasonable confidence in it.
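As a rough idea of the wrapper, here is a sketch of a byte-array-backed class exposing a handful of RandomAccessFile-style methods. Which methods the Commons ZipFile code actually calls is an assumption on my part, so treat the method list as a starting point; the class name is made up.

```java
import java.io.EOFException;
import java.io.IOException;

// Hypothetical stand-in for the parts of RandomAccessFile a forked ZipFile might need.
final class ByteArrayRandomAccess {
    private final byte[] data;
    private int position;

    ByteArrayRandomAccess(byte[] data) {
        this.data = data;
    }

    long length() {
        return data.length;
    }

    long getFilePointer() {
        return position;
    }

    /** Unlike RandomAccessFile, this sketch refuses to seek past the end. */
    void seek(long pos) throws IOException {
        if (pos < 0 || pos > data.length) {
            throw new IOException("seek out of range: " + pos);
        }
        position = (int) pos;
    }

    /** Returns the next byte as 0..255, or -1 at the end of the array. */
    int read() {
        return position < data.length ? data[position++] & 0xFF : -1;
    }

    /** Copies up to len bytes; returns the count, or -1 at the end of the array. */
    int read(byte[] dst, int off, int len) {
        if (len == 0) {
            return 0;
        }
        int remaining = data.length - position;
        if (remaining <= 0) {
            return -1;
        }
        int n = Math.min(len, remaining);
        System.arraycopy(data, position, dst, off, n);
        position += n;
        return n;
    }

    /** Fills dst completely or throws without moving the position. */
    void readFully(byte[] dst) throws EOFException {
        if (dst.length > data.length - position) {
            throw new EOFException();
        }
        System.arraycopy(data, position, dst, 0, dst.length);
        position += dst.length;
    }

    int skipBytes(int n) {
        int skipped = Math.max(0, Math.min(n, data.length - position));
        position += skipped;
        return skipped;
    }
}
```

I believe newer Commons Compress releases let ZipFile read from a SeekableByteChannel and ship an in-memory channel implementation, which would make the fork unnecessary; check the version you depend on before going down this road.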


    EDIT: This may be a slightly more productive answer…

    If you’re worried about the entries not being contiguous, but don’t want to handle all the compression side yourself, you might consider an option where you effectively rewrite the data. Create a new ByteArrayOutputStream, and read the central directory at the end. For each entry in the central directory, write out an entry (header + data) to the output stream in a format that you believe ZipInputStream will be happy with. Then write a new central directory – if you want your replacement to be valid you may need to do this from scratch, but if you’re using code which you know won’t actually read the central directory, you could just provide the original one, ignoring the fact that it might not then be valid. So long as it starts with the right signature, that’s probably good enough 🙂

    Once you’ve done that, convert the ByteArrayOutputStream into a new byte[], wrap it in a ByteArrayInputStream and then pass that to ZipInputStream or ZipArchiveInputStream.

    Depending on your purposes, you may not even need to do that much – you may be able to just extract each file as you go by creating a “mini” zip file with just the one entry you’re reading from the directory at a time.

    This does involve understanding the zip file format, but not completely – just the skeleton, effectively. It’s not a quick and easy fix like using an existing API completely, but it shouldn’t take very long. It doesn’t guarantee it’ll be able to read all invalid files (how could it?) but it will protect you against the “data between entries” issue you seem to be particularly concerned about. Hope it’s at least a useful idea…
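Here is a sketch of the "skeleton" parsing involved. It takes a slightly different route from the rewrite described above: instead of emitting a new archive for ZipInputStream, it walks the central directory and inflates each entry's bytes directly with java.util.zip.Inflater, so anything sitting between entries is never looked at. All names are made up, and it assumes a single-disk archive with no ZIP64, no encryption, only the stored and deflated methods, and it does not verify CRCs.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

// Hypothetical reader, for illustration only.
final class CentralDirectoryReader {

    /** Returns entry name -> uncompressed bytes, driven only by the central directory. */
    static Map<String, byte[]> readAll(byte[] zip) throws DataFormatException {
        ByteBuffer buf = ByteBuffer.wrap(zip).order(ByteOrder.LITTLE_ENDIAN);
        int eocd = findEocd(buf);
        int count = buf.getShort(eocd + 10) & 0xFFFF;
        int pos = buf.getInt(eocd + 16);                    // central directory offset
        Map<String, byte[]> entries = new LinkedHashMap<>();
        for (int i = 0; i < count; i++) {
            if (buf.getInt(pos) != 0x02014b50) {
                throw new IllegalArgumentException("bad central directory header at " + pos);
            }
            int method = buf.getShort(pos + 10) & 0xFFFF;
            int compressedSize = buf.getInt(pos + 20);
            int size = buf.getInt(pos + 24);
            int nameLen = buf.getShort(pos + 28) & 0xFFFF;
            int extraLen = buf.getShort(pos + 30) & 0xFFFF;
            int commentLen = buf.getShort(pos + 32) & 0xFFFF;
            int local = buf.getInt(pos + 42);               // local header offset
            String name = new String(zip, pos + 46, nameLen, StandardCharsets.UTF_8);
            // The local header has its own name and extra lengths; skip past them.
            int dataStart = local + 30
                    + (buf.getShort(local + 26) & 0xFFFF)
                    + (buf.getShort(local + 28) & 0xFFFF);
            entries.put(name, decode(zip, dataStart, compressedSize, size, method));
            pos += 46 + nameLen + extraLen + commentLen;
        }
        return entries;
    }

    private static int findEocd(ByteBuffer buf) {
        int lowest = Math.max(0, buf.capacity() - 22 - 0xFFFF);
        for (int pos = buf.capacity() - 22; pos >= lowest; pos--) {
            if (buf.getInt(pos) == 0x06054b50) {
                return pos;
            }
        }
        throw new IllegalArgumentException("no end-of-central-directory record");
    }

    private static byte[] decode(byte[] zip, int start, int compressedSize, int size, int method)
            throws DataFormatException {
        if (method == 0) {                                  // stored
            return Arrays.copyOfRange(zip, start, start + size);
        }
        if (method != 8) {                                  // anything but deflate
            throw new IllegalArgumentException("unsupported method " + method);
        }
        Inflater inflater = new Inflater(true);             // raw deflate, no zlib header
        try {
            // One byte of slack, as the Inflater docs ask for in nowrap mode.
            inflater.setInput(zip, start, Math.min(compressedSize + 1, zip.length - start));
            byte[] out = new byte[size];
            int written = 0;
            while (written < size && !inflater.finished()) {
                int n = inflater.inflate(out, written, size - written);
                if (n == 0 && !inflater.finished()) {
                    throw new DataFormatException("truncated entry data");
                }
                written += n;
            }
            if (written != size) {
                throw new DataFormatException("size mismatch for entry data");
            }
            return out;
        } finally {
            inflater.end();
        }
    }
}
```

The same offsets and sizes are what you would copy into a fresh local header if you preferred to build the "mini" zip and hand it to ZipInputStream.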


    You wrote: “there’s no way to say ‘here’s a byte array of a zip file, use it’”

    Yes there is:

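    import java.io.ByteArrayInputStream;
    import java.util.zip.ZipInputStream;

    // Wrap the in-memory archive in a stream; nothing is written to disk.
    byte[] data = ...;
    ByteArrayInputStream byteStream = new ByteArrayInputStream(data);
    ZipInputStream zipStream = new ZipInputStream(byteStream);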

    That leaves the issue of whether ZipInputStream can handle all the zip files you’ll give it – but I wouldn’t write it off quite so quickly.
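For completeness, a small self-contained sketch of reading every entry that way, entirely in memory. The class and method names are made up; whether this copes with your particular archives is exactly the open question above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper, for illustration only.
final class InMemoryZipStreamReader {

    /** Returns entry name -> uncompressed bytes, reading the archive as a stream. */
    static Map<String, byte[]> readAll(byte[] data) throws IOException {
        Map<String, byte[]> result = new LinkedHashMap<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(data))) {
            byte[] chunk = new byte[8192];
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                ByteArrayOutputStream content = new ByteArrayOutputStream();
                int n;
                // read() returns -1 at the end of the current entry, not of the archive
                while ((n = zip.read(chunk)) != -1) {
                    content.write(chunk, 0, n);
                }
                result.put(entry.getName(), content.toByteArray());
            }
        }
        return result;
    }
}
```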

    Of course, there are other APIs available. You may want to look at Apache Commons Compress, for example. Even though ZipFile requires a file, ZipArchiveInputStream doesn’t – so again, you could use a ByteArrayInputStream. EDIT: It looks like ZipArchiveInputStream doesn’t read from the central directory either. I was hoping it would use markSupported to check beforehand, but it appears not to…

    EDIT: In the comments on the question, I asked where you’d read that the zip file doesn’t have to contain entry data. You quoted Wikipedia:

    “Tools that correctly read zip archives must scan for the signatures of the various fields, the zip central directory. They must not scan for entries because only the directory specifies where a file chunk starts. Scanning could lead to false positives, as the format doesn’t forbid other data to be between chunks, or uncompressed stream containing such signatures.”

    That’s not the same as entry data being optional. It’s saying that there may be extra data in awkward places, not that the entries may be missing completely. It’s basically saying that the entries shouldn’t be assumed to be contiguous. I could happily concede that ZipInputStream may not be reading the central directory at the end of the file, but finding code which does that isn’t the same as finding code which copes with entry data not existing.

    You then write:

    “I might further add that whether the zip is valid or not is not my concern. Working with it is.”

    … which suggests you want code which will handle invalid zip files. Combined with this:

    “I do not yet have access to the zip files I’ll be handling, so I don’t know whether I’ll be able to handle them through the stream”

    That means you’re asking for code which should handle zip files which are invalid in ways you can’t even predict. Just how invalid would it have to be for you to be able to reject it? If I give you 1000 random bytes, with no attempt for them to be a zip file at all, what on earth would you do with it?

    Basically, you need to pin the problem down more tightly before it’s feasible to even say whether a particular library is a valid solution. It’s reasonable to collect a set of zip files from various places, which may be invalid in well-understood ways, and say “I must be able to support all of these.” Later you may need to do some work if it turns out that wasn’t good enough. But to be able to support anything, however broken, simply isn’t a valid requirement.

© 2021 The Archive Base. All Rights Reserved