The Archive Base

Editorial Team
Asked: May 13, 2026 (2026-05-13T08:33:40+00:00)

I have some .gz compressed files that are around 5-7 GB uncompressed. These are flat files.


I’ve written a program that takes an uncompressed file and reads it line by line, which works perfectly.

Now I want to be able to open the compressed files in memory and run my little program.

I’ve looked into zlib but I can’t find a good solution.

Loading the entire file with gzread(gzFile, void *, unsigned) is impossible because of the 32-bit unsigned int length limit.

I’ve tried gzgets, but it almost doubles the execution time compared to reading with gzread. (I tested on a 2 GB sample.)

I’ve also looked into “buffering”: splitting the gzread process into multiple 2 GB chunks, finding the last newline with strrchr, and then repositioning with gzseek.
But gzseek emulates a full decompression of the file, which is very slow.

I fail to see any sane solution to this problem.
I could always check whether the current line actually ends with a newline (a missing one should only occur in the last, partially read line), and then read more data from the point in the program where this happens.
But this could get very ugly.

Does anyone have any suggestions?

Thanks

edit:
I don’t need the entire file at once, just one line at a time, but I have a fairly large machine, so if loading it all were the easiest option, that would be no problem.

For all those who suggest piping to stdin: I’ve experienced extreme slowdowns compared to opening the file directly. Here is a small code snippet I made some months ago that illustrates it.

time ./a.out 59846/59846.txt
#       59846/59846.txt
18255221

real    0m4.321s
user    0m2.884s
sys     0m1.424s
time ./a.out <59846/59846.txt
18255221

real    1m56.544s
user    1m55.043s
sys     0m1.512s

And the source code

#include <cstdio>
#include <fstream>
#include <iostream>
#define LENS 10000

int main(int argc, char **argv){
  std::istream *pFile;

  if(argc==2) // if an argument is supplied, read from that file
    pFile = new std::ifstream(argv[1],std::ios::in);
  else // otherwise read from stdin
    pFile = &std::cin;

  char line[LENS];
  if(argc==2) // if we are using a filename, print it
    printf("#\t%s\n",argv[1]);

  if(!*pFile){ // test the stream state; the pointer itself is never null
    printf("Do you have permission to open file?\n");
    return 1;
  }

  int numRow=0;
  // note: when the input ends with a newline, this eof() loop counts one extra, empty line
  while(!pFile->eof()) {
    numRow++;
    pFile->getline(line,LENS);
  }
  if(argc==2)
    delete pFile;
  printf("%d\n",numRow);
  return 0;
}

Thanks for your replies. I’m still waiting for the golden apple.

edit2:
Using C-style FILE pointers instead of C++ streams is much, much faster, so I think this is the way to go.

Thanks for all your input.
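For anyone reading along, here is a minimal sketch of what the C-style FILE pointer version of the line counter might look like. The function name countLines and the 64 KiB buffer size are my own choices for illustration, not something from the original program. It accepts any open stream, so the same function could be handed stdin or a file opened with fopen.

```cpp
#include <cstdio>
#include <cstring>

// Counts lines on an already opened C stream (a file from fopen, or stdin).
// A line longer than the buffer arrives in several fgets calls, so a line is
// counted only when a piece ends with '\n', or when the input ends mid-line.
// Assumes text input without embedded NUL bytes (strlen is used on each piece).
long countLines(std::FILE *in) {
  char buf[1 << 16];       // 64 KiB read buffer (arbitrary size)
  long lines = 0;
  bool midLine = false;    // true while the last piece had no newline yet

  while (std::fgets(buf, static_cast<int>(sizeof buf), in)) {
    std::size_t len = std::strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
      ++lines;
      midLine = false;
    } else {
      midLine = true;
    }
  }
  if (midLine) ++lines;    // final line had no trailing newline
  return lines;
}
```

With a pipe such as gzip -cd compressed.gz | yourprogram, the call would simply be countLines(stdin).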

1 Answer

  1. Editorial Team
    Added an answer on May 13, 2026 at 8:33 am

    gzip -cd compressed.gz | yourprogram

    Just go ahead and read it line by line from stdin as it is decompressed.

    EDIT: In response to your remarks about performance: you’re saying that reading STDIN line by line is slow compared to reading an uncompressed file directly. The difference comes down to buffering. Normally a pipe delivers data to STDIN as soon as output becomes available (with no, or very little, buffering). You can do “buffered block reads” from STDIN and parse the blocks yourself to gain performance.

    You can achieve the same result, possibly with better performance, by using gzread() as well: read a big chunk, parse the chunk, read the next chunk, and repeat.
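The "read a big chunk, parse the chunk, repeat" loop could be sketched roughly as below. This is only an illustration under a few assumptions: forEachLine, readBlock and onLine are hypothetical names, and the reader is passed in as a callback so the same loop can sit on top of gzread, fread on stdin, or anything else that fills a buffer. The one subtle part is the line that straddles two chunks, which is kept in a carry-over string until its newline shows up, so no gzseek is needed.

```cpp
#include <cstddef>
#include <cstring>
#include <functional>
#include <string>
#include <vector>

// Pulls fixed-size blocks through `readBlock` and calls `onLine` once per
// complete line. `readBlock(buf, cap)` is a hypothetical callback that fills
// `buf` with up to `cap` bytes and returns the byte count, or 0 at end of input.
// For a .gz file it could wrap zlib's gzread (assumed usage, not tested here):
//   [&](char *b, std::size_t n) {
//     int r = gzread(gz, b, static_cast<unsigned>(n));
//     return r > 0 ? static_cast<std::size_t>(r) : 0;
//   }
// Returns the number of lines delivered.
std::size_t forEachLine(
    const std::function<std::size_t(char *, std::size_t)> &readBlock,
    const std::function<void(const std::string &)> &onLine,
    std::size_t blockSize = 1 << 20) {
  std::vector<char> buf(blockSize);
  std::string carry;  // unfinished line left over from the previous block
  std::size_t lines = 0;

  for (;;) {
    std::size_t got = readBlock(buf.data(), buf.size());
    if (got == 0) break;

    const char *p = buf.data();
    const char *end = p + got;
    while (p < end) {
      const char *nl = static_cast<const char *>(
          std::memchr(p, '\n', static_cast<std::size_t>(end - p)));
      if (!nl) {            // block ends mid-line: keep the tail for later
        carry.append(p, end);
        break;
      }
      carry.append(p, nl);  // finish the line (carry is usually empty here)
      onLine(carry);
      ++lines;
      carry.clear();
      p = nl + 1;
    }
  }
  if (!carry.empty()) {     // last line without a trailing newline
    onLine(carry);
    ++lines;
  }
  return lines;
}
```

Each line is copied into a std::string here for simplicity; if that copy shows up in profiles, the callback could instead receive a pointer and length into the block for lines that do not cross a chunk boundary.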
