The Archive Base

Editorial Team
Asked: May 17, 2026

I am pulling data from a bzip2 stream within a C application. As chunks of data come out of the decompressor, they can be written to stdout:

fwrite(buffer, 1, length, stdout);

This works great. I get all the data when it is sent to stdout.

Instead of writing to stdout, I would like to process this output internally in one-line chunks, i.e. strings terminated by a newline character (\n).

Should I copy the output of the decompressor stream into another buffer, one character at a time, until I hit a newline, and then call the per-line processing function? Is that slow, and is there a smarter approach? Thanks for your advice.

EDIT

Thanks for your suggestions. I ended up using a pair of buffers: a line buffer and a remainder buffer. Each time I finish passing through an output buffer's worth of data, the remainder (the “stub” at the end of the output buffer that has no newline yet) is saved and then copied to the beginning of the line buffer before the next chunk is processed.

I loop through the output buffer character by character and process one newline-terminated line at a time. The newline-less remainder is kept in a heap buffer that is allocated once and then resized with realloc, rather than being allocated and freed for every chunk; realloc seems to be less expensive than repeated malloc/free calls.

Here’s the code I came up with:

/* needs <stdio.h>, <stdlib.h>, <string.h> and <bzlib.h> */
char bzBuf[BZBUFMAXLEN];
BZFILE *bzFp;
int bzError, bzNBuf;
char bzLineBuf[BZLINEBUFMAXLEN];
char *bzBufRemainder = NULL; /* NUL-terminated stub carried over between reads */
char *bzTmp;
int bzBufPosition, bzLineBufPosition;
int bzFailed = 0;            /* line too long for bzLineBuf, or out of memory */

bzFp = BZ2_bzReadOpen(&bzError, *fp, 0, 0, NULL, 0); /* http://www.bzip.org/1.0.5/bzip2-manual-1.0.5.html#bzcompress-init */

if (bzError != BZ_OK) {
    BZ2_bzReadClose(&bzError, bzFp);
    fprintf(stderr, "\n\t[gchr2] - Error: Bzip2 data could not be retrieved\n\n");
    return -1;
}

bzError = BZ_OK;
bzLineBufPosition = 0;
while (bzError == BZ_OK && !bzFailed) {

    bzNBuf = BZ2_bzRead(&bzError, bzFp, bzBuf, sizeof(bzBuf));

    if (bzError == BZ_OK || bzError == BZ_STREAM_END) {
        if (bzBufRemainder != NULL) {
            /* start the line buffer with the stub left over from the previous read */
            bzLineBufPosition = (int)strlen(bzBufRemainder);
            memcpy(bzLineBuf, bzBufRemainder, bzLineBufPosition); /* leave out \0 */
            bzBufRemainder[0] = '\0'; /* consumed: must not be prepended again */
        }

        for (bzBufPosition = 0; bzBufPosition < bzNBuf; bzBufPosition++) {
            if (bzLineBufPosition >= BZLINEBUFMAXLEN - 1) {
                bzFailed = 1; /* no room left for this character plus \0 */
                break;
            }
            bzLineBuf[bzLineBufPosition++] = bzBuf[bzBufPosition];
            if (bzBuf[bzBufPosition] == '\n') {
                bzLineBuf[bzLineBufPosition] = '\0'; /* terminate bzLineBuf */

                /* process the line buffer, e.g. print it out or transform it, etc. */
                fprintf(stdout, "%s", bzLineBuf);

                bzLineBufPosition = 0; /* reset line buffer position */
            }
            else if (bzBufPosition == (bzNBuf - 1)) {
                /* chunk ends mid-line: save the stub, including its \0 */
                bzLineBuf[bzLineBufPosition] = '\0';
                /* realloc(NULL, n) behaves like malloc(n) */
                bzTmp = (char *)realloc(bzBufRemainder, bzLineBufPosition + 1);
                if (bzTmp == NULL) {
                    bzFailed = 1;
                    break;
                }
                bzBufRemainder = bzTmp;
                memcpy(bzBufRemainder, bzLineBuf, bzLineBufPosition + 1);
            }
        }
    }
}

if (bzFailed || bzError != BZ_STREAM_END) {
    BZ2_bzReadClose(&bzError, bzFp);
    free(bzBufRemainder);
    fputs(bzFailed ? "\n\t[gchr2] - Error: line too long or out of memory\n\n"
                   : "\n\t[gchr2] - Error: Bzip2 data could not be uncompressed\n\n",
          stderr);
    return -1;
}
BZ2_bzReadClose(&bzError, bzFp);

if (bzLineBufPosition > 0) {
    /* the data did not end with \n: process the final partial line as well */
    bzLineBuf[bzLineBufPosition] = '\0';
    fprintf(stdout, "%s", bzLineBuf);
}

free(bzBufRemainder);
bzBufRemainder = NULL;

I really appreciate everyone’s help. This is working nicely.

1 Answer

    Editorial Team
    Added an answer on May 17, 2026 at 5:52 pm

    This would be easy to do using C++’s std::string, but in C it takes some code if you want to do it efficiently (unless you use a dynamic string library).

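    char *bz_read_line(BZFILE *input, int *bzerror)
    {
        size_t offset = 0;
        size_t len = CHUNK;  // arbitrary
        char *output = (char *)xmalloc(len);
        int newline = 0;
    
        // Read one byte per call. Stop once *bzerror is no longer BZ_OK:
        // the stream is finished after BZ_STREAM_END, so don't read past it.
        *bzerror = BZ_OK;
        while (*bzerror == BZ_OK &&
               BZ2_bzRead(bzerror, input, output + offset, 1) == 1) {
            if (output[offset] == '\n') {
                newline = 1;
                break;
            }
            if (++offset == len) {  // always keep room for the '\0'
                len += CHUNK;
                output = (char *)xrealloc(output, len);
            }
        }
        output[offset] = '\0';  // terminate; also strips the trailing newline
    
        if ((*bzerror != BZ_OK && *bzerror != BZ_STREAM_END) ||
            (offset == 0 && !newline)) {
            free(output);  // decompression error, or no data left
            return NULL;
        }
        return output;  // *bzerror == BZ_STREAM_END: nothing follows this line
    }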
    

    (Here xmalloc and xrealloc are wrappers that handle allocation errors internally, and CHUNK is an arbitrary buffer size. Don’t forget to free the returned string.)

    On a small test file, this is almost an order of magnitude slower than bzcat:

    lars@zygmunt:/tmp$ wc foo
     1193  5841 42868 foo
    lars@zygmunt:/tmp$ bzip2 foo
    lars@zygmunt:/tmp$ time bzcat foo.bz2 > /dev/null
    
    real    0m0.010s
    user    0m0.008s
    sys     0m0.000s
    lars@zygmunt:/tmp$ time ./a.out < foo.bz2 > /dev/null
    
    real    0m0.093s
    user    0m0.044s
    sys     0m0.020s
    

    Decide for yourself whether that’s acceptable.

Related Questions

I have the following function that is pulling data from a database. The ajax
My code is pulling from a data cells that lists multiple file paths and
I am pulling data out of an old-school ActiveX in the form of arrays
I'm pulling back a Date and a Time from a database. They are stored
Im pulling data from a MySql data table. I'm pulling from a row called
I am pulling data from one table, called analyzedCopy, and using it to over-rite
So I am pulling data from a SQL Server 2000 DB then converting it
I am pulling data from a database with Ajax and dynamically populating a div
I am pulling data from an RSS Feed. One of the keys in the
I am pulling data from a database that uses ascii character 254 as a

© 2021 The Archive Base. All Rights Reserved