The Archive Base

Editorial Team
Asked: May 23, 2026

I have a large (3GB), gzipped file containing two fields: NAME and STRING. I want to split this file into smaller files – if field one is john_smith, I want the string to be placed in john_smith.gz. NOTE: the string field can and does contain special characters.

I can do this easily with a for loop over the domains in Bash, but I’d much prefer the efficiency of reading the file only once with awk.

I have tried using the system function within awk, with escaped single quotes around the string:

zcat large_file.gz | awk '{system("echo -e '"'"'"$1"\t"$2"'"'"' | gzip >> "$1".gz");}'

and it works perfectly on most of the lines; however, some of them are printed to STDERR along with an error saying that the shell cannot execute a command (the shell thinks that part of the string is a command). It looks like special characters might be breaking it.
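
The failure described here is probably a quoting problem: the string is pasted inside single quotes in the command line that system() hands to the shell, so a single quote in the data ends the quoted word early and the rest is parsed as shell syntax. A minimal reproduction, using a made-up sample value (the variable names are illustrative only):

```shell
#!/bin/sh
# Hypothetical STRING value containing a single quote.
value="it's here"

# Build the command the same way the awk one-liner does:
# the data is pasted between single quotes.
cmd="echo '$value'"

# The embedded quote closes the quoted word early and leaves an
# unterminated quote behind, so the shell rejects the command
# instead of echoing the data.
if sh -c "$cmd" >/dev/null 2>&1; then
    result="ran"
else
    result="failed"
fi
echo "$result"
```

Other characters can do worse than fail: once the quoting is broken, anything that follows is interpreted as shell code rather than data.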

Any thoughts on how to fix this, or any alternate implementations that would help?

Thanks!

-Sean

1 Answer

  1. Editorial Team
    Added an answer on May 23, 2026 at 11:27 pm

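    This little Perl script does the job nicely:

    • keeping all destination files open for performance
    • doing elementary error handling
    • Edit: now also pipes output through gzip on the fly

    There is a bit of a kludge with $fh: print only accepts a bareword, a simple scalar variable, or a block as its filehandle, so the hash entry $pipes{$id} cannot be used there directly and is copied into $fh first (print {$pipes{$id}} $line would also work).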

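    #!/usr/bin/perl
    use strict;
    use warnings;
    
    my $suffix = ".txt.gz";
    
    my %pipes;    # one gzip pipe per NAME, kept open for performance
    while (my $record = <>)
    {
        # split on the first tab only, so STRING may itself contain tabs
        my ($id, $line) = split /\t/, $record, 2;
        next unless defined $line;    # skip lines without a tab
    
        # note: $id is interpolated into a shell command, so this assumes
        # that NAME contains no single quotes
        exists $pipes{$id} 
            or open ($pipes{$id}, "|gzip -9 > '$id$suffix'") 
            or die "can't open/create $id$suffix, or cannot spawn gzip: $!";
    
        my $fh = $pipes{$id};
        print $fh $line;
    }
    
    # close every pipe so gzip finishes writing and failures are reported
    for my $id (keys %pipes)
    {
        close $pipes{$id} or warn "gzip failed for $id$suffix\n";
    }
    
    print STDERR "Created: " . join(', ', map { "$_$suffix" } keys %pipes) . "\n";
    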
    

    Use it like this:

    zcat input.gz | ./myscript.pl
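
For readers who would rather stay in awk, as the question originally asked, the same idea can be sketched with awk's own output pipes instead of system(). This is a minimal sketch, not a tested replacement for the Perl script: the function name split_by_name is made up for illustration, and it assumes that NAME contains only characters that are safe in a filename (no quotes, spaces, or slashes), because NAME still ends up in a shell command. STRING never passes through the shell, so its special characters are harmless.

```shell
#!/bin/sh
# Sketch: split a gzipped, tab-separated NAME/STRING file into NAME.gz files.
# split_by_name is a hypothetical helper; $1 is the gzipped input file.
split_by_name() {
    gzip -dc "$1" | awk -F '\t' '
    NF < 2 { next }                         # skip lines without a tab
    {
        cmd = "gzip > " $1 ".gz"            # one gzip process per distinct NAME
        # keep everything after the first tab verbatim (later tabs included)
        line = substr($0, index($0, "\t") + 1)
        print line | cmd                    # the pipe stays open until awk exits
    }'
}
```

It would be called as split_by_name large_file.gz. As with the Perl script, every distinct NAME keeps one pipe open, so an input with very many names may run into the limit on open files.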
    