I have a large (3GB), gzipped file containing two fields: NAME and STRING. I want to split this file into smaller files – if field one is john_smith, I want the string to be placed in john_smith.gz. NOTE: the string field can and does contain special characters.
I can do this easily with a for loop over the names in Bash, but I’d much prefer the efficiency of reading the file only once with awk.
I have tried using the system function within awk, with escaped single quotes around the string:
zcat large_file.gz | awk '{system("echo -e '"'"'"$1"\t"$2"'"'"' | gzip >> "$1".gz");}'
and it works perfectly on most of the lines. However, some lines end up on STDERR along with an error saying the shell cannot execute a command (the shell thinks that part of the string is a command). It looks like special characters might be breaking it.
Any thoughts on how to fix this, or any alternate implementations that would help?
Thanks!
-Sean
This little perl script does the job nicely
gzip on the fly
There is a bit of a kludge with $fh because apparently using the hash entry directly doesn’t work.
Oh, use it like
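
Separately from the Perl script, here is a minimal awk sketch of the same idea (one gzip process per name, fed on the fly), in case staying in awk is preferred. It is not the script referred to above. The function name split_by_name is hypothetical, and the sketch assumes the two fields are tab-separated and that NAME contains only letters, digits, "_", "." and "-". The point is that each line is written to gzip through a pipe with print | cmd, so the string is never pasted into a shell command line and its special characters cannot be interpreted by the shell.

```shell
#!/bin/sh
# split_by_name: hypothetical helper, not from the original post.
# Usage: split_by_name large_file.gz
# Writes every input line to NAME.gz in the current directory.
split_by_name() {
    # gzip -dc is used instead of zcat because it behaves the same everywhere.
    gzip -dc "$1" | awk -F'\t' '
    {
        name = $1
        # Assumption: only "safe" names are accepted, because the name
        # (unlike the string) does become part of a shell command.
        if (name !~ /^[A-Za-z0-9_.-]+$/) {
            print "skipping unsafe name: " name > "/dev/stderr"
            next
        }
        # One gzip per name; awk keeps the pipe open between lines.
        cmd = "gzip -c >> " name ".gz"
        seen[cmd] = 1
        # The line travels over the pipe as data, never as shell text.
        # This mirrors the original command, which wrote both fields.
        print $0 | cmd
    }
    END {
        # Close every pipe so each gzip flushes and exits.
        for (cmd in seen) close(cmd)
    }'
}
```

Because the output is opened with >>, rerunning it appends to any existing NAME.gz files, so remove old output first. With a very large number of distinct names, awk may hit the limit on simultaneously open pipes. In that case closing a pipe and reopening it later still works, since gzip can read files made of several appended compressed members.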