I am trying to execute a command like this: find ./ -name *.gz -print

Asked: May 31, 2026 (2026-05-31T18:39:13+00:00)

I am trying to execute a command like this:

find ./ -name "*.gz" -print -exec ./extract.sh {} \;

The gz files themselves are small. Currently my extract.sh contains the following:

# Start delimiter
echo "#####" $1 >> Info
zcat $1 > temp
# Series of greps to extract some useful information
grep -o -P "..." temp >> Info
grep -o -P "..." temp >> Info
rm temp
echo "####" >> Info

Obviously, this is not parallelizable because if I run multiple extract.sh instances, they all write to the same file. What is a smart way of doing this?

I have 80K gz files and a powerful machine with 32 cores.

1 Answer

Editorial Team · Answer 1 · 2026-05-31T18:39:14+00:00

I would create a temporary directory, then have each extract.sh instance write to its own output file, named after the file it processed. On systems where /tmp is a RAM disk (tmpfs), files created there will not thrash your hard drive with lots of writes. That is not the case everywhere, so check how /tmp is mounted on your machine.
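Whether /tmp is RAM-backed depends on the system. As a minimal sketch, assuming GNU mktemp (for the -p option) and a Linux-style /dev/shm, which is normally a tmpfs mount, the temporary directory could be placed in RAM explicitly. The function name pick_temp_dir is made up for this illustration.

```shell
# Hypothetical helper: prefer /dev/shm (tmpfs on most Linux systems) when it
# exists and is writable; otherwise fall back to mktemp's default location.
pick_temp_dir() {
    if [ -d /dev/shm ] && [ -w /dev/shm ]; then
        mktemp -d -p /dev/shm
    else
        mktemp -d
    fi
}

temp_dir="$(pick_temp_dir)"
echo "working in $temp_dir"
```

Keep in mind that anything written to a tmpfs counts against memory, so this only makes sense while the combined outputs stay small.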

You can then either cat everything together at the end, or have each worker signal another process when it finishes, so that process can start concatenating the outputs immediately (and removing them when done).
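As a variation on the second option, the separate collector process can be swapped for a lock: each worker builds its block in a private file and appends it to the shared file while holding an exclusive lock, so blocks never interleave. This is a different technique from signalling, shown only as a sketch. It assumes flock(1) from util-linux is installed; the names worker.sh, Info.lock, the sample files and the pattern "beta [0-9]*" are invented for the demo.

```shell
#!/bin/sh
# Demo: parallel workers appending complete blocks to one file under a lock.
demo_dir="$(mktemp -d)"
cd "$demo_dir" || exit 1

# Three tiny sample archives stand in for the real .gz files.
for i in 1 2 3; do
    printf 'alpha %s\nbeta %s\n' "$i" "$i" | gzip > "sample$i.gz"
done

# Hypothetical worker: collect the block privately, then append it atomically.
cat > worker.sh <<'EOF'
#!/bin/sh
block="$(mktemp)"
{
    echo "##### $(basename "$1")"
    zcat "$1" | grep -o "beta [0-9]*"
    echo "####"
} > "$block"
# Only one worker at a time holds the lock on Info.lock while appending.
flock Info.lock sh -c 'cat "$1" >> Info' sh "$block"
rm -f "$block"
EOF

# Run up to 4 workers at once; NUL-separated names survive odd file names.
find . -name "*.gz" -print0 | xargs -0 -P 4 -n 1 sh ./worker.sh
rm -f Info.lock
```

The order of blocks in Info depends on which worker finishes first, so this only suits cases where block order does not matter.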

Example:

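working_dir="$(pwd)"
temp_dir="$(mktemp -d)"
cd "$temp_dir" || exit 1
# NUL-separated names survive spaces; extract.sh is called by full path because we changed directory
find "$working_dir" -name "*.gz" -print0 | xargs -0 -P 32 -n 1 "$working_dir/extract.sh"
# Let find batch the names: with 80K files, "cat *.output" can hit the argument-length limit
find . -name "*.output" -exec cat {} + > "$working_dir/Info"
cd "$working_dir" || exit 1
rm -rf "$temp_dir"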

extract.sh

#!/bin/sh
# Runs in the temporary directory; $1 is the full path of one .gz file
filename=$(basename "$1")
output="$filename.output"
extracted="$filename.extracted"
zcat "$1" > "$extracted"

echo "##### $filename" > "$output"
# Series of greps to extract some useful information
grep -o -P "..." "$extracted" >> "$output"
grep -o -P "..." "$extracted" >> "$output"
rm "$extracted"
echo "####" >> "$output"
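One caveat with naming outputs by basename: two .gz files with the same name in different directories would write to the same output file. A possible workaround, sketched here with a made-up helper called flat_name, is to derive the name from the whole path by replacing every slash with an underscore. Collisions then become unlikely, though not impossible (a/b_c and a_b/c still map to the same name).

```shell
# Hypothetical helper: flatten a path into a single file name.
flat_name() {
    printf '%s' "$1" | tr '/' '_'
}

# Same basename, different directories, different flattened names.
first="$(flat_name /srv/a/data.gz)"
second="$(flat_name /srv/b/data.gz)"
echo "$first"    # _srv_a_data.gz
echo "$second"   # _srv_b_data.gz
```

In extract.sh this would take the place of the basename call, for example filename=$(flat_name "$1").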
