Current Process: I have a tar.gz file. (Actually, I have about 2000 of them,

Question

Editorial Team

Asked: June 5, 2026 (2026-06-05T21:54:48+00:00)

Current Process:

I have a tar.gz file. (Actually, I have about 2000 of them, but that’s another story).
I make a temporary directory, extract the tar.gz file, revealing 100,000 tiny files (around 600 bytes each).
For each file, I cat it into a processing program, pipe that loop into another analysis program, and save the result.

The temporary space on the machines I’m using can barely handle one of these processes at a time, never mind the 16 (hyperthreaded dual quad-core) that get sent to each machine by default.
I’m looking for a way to do this process without saving to disk. I believe the performance penalty for individually pulling files using tar -xf $file -O <targetname> would be prohibitive, but it might be what I’m stuck with.

Is there any way of doing this?

EDIT: Since two people have already made this mistake, I’m going to clarify:

Each file represents one point in time.
Each file is processed separately.
Once processed (in this case a variant on Fourier analysis), each gives one line of output.
This output can be combined to do things like autocorrelation across time.

EDIT2: Actual code:

for f in posns/*; do
    ~/data_analysis/intermediate_scattering_function < "$f"
done | ~/data_analysis/complex_autocorrelation.awk limit=1000 > inter_autocorr.txt

1 Answer

Editorial Team · Answer 1 · 2026-06-05T21:54:49+00:00

This sounds like a case where the right tool for the job is probably not a shell script. Python has a tarfile module that can operate in streaming mode, so you make a single pass through the large archive and process each file as it is read, without extracting anything to disk. Unlike the tar --to-stdout approach, which writes every member into one undifferentiated stream, it still lets you tell the individual files apart.
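
A minimal sketch of that idea, assuming each member is small enough to hold in memory (the question says around 600 bytes each) and that the processing program reads one file on stdin, as in the loop from EDIT2. The function names iter_members and process_archive are made up for illustration; only tarfile.open, extractfile, and subprocess.run come from the standard library.

```python
import subprocess
import sys
import tarfile


def iter_members(archive_path):
    """Yield (name, data) for each regular file, reading the archive once."""
    # "r|gz" opens the gzip-compressed archive as a forward-only stream,
    # so nothing is written to disk and the file is never rewound.
    with tarfile.open(archive_path, mode="r|gz") as archive:
        for member in archive:
            if not member.isfile():
                continue  # skip directories, links, and other non-file entries
            # In stream mode the data must be read before moving to the next member.
            data = archive.extractfile(member).read()
            yield member.name, data


def process_archive(archive_path, command, out):
    """Run `command` once per member, feeding the member on stdin.

    `command` is an argument list such as the path to the per-file
    analysis program; its stdout is appended to the binary stream `out`.
    """
    for _name, data in iter_members(archive_path):
        result = subprocess.run(
            command, input=data, stdout=subprocess.PIPE, check=True
        )
        out.write(result.stdout)
```

Calling process_archive(path, [program], sys.stdout.buffer) and piping the script's output into the autocorrelation step would stand in for the for loop over posns/*. One thing to check: members come out in the order they were stored in the archive, which may differ from the sorted order a shell glob produces, and that matters if the later analysis depends on time order.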
