I have some processing I want to do on thousands of files simultaneously. Grab

Question


Asked: May 26, 2026 (2026-05-26T13:12:42+00:00)


I have some processing I want to do on thousands of files simultaneously. Grab the first byte of all the files and do something, go to the next byte, etc. The files could be any size, so loading them all into memory could be prohibitive.

I’m concerned that, because of operating system limits on file descriptors, naively opening thousands of files and reading them could cause problems.

But cycling through and opening/closing files would be rather inefficient, I imagine.

Is there some efficient mechanism to handle what I’m trying to do?

NOTE: this function may be distributed to run on machines that I have no control over, so I can’t just go changing settings on the OS.


1 Answer

Editorial Team · Answer 1 · 2026-05-26T13:12:43+00:00

I want to do on thousands of files simultaneously. Grab the first byte of all the files and do something, go to the next byte, etc.

Are these files small enough that you could read them all into memory at once? If so, read the files one at a time, then process all of them a byte at a time.
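A minimal Python sketch of the in-memory approach, assuming the files really do fit in RAM together. The function name `bytes_by_position` is hypothetical, and using `None` to mark files that have run out of data is a choice made here for illustration, not something the question specifies.

```python
from itertools import zip_longest


def bytes_by_position(paths):
    """Yield one tuple per byte position, with one entry per file.

    Each entry is an int (0-255), or None when that file is shorter
    than the current position.
    """
    contents = []
    for path in paths:
        # Only one file is open at any moment, so descriptor limits
        # are not a concern here; memory is the limiting factor.
        with open(path, "rb") as f:
            contents.append(f.read())
    # Iterating a bytes object yields ints; zip_longest pads with None.
    yield from zip_longest(*contents)
```

For two files containing `ab` and `c`, this yields `(97, 99)` and then `(98, None)`.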

I’m concerned that, because of operating system limits on file descriptors, naively opening thousands of files and reading them could cause problems.

You might. The only way to find out is to try.

But cycling through and opening/closing files would be rather inefficient, I imagine.

Yes it would. But if you can’t read all the files into memory, and your operating system can’t open thousands of files at a time, then this is your last resort.
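If you do end up cycling through the files, one way to soften the cost is to read a block from each file per open rather than a single byte. This is a variation on the approach discussed here rather than part of it, and the sketch below is only an illustration: `iter_positions_chunked` and `chunk_size` are hypothetical names, and memory use is roughly `chunk_size` times the number of files.

```python
def iter_positions_chunked(paths, chunk_size=4096):
    """Yield one tuple per byte position while holding one file open at a time.

    Each file is reopened once per block of chunk_size bytes, not once
    per byte. Entries are ints, or None once a file is exhausted.
    """
    offset = 0
    while True:
        chunks = []
        for path in paths:
            with open(path, "rb") as f:
                f.seek(offset)  # seeking past the end is allowed; read() returns b""
                chunks.append(f.read(chunk_size))
        longest = max((len(c) for c in chunks), default=0)
        if longest == 0:
            break  # every file is exhausted
        for i in range(longest):
            yield tuple(c[i] if i < len(c) else None for c in chunks)
        offset += chunk_size
```

A larger `chunk_size` means fewer open/close cycles at the cost of more memory, so the value is something to tune rather than a recommendation.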

What you can do is find out how many files your system can have open simultaneously. Let’s just say for the sake of discussion that your system can open 100 files at a time, and you have 2,500 files to process.
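On Unix-like systems, Python can query the per-process descriptor limit without changing any OS settings. The sketch below is hedged: the `resource` module is not available on Windows, so the fallback value of 256 is an assumption, as are the helper names and the `reserve` margin left for stdin, stdout, and other descriptors the process already holds.

```python
def max_open_files(fallback=256):
    """Return the soft limit on open descriptors, or fallback if unknown."""
    try:
        import resource  # Unix only
    except ImportError:
        return fallback
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return fallback  # no fixed limit reported; stay conservative
    return soft


def usable_batch_size(reserve=16):
    """Leave some descriptors free for the rest of the process."""
    return max(1, max_open_files() - reserve)


def batches(paths, size):
    """Split the list of paths into consecutive groups of at most size items."""
    return [paths[i:i + size] for i in range(0, len(paths), size)]
```

With the numbers used in this answer, `batches(paths, 100)` on 2,500 paths gives 25 groups.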

Then your process would look something like this.

Open the first 100 files.
Write an output file that contains the first byte from the 100 files, then the second byte from the 100 files, and so on.
Handle any problems you might encounter if the 100 files are not of the same byte length.
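The steps above could be sketched roughly as follows in Python. The handling of unequal lengths is an assumption made for this example: shorter files are padded with a `pad` byte so every row in the output has exactly one byte per input file, and the true lengths are returned so padding can be told apart from data. The name `interleave_batch` is hypothetical.

```python
from contextlib import ExitStack


def interleave_batch(paths, out_path, pad=b"\x00"):
    """Write byte 0 of every file, then byte 1 of every file, and so on.

    Returns the true length of each input file.
    """
    lengths = [0] * len(paths)
    with ExitStack() as stack:
        # All files in the batch are open at once, so len(paths) + 1
        # must stay under the descriptor limit.
        files = [stack.enter_context(open(p, "rb")) for p in paths]
        out = stack.enter_context(open(out_path, "wb"))
        while True:
            row = bytearray()
            any_data = False
            for i, f in enumerate(files):
                b = f.read(1)  # file objects are buffered, so this is not a syscall per byte
                if b:
                    lengths[i] += 1
                    any_data = True
                    row += b
                else:
                    row += pad
            if not any_data:
                break  # every file in the batch is exhausted
            out.write(row)
    return lengths
```

For inputs `abc`, `de`, and `f`, the output file contains `adf`, then `be` plus one pad byte, then `c` plus two pad bytes.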

Now, after running this process through all your files, you’ll have 25 intermediate files.

Then your second process would look something like this.

Open the 25 intermediate files.
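Process the first 100 bytes from each file (the first byte of each of the 100 original files in that batch), then the next 100 bytes, and so on.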
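A hedged sketch of this second pass, assuming each intermediate file stores fixed-width rows with one byte per original file in its batch, and that shorter originals were padded with zero bytes. `iter_rows` and `batch_sizes` are hypothetical names; `batch_sizes[i]` is the number of original files behind intermediate file `i`.

```python
from contextlib import ExitStack


def iter_rows(intermediate_paths, batch_sizes):
    """Yield, per byte position, the bytes at that position across all original files."""
    with ExitStack() as stack:
        files = [stack.enter_context(open(p, "rb")) for p in intermediate_paths]
        while True:
            # One row from each intermediate file: n bytes for a batch of n files.
            rows = [f.read(n) for f, n in zip(files, batch_sizes)]
            if not any(rows):
                break  # all intermediate files are exhausted
            # An exhausted intermediate file contributes padding of the right width.
            yield b"".join(r if r else b"\x00" * n for r, n in zip(rows, batch_sizes))
```

Only the intermediate files are open here (25 in the running example), which is why the two-pass scheme stays under the descriptor limit.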

You would determine the actual numbers (simultaneous files open, number of intermediate files) through experimentation or research on your operating system.
