I need to write some code (in any language) to process 10,000 files that


Asked: May 31, 2026


I need to write some code (in any language) to process 10,000 files that reside on a local Linux filesystem. Each file is ~500KB in size, and consists of fixed-size records of 4KB each.

The processing time per record is negligible, and the records can be processed in any order, both within and across different files.

A naïve implementation would read the files one by one, in some arbitrary order. However, since my disks are very fast to read but slow to seek, this will almost certainly produce code that’s bound by disk seeks.

Is there any way to code the reading up so that it’s bound by disk throughput rather than seek time?

One line of inquiry is to try and get an approximate idea of where the files reside on disk, and use that to sequence the reads. However, I am not sure what API could be used to do that.

I am of course open to any other ideas.

The filesystem is ext4, but that’s negotiable.


1 Answer

Editorial Team · Answer 1 · 2026-05-31T19:22:45+00:00


Perhaps you could issue all the reads in quick succession with aio_read. That would queue them all at once, leaving the kernel's I/O scheduler free to complete them in an order that minimizes seeks.
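To make the idea concrete, here is a rough sketch in Python. Python's standard library has no binding for aio_read, so this swaps in a thread pool to keep many reads outstanding at once; it is an approximation of the same approach, not aio_read itself. The names `process_all`, `process_file`, `process_record` and `max_in_flight` are made up for illustration, and the value 64 is an untuned guess. Whether this actually reduces seeking depends on your disk and I/O scheduler, so measure before relying on it.

```python
from concurrent.futures import ThreadPoolExecutor

RECORD_SIZE = 4096  # fixed record size stated in the question


def process_file(path, process_record):
    """Read one file in a single call, then hand each record to the callback."""
    with open(path, "rb") as f:
        data = f.read()  # files are ~500KB, so one read per file is cheap
    count = 0
    # Slice the buffer into fixed-size records; a trailing partial record is skipped.
    for off in range(0, len(data) - RECORD_SIZE + 1, RECORD_SIZE):
        process_record(data[off:off + RECORD_SIZE])
        count += 1
    return count


def process_all(paths, process_record, max_in_flight=64):
    """Submit every file up front so many reads are pending at the same time.

    process_record (hypothetical callback) runs on worker threads, so it
    must be safe to call concurrently. Returns the total number of records.
    """
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        futures = [pool.submit(process_file, p, process_record) for p in paths]
        return sum(f.result() for f in futures)
```

Usage would be something like `process_all(list_of_paths, handle_record)`. Records are processed inside the workers rather than returned, so only about `max_in_flight` file buffers are held in memory at any moment. Since the question allows records to be handled in any order, nothing here tries to preserve file order.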
