I am new to Python. I have 2000 files each about 100 MB. I

Question

Asked: May 26, 2026 (2026-05-26T22:30:16+00:00)

I am new to Python. I have 2000 files, each about 100 MB. I have to read each of them and merge them into one big matrix (or table). Can I use parallel processing for this to save some time? If yes, how? I tried searching, and things seem very complicated. Currently, it takes about 8 hours to do this serially. We have a really big server with one terabyte of RAM and a few hundred processors. How can I make efficient use of this?

Thank you for your help.

1 Answer

Editorial Team · Answer 1 · 2026-05-26T22:30:16+00:00

You may be able to preprocess the files in separate processes using the subprocess module; however, if the final table is kept in memory by a single process, then that process will end up being your bottleneck.

There is another possible approach using shared memory with mmap objects. Each subprocess can be responsible for loading the files into a subsection of the mapped memory.
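
As a rough sketch of that idea (not a drop-in solution): the file format is not stated in the question, so this assumes every input can be treated as a block of raw bytes no larger than a fixed `slot_size`. The names `WORKER`, `merge_with_mmap`, `slot_size`, and `max_workers` are made up for illustration. Each child process is started with the subprocess module, maps the same backing file with mmap, and copies one input file into its own slot.

```python
import mmap  # used inside the worker source below
import subprocess
import sys

# Source code run by each child process (hypothetical argument layout):
# shared file path, input file path, byte offset of the slot, slot size.
WORKER = """
import mmap, sys
shared_path, input_path = sys.argv[1], sys.argv[2]
offset, slot_size = int(sys.argv[3]), int(sys.argv[4])
with open(input_path, "rb") as src:
    data = src.read(slot_size)          # real code would parse the file here
with open(shared_path, "r+b") as f:
    with mmap.mmap(f.fileno(), 0) as mm:
        mm[offset:offset + len(data)] = data   # write only into this slot
"""


def merge_with_mmap(input_paths, shared_path, slot_size, max_workers=4):
    """Fill shared_path so that slot i holds the bytes of input_paths[i]."""
    # Pre-size the backing file so every worker can map the whole region.
    with open(shared_path, "wb") as f:
        f.truncate(slot_size * len(input_paths))

    pending = list(enumerate(input_paths))
    running = []
    while pending or running:
        # Start workers until the concurrency limit is reached.
        while pending and len(running) < max_workers:
            index, path = pending.pop(0)
            running.append(subprocess.Popen([
                sys.executable, "-c", WORKER,
                shared_path, path, str(index * slot_size), str(slot_size),
            ]))
        # Wait for the oldest worker before starting more.
        proc = running.pop(0)
        if proc.wait() != 0:
            raise RuntimeError("a worker process failed")
```

Because the slots do not overlap, the workers never write to the same bytes and no locking is needed. With numeric data, each worker would parse its file and write the values in a fixed binary layout instead of copying raw bytes, and `max_workers` would be tuned to what the disks can actually feed.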
