I’m trying to read a file in Python (scan its lines and look for terms) and write the results, say, counters for each term. I need to do that for a large number of files (more than 3000). Is it possible to do that multithreaded? If yes, how?
So, the scenario is like this:
- Read each file and scan its lines
- Write counters to same output file for all the files I’ve read.
My second question is: does it improve the speed of reading and writing?
Hope it is clear enough. Thanks,
Ron.
I agree with @aix,
multiprocessing is definitely the way to go. Regardless, you will be I/O bound: you can only read so fast, no matter how many parallel processes you have running. But there can easily be some speedup. Consider the following (input/ is a directory that contains several .txt files from Project Gutenberg).
When I run this on my dual core machine there is a noticeable (but not 2x) speedup:
If the files are small enough to fit in memory, and you have lots of processing to be done that isn’t i/o bound, then you should see even better improvement.
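For reference, here is a minimal sketch of how the approach described above could look with `multiprocessing.Pool`. It is an illustration under assumptions, not a tuned implementation: the `TERMS` tuple, the `counts.csv` output name, and the helper names `count_terms` and `write_counts` are all hypothetical, and `input/*.txt` simply follows the directory mentioned earlier. Each worker process only reads and counts; the parent process does all the writing, so the shared output file never needs a lock.

```python
import csv
import glob
import re
from collections import Counter
from multiprocessing import Pool

# Hypothetical search terms; replace with the terms you actually need.
TERMS = ("the", "and", "whale")
WORD_RE = re.compile(r"[a-z']+")


def count_terms(path):
    """Scan one file line by line and count occurrences of each term."""
    counts = Counter({term: 0 for term in TERMS})
    with open(path, encoding="utf-8", errors="replace") as handle:
        for line in handle:
            for word in WORD_RE.findall(line.lower()):
                if word in counts:
                    counts[word] += 1
    return path, counts


def write_counts(results, out_path):
    """Write one row per input file; only the parent process calls this."""
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["file", *TERMS])
        for path, counts in results:
            writer.writerow([path, *(counts[term] for term in TERMS)])


def main(pattern="input/*.txt", out_path="counts.csv"):
    paths = sorted(glob.glob(pattern))
    if not paths:
        return  # nothing to do
    # Workers only read and count; results come back to the parent,
    # so the output file has a single writer and needs no locking.
    with Pool() as pool:
        results = pool.imap_unordered(count_terms, paths, chunksize=16)
        write_counts(results, out_path)


if __name__ == "__main__":
    main()
```

`Pool()` defaults to one worker per CPU core. With `imap_unordered`, rows are written in whatever order the files finish, which is fine if the output is keyed by file name. If you need the input order preserved, `pool.imap` should work the same way at a small cost in throughput.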