I have a huge file and need to read it and process.
with open(source_filename) as source, open(target_filename) as target:
for line in source:
target.write(do_something(line))
do_something_else()
Can this be accelerated with threads? If I spawn a thread per line, will this have a huge overhead cost?
edit: To make this question not a discussion, How should the code look like?
with open(source_filename) as source, open(target_filename) as target:
?
@Nicoretti: In an iteration I need to read a line of several KB of data.
update 2: the file may be a bz2, so Python may have to wait for unpacking:
$ bzip2 -d country.osm.bz2 | ./my_script.py
You could use three threads: for reading, processing and writing. The possible advantage is that the processing can take place while waiting for I/O, but you need to take some timings yourself to see if there is an actual benefit in your situation.
It is a good idea to set a
maxsizefor the queues to prevent ever-increasing memory consumption. The value of 1000 is an arbitrary choice on my part.