I would like to read a CSV file from standard input and process each row as it arrives. My CSV-writing code writes rows one by one, but my reader waits for the stream to be terminated before iterating over the rows. Is this a limitation of the csv module? Am I doing something wrong?
My reader code:
import csv
import sys
import time
reader = csv.reader(sys.stdin)
for row in reader:
    print "Read: (%s) %r" % (time.time(), row)
My writer code:
import csv
import sys
import time
writer = csv.writer(sys.stdout)
for i in range(8):
    writer.writerow(["R%d" % i, "$" * (i+1)])
    sys.stdout.flush()
    time.sleep(0.5)
Output of python test_writer.py | python test_reader.py:
Read: (1309597426.3) ['R0', '$']
Read: (1309597426.3) ['R1', '$$']
Read: (1309597426.3) ['R2', '$$$']
Read: (1309597426.3) ['R3', '$$$$']
Read: (1309597426.3) ['R4', '$$$$$']
Read: (1309597426.3) ['R5', '$$$$$$']
Read: (1309597426.3) ['R6', '$$$$$$$']
Read: (1309597426.3) ['R7', '$$$$$$$$']
As you can see, all the print statements are executed at the same time, but I expected a 500 ms gap between them.
As it says in the documentation,
And you can see by looking at the implementation of the csv module (line 784) that csv.reader calls the next() method of the underlying iterator (via PyIter_Next).
So if you really want unbuffered reading of CSV files, you need to convert the file object (here sys.stdin) into an iterator whose next() method actually calls readline() instead. This can easily be done using the two-argument form of the iter function: in test_reader.py, pass iter(sys.stdin.readline, '') to csv.reader instead of sys.stdin.
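A minimal sketch of the readline-based reader, written so that it runs on both Python 2 and Python 3. The helper names iter_rows and main are hypothetical, chosen only for illustration; they are not part of the csv module.

```python
import csv
import sys
import time


def iter_rows(stream):
    """Return a csv.reader that pulls one line per readline() call."""
    # iter(callable, sentinel) keeps calling stream.readline() until it
    # returns '', which is what readline() returns at end of file.
    return csv.reader(iter(stream.readline, ''))


def main(stream=sys.stdin):
    # Hypothetical entry point: print each row with the time it was read.
    for row in iter_rows(stream):
        print("Read: (%s) %r" % (time.time(), row))
```

To use it as test_reader.py, call main() at the bottom of the file. Because each row is requested through readline(), the reader should no longer wait on the file iterator's read-ahead before handing a line to csv.reader.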
Can you explain why you need unbuffered reading of CSV files? There might be a better solution to whatever it is you are trying to do.