We know that the following code loads the data line by line rather than loading it all into memory, i.e. a line that has already been read is somehow marked ‘deletable’ for the OS:
    def fileGen( file ):
        for line in file:
            yield line

    with open("somefile") as file:
        for line in fileGen( file ):
            print line
But is there any way to verify that this is still true if we change the definition of fileGen to the following?
    import csv

    def fileGen( file ):
        for line in csv.reader( file ):
            yield line
How can we tell whether csv.reader caches the data it has loaded? Thanks.
regards,
John
The most reliable way to find out what csv.reader is doing is to read the source. See _csv.c, lines 773 onwards. You’ll see that the reader object has a pointer to the underlying iterator (typically a file iterator), and it calls PyIter_Next each time it needs another line. So it does not read ahead or otherwise cache the data it loads.

Another way to find out what csv.reader is doing is to make a mock file object that can report when it is being queried. Such an experiment confirms what we learned from reading the csv source code: it only requests the next line from the underlying iterator when its own next method is called.

John made it clear (see comments) that his concern is whether csv.reader keeps the lines alive, preventing them from being collected by Python’s memory manager.

Again, you can either read the code (most reliable) or try an experiment. If you look at the implementation of Reader_iternext in _csv.c, you’ll see that lineobj is the name given to the object returned by the underlying iterator, and there’s a call to Py_DECREF(lineobj) on every path through the code. So csv.reader does not keep lineobj alive. An experiment can confirm that.
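A minimal sketch of one such experiment follows. It is not the experiment originally posted with this answer; the names TrackedLine, TrackedFile and count_live_lines are made up for illustration. It assumes CPython's reference counting, and it relies on the fact that a subclass of str, unlike plain str, can be weakly referenced. After each row is read, it counts how many of the lines handed to csv.reader are still alive.

```python
import csv
import weakref


class TrackedLine(str):
    """A str subclass: unlike plain str, instances support weak references."""


class TrackedFile(object):
    """Hypothetical stand-in for a file: yields CSV lines and remembers a
    weak reference to each, so we can check later whether it is still alive."""

    def __init__(self, count):
        self.count = count
        self.served = 0
        self.refs = []

    def __iter__(self):
        return self

    def __next__(self):
        if self.served >= self.count:
            raise StopIteration
        line = TrackedLine("%d,%d\n" % (self.served, self.served * 2))
        self.served += 1
        self.refs.append(weakref.ref(line))  # weak: does not keep line alive
        return line

    next = __next__  # Python 2 name for the same method


def count_live_lines(count=5):
    """Return, for each row read, how many source lines are still alive."""
    source = TrackedFile(count)
    live_counts = []
    for row in csv.reader(source):
        live_counts.append(sum(1 for ref in source.refs if ref() is not None))
    return live_counts


print(count_live_lines())
```

If csv.reader were holding on to the lines, the counts would grow with each row. On CPython they should instead stay at zero or close to it, since the only strong reference to each line is the one csv.reader releases once it has parsed the line.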
So you can see that csv.reader does not hang on to the objects it gets from its iterator, and if nothing else is keeping them alive, then they get garbage-collected in a timely fashion.

I have a feeling that there’s something more to this question that you’re not telling us. Can you explain why you are worried about this?
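For reference, the mock-file idea mentioned earlier in this answer can be sketched roughly as follows. This is an illustration under stated assumptions, not the code originally posted: LoggingLines and run_experiment are hypothetical names, and the "file" is just an iterator over a list of strings that records every request made to it.

```python
import csv


class LoggingLines(object):
    """Hypothetical mock file: an iterator that logs each line it hands out."""

    def __init__(self, lines, log):
        self._lines = iter(lines)
        self._log = log

    def __iter__(self):
        return self

    def __next__(self):
        line = next(self._lines)  # StopIteration propagates at the end
        self._log.append("file yielded %r" % line)
        return line

    next = __next__  # Python 2 name for the same method


def run_experiment():
    """Read three CSV lines and return the order in which things happened."""
    log = []
    reader = csv.reader(LoggingLines(["a,b\n", "c,d\n", "e,f\n"], log))
    log.append("reader created")
    for row in reader:
        log.append("reader returned %r" % (row,))
    return log


for entry in run_experiment():
    print(entry)
```

If the reader read ahead, several "file yielded" entries would appear before the first "reader returned" entry. Instead the two kinds of entries should alternate, one line requested per row returned, and nothing is requested when the reader is created.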