I’m writing a Python generator that behaves like “cat”. My specific use case is a “grep-like” operation. I want to be able to break out of the generator when a condition is met:
summary = {}
for fn in cat("filelist.dat"):
    for line in cat(fn):
        if line.startswith("FOO"):
            summary[fn] = line
            break
So when break happens, I need the cat() generator to finish and close the file handle to fn.
I have to read 100k files with 30 GB of total data, and the FOO keyword appears in the header region, so it is important in this case that the cat() function stops reading the file ASAP.
There are other ways I can solve this problem, but I’m still interested to know how to get an early exit from a generator which has open file handles. Perhaps Python cleans them up right away and closes them when the generator is garbage collected?
Thanks,
Ian
By implementing the context protocol and the iterator protocol in the same object, you can write pretty sweet code like this:
This is a sample implementation, tested with Python 2.5 on a Linux box. It reads the lines of /etc/passwd until it finds the one for user audio, and then stops:
Or even simpler:
File objects implement the iterator protocol (see http://docs.python.org/library/stdtypes.html#file-objects)
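Applied to the question, a minimal sketch of both ideas might look like the following. It is written for Python 3 rather than 2.5, and the names Cat, cat, and summarize are hypothetical, chosen only to mirror the loop in the question. The class combines the context and iterator protocols; the generator version relies on contextlib.closing to close the generator, and with it the file, as soon as the loop is left.

```python
from contextlib import closing


class Cat:
    """Hypothetical helper: a context manager that is also a line iterator."""

    def __init__(self, path):
        self.path = path
        self.handle = None

    def __enter__(self):
        self.handle = open(self.path)
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Runs on normal exit, on break/return, and on exceptions.
        self.handle.close()
        return False  # do not swallow exceptions

    def __iter__(self):
        return self

    def __next__(self):
        line = self.handle.readline()
        if not line:
            raise StopIteration
        return line


def cat(path):
    """Generator version: the with block closes the file when the generator ends."""
    with open(path) as handle:
        for line in handle:
            yield line


def summarize(paths, prefix="FOO"):
    """Record the first line starting with prefix for each file, then stop reading it."""
    summary = {}
    for fn in paths:
        # closing() calls lines.close() on exit, even after an early break.
        with closing(cat(fn)) as lines:
            for line in lines:
                if line.startswith(prefix):
                    summary[fn] = line
                    break
    return summary
```

Calling close() on a generator raises GeneratorExit at the paused yield, so the with block inside cat() exits and the file is closed right away instead of whenever the generator happens to be garbage collected.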