I run across this problem frequently. Suppose I have a text file that I have read in as a list using file.readlines().
Suppose the file looks something like this:
stuff stuff stuff stuff stuff
stuff stuff stuff stuff stuff
stuff stuff stuff stuff stuff
stuff stuff stuff stuff stuff # indeterminate number of lines
The text I want is set off by something distinctive
I want this
I want this
I want this
I want this # indeterminate number of lines
The end is also identifiable by something distinctive
stuff stuff stuff stuff stuff
stuff stuff stuff stuff stuff
stuff stuff stuff stuff stuff
The way I have been handling this is to do something like this:
themasterlist = []
for file in filelist:
    count = 0  # 1 while inside the wanted block, 0 otherwise
    templist = []
    for line in file:
        # readlines() keeps the trailing newline, so strip it before comparing
        text = line.rstrip('\n')
        if text == 'The text I want is set off by something distinctive':
            count = 1
        if text == 'The end is also identifiable by something distinctive':
            count = 0
        if count == 1:
            templist.append(line)
    themasterlist.append(templist)
I have thought about reading the file as a single string (file.read()), splitting it on the end points, and then converting the result to a list, but I actually want to use this construction for a number of other types. For example, suppose I am iterating through the elements of an lxml.fromstring(somefile) and I want to process a subset of the elements based on whether or not the element.text contains some phrase, etc.
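For reference, a minimal sketch of the string-splitting alternative mentioned above, assuming each marker appears at most once per file. The names START, END, and extract_block are hypothetical, and unlike the loop above this version leaves out the start marker line itself.

```python
START = 'The text I want is set off by something distinctive'
END = 'The end is also identifiable by something distinctive'

def extract_block(text, start=START, end=END):
    """Return the lines between the start and end markers (markers excluded)."""
    # partition() splits on the first occurrence; `found` is '' if the marker is absent
    _, found, rest = text.partition(start)
    if not found:
        return []
    # If the end marker is missing, everything after the start marker is kept
    block, _, _ = rest.partition(end)
    return block.strip('\n').splitlines()
```

This only works on strings, which is why it does not carry over to other iterables such as parsed elements.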
Note, I could be running through 200K to 300K files at a time.
My solution works, but it feels clunky, and like I am missing something important about Python.
There are three really good answers, and I learned something useful from each. I need to select one as the answer, but I do appreciate the response of each poster; it was very helpful.
I like stuff like this:
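One possible shape for this, offered as a sketch and not necessarily the code originally posted: itertools.dropwhile skips ahead to the start marker and itertools.takewhile stops at the end marker. The names START, END, and wanted_lines are hypothetical, and the sketch assumes a single block per file.

```python
from itertools import dropwhile, takewhile

START = 'The text I want is set off by something distinctive'
END = 'The end is also identifiable by something distinctive'

def wanted_lines(lines, start=START, end=END):
    """Return a lazy iterator over the lines strictly between the two markers."""
    # Strip trailing newlines so lines from readlines() compare equal to the markers
    stripped = (line.rstrip('\n') for line in lines)
    # Skip everything before the start marker
    after_start = dropwhile(lambda line: line != start, stripped)
    next(after_start, None)  # discard the start marker itself (no-op if absent)
    # Stop as soon as the end marker is reached
    return takewhile(lambda line: line != end, after_start)
```

Because nothing past the end marker is read, this can save some work when the wanted block sits near the top of a large file.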
What you were missing is yield and list comprehensions. Here is your code revised:
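A minimal sketch of what such a revision could look like, not necessarily the exact code from the answer. The generator between and the helper extract are hypothetical names, and the in-memory filelist stands in for real file objects. The start and end tests are passed in as functions, so the same generator could be pointed at other iterables (for example parsed elements, testing element.text). Unlike the original loop, the start marker itself is not kept.

```python
START = 'The text I want is set off by something distinctive'
END = 'The end is also identifiable by something distinctive'

def between(items, is_start, is_end):
    """Yield items that follow one matching is_start, up to one matching is_end."""
    inside = False
    for item in items:
        if not inside:
            inside = is_start(item)  # the start marker itself is not yielded
        elif is_end(item):
            inside = False           # a later start marker can reopen the block
        else:
            yield item

def extract(lines):
    """Collect the wanted lines of one file into a list."""
    stripped = (line.rstrip('\n') for line in lines)
    return list(between(stripped, lambda s: s == START, lambda s: s == END))

# Stand-in data; in practice each entry would be an open file or readlines() result
filelist = [
    ['stuff\n', START + '\n', 'I want this\n', END + '\n', 'stuff\n'],
    ['stuff\n', 'stuff\n'],
]
themasterlist = [extract(f) for f in filelist]
```

The list comprehension replaces the outer loop and the two append calls, and the generator keeps the state flag in one place.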