I apologize if this question has been answered already, but I cannot seem to find a page that describes this process. I am trying to take a large file (the New York Times corpus), turn it into a list of words using the split function, and then search that list for certain words. I have been able to get Python to print the file with this code:
words=open('nyt.txt')
for line in words:
    print(line)
but I would like to be able to call split() on the file's contents afterward.
So far, I have been developing the program using a small corpus that I just type in like this:
words= ('A B. C D E F G A. B C D E F G A B C D E F G A B C D E F G')
Rather than copying and pasting the NYT corpus into the parentheses (which doesn’t work because the file is too large), I would like to load the file's contents into the variable.
Once again, I am sorry if this has been asked and answered before, as is likely.
What you probably want is called a generator. In your case, it could look like this:
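The snippet that belongs here did not survive, so the following is only a sketch of the idea and not necessarily the original code. The function names words_in_file and count_matches are hypothetical, and the UTF-8 encoding is an assumption about the corpus file.

```python
def words_in_file(path, encoding="utf-8"):
    """Yield the words of a text file one at a time (hypothetical helper).

    The encoding default is an assumption; change it if nyt.txt differs.
    """
    with open(path, encoding=encoding) as handle:
        for line in handle:            # the file object hands back one line at a time
            for word in line.split():  # split() with no argument splits on any whitespace
                yield word


def count_matches(path, targets):
    """Count how many words in the file appear in the set `targets`."""
    return sum(1 for word in words_in_file(path) if word in targets)


# Usage, assuming nyt.txt is in the current directory as in the question:
#     for word in words_in_file('nyt.txt'):
#         print(word)
```

Note that split() leaves punctuation attached, so 'B.' and 'B' count as different words unless you strip it yourself.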
This processes the file line by line, so it doesn’t have to read the entire file into memory at once. The yield keyword turns the function into a generator: calling it returns an iterator that produces one value at a time, so you use it by looping over it with for.
Edit: If I understand you correctly this time round, you just want to read all words into a list? Easy enough:
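The code for this step is also missing, so here is a minimal sketch under the same assumptions as the question (a plain text file named nyt.txt). The helper name read_words is hypothetical, and the UTF-8 default is a guess about the file's encoding.

```python
def read_words(path, encoding="utf-8"):
    """Return every whitespace-separated word in the file as one list.

    This reads the whole file into memory at once, which is fine as long
    as the file fits in RAM.
    """
    with open(path, encoding=encoding) as handle:
        return handle.read().split()


# Usage, assuming nyt.txt is in the current directory:
#     words = read_words('nyt.txt')
#     print('A' in words)      # True if the exact token 'A' occurs
#     print(words.count('A'))  # number of times it occurs
```

The resulting list behaves just like the small hand-typed corpus after calling split() on it, so the rest of the program should not need to change.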