I am trying to process various texts using regular expressions and Python's NLTK (the book is at http://www.nltk.org/book). I am trying to create a random text generator and I am stuck on one problem. First, here is my algorithm:
1. Enter a sentence as input (this is called the trigger string).
2. Get the longest word in the trigger string.
3. Search the entire Project Gutenberg database for sentences that contain this word, regardless of uppercase or lowercase.
4. Return the longest sentence that contains the word searched for in step 3.
5. Append the sentences from step 1 and step 4 together.
6. Repeat the process. Note that I then have to get the longest word in the second sentence and continue like that, and so on.
So far I have been able to do this for the first two sentences, but I cannot perform a case-insensitive search. The entire sentence database of Project Gutenberg is available via the gutenberg.sents() function, but a case-insensitive regex search is practically impossible, since gutenberg.sents() outputs the sentences of a book in a list-of-lists format, as follows:
EXAMPLE: all the sentences of Shakespeare’s Macbeth are returned by typing
import nltk
from nltk.corpus import gutenberg
gutenberg.sents('shakespeare-macbeth.txt')
into the Python shell, and the output is:
[['[', 'The', 'Tragedie', 'of', 'Macbeth', 'by', 'William', 'Shakespeare', '1603', ']'],
['Actus', 'Primus', '.'], .......]
with [The Tragedie of Macbeth by William Shakespeare 1603] and Actus Primus. being the first two sentences.
How can I find the word I’m looking for regardless of whether it is uppercase or lowercase? I’m desperately in need of help, since I have been tinkering with this for the past two days and it’s starting to wear on my nerves. Thanks a lot.
Given a list L of words and a target word t, comparing t.lower() against w.lower() for each word w in L (for instance inside any()) tells you whether L has word t in a case-insensitive way. It’s faster, of course, to lowercase t once before the loop, since Python does not “hoist” the constant computation out of the loop and, unless you hoist it yourself, it will be performed repeatedly.
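The check described above might be sketched as follows. The helper name contains_word and the sample sentence are made up for illustration and are not from the original answer; the point is only that the target is lowercased once, outside the loop.

```python
def contains_word(words, target):
    """Return True if target occurs in words, ignoring case."""
    # Hoisted by hand: lowercase the target once, not once per word.
    target = target.lower()
    return any(target == w.lower() for w in words)


# A tokenized sentence in the same shape gutenberg.sents() produces.
sentence = ['The', 'Tragedie', 'of', 'Macbeth']
print(contains_word(sentence, 'macbeth'))  # True
print(contains_word(sentence, 'Hamlet'))   # False
```

Because any() stops at the first match, the scan ends as soon as the word is found.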
Given a list of lists lol, the longest sub-list including t can be found by taking max() with key=len over the sub-lists that pass the case-insensitive check. If multiple sub-lists include t and are of the same maximal length, this will give you the first one, as it happens.
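A minimal sketch of that search, assuming Python 3.4 or later for the default argument of max(). The function name longest_sentence_with and the sample data are hypothetical; in practice the list of lists would presumably come from something like gutenberg.sents('shakespeare-macbeth.txt').

```python
def longest_sentence_with(sentences, target):
    """Return the longest sub-list containing target (case-insensitive).

    Returns None when no sub-list contains the word.
    """
    target = target.lower()  # lowercase once, outside both loops
    matches = (s for s in sentences
               if any(target == w.lower() for w in s))
    # max() keeps the first item it sees among equally long ones.
    return max(matches, key=len, default=None)


# Stand-in for a tokenized corpus (list of lists of words).
lol = [
    ['Actus', 'Primus', '.'],
    ['The', 'Tragedie', 'of', 'Macbeth'],
    ['MACBETH', 'enters', '.'],
    ['Macbeth', 'does', 'murther', 'Sleepe'],
]
print(longest_sentence_with(lol, 'macbeth'))
# ['The', 'Tragedie', 'of', 'Macbeth']
```

The generator expression avoids building an intermediate list, which matters when the input is an entire corpus rather than a handful of sentences.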