I’ve just started working with NLTK and am having trouble getting the concordance function to work with a condition. I would like to return a concordance for any given word in a Latin text. Since the language is inflected, I want to be able to specify the stem, find every word in the corpus that starts with that stem, and return a concordance for each of them. The code I’m using is:
book1 = open('Book1.txt', 'rU').read()
token1 = nltk.word_tokenize(book1)
text1 = nltk.Text(token1)
word = raw_input("What stem do you want to search?\n > ")
text1.concordance([w for w in text1 if w.startswith(word)])
This returns the following error:
Traceback (most recent call last):
  File "C:\Users\admin\Desktop\start_nltk_horace.py", line 68, in <module>
    concordance()
  File "C:\Users\admin\Desktop\start_nltk_horace.py", line 49, in concordance
    text1.concordance([w for w in text1 if w.startswith(word)])
  File "C:\Python27\lib\site-packages\nltk\text.py", line 314, in concordance
    self._concordance_index.print_concordance(word, width, lines)
  File "C:\Python27\lib\site-packages\nltk\text.py", line 177, in print_concordance
    offsets = self.offsets(word)
  File "C:\Python27\lib\site-packages\nltk\text.py", line 156, in offsets
    word = self._key(word)
  File "C:\Python27\lib\site-packages\nltk\text.py", line 312, in <lambda>
    key=lambda s:s.lower())
AttributeError: 'list' object has no attribute 'lower'
Just calling text1.concordance(word) returns what I’m looking for without any issues (provided I input the fully declined word), but I would have to repeat the call six or so times to get a concordance for all of the different inflected forms of a word.
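The last frames of the traceback show exactly where the list breaks: concordance() passes its query through the index's key function, key=lambda s: s.lower(), which works on a single string but not on a list. The snippet below reproduces only that normalisation step in plain Python, so it runs without NLTK; the normalise helper is hypothetical and exists just for illustration.

```python
# nltk/text.py builds the concordance index with key=lambda s: s.lower()
# (see the traceback, line 312); this reproduces only that normalisation step.
def key(s):
    return s.lower()


def normalise(query):
    """Apply the concordance key to `query`, returning any failure as text."""
    try:
        return key(query)
    except AttributeError as err:
        return "AttributeError: %s" % err


print(normalise("Arma"))             # arma
print(normalise(["arma", "armis"]))  # AttributeError: 'list' object has no attribute 'lower'
```

A string is lowercased and looked up normally, while a list has no lower() method, which is the same AttributeError the question hits.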
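I think the problem is that you’re trying to supply NLTK’s concordance() function with a list of words, when it only accepts a single string: as the last lines of your traceback show, it calls .lower() on whatever you pass in. Instead, first build the list of matching words (my_inputs = [w for w in text1 if w.startswith(word)]), then loop over it, calling text1.concordance(w) once for each word and appending the result to my_concordances.
my_concordances should then hold one entry per word that starts with the raw input string. Be aware, though, that concordance() prints its results and returns None, so the list will only contain None unless you capture what is printed; recent NLTK 3 releases also provide concordance_list(), which returns the matching lines instead of printing them. Because my_inputs contains a word once for every time it occurs in the text, it is also worth removing duplicates first (for example with set()), or you will get the same concordance repeatedly. You can also consider pre-allocating the space for my_concordances, since the length of my_inputs tells you how many entries you need, although appending to a Python list is already cheap, so the speed gain is likely to be small.
Note that this question might be of interest to you too. It goes into more detail on concordance().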
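A minimal sketch of the loop over my_inputs, written for Python 3 (the question uses Python 2.7, which lacks contextlib.redirect_stdout). The helper names matching_words and concordances_by_stem are hypothetical, not part of NLTK. It assumes text is an nltk.Text, whose concordance(word, width, lines) prints its output rather than returning it, so the printed text is captured per word form. Forms are lowercased and deduplicated because the concordance index lowercases every query anyway.

```python
import io
from contextlib import redirect_stdout


def matching_words(tokens, stem):
    """Return the distinct word forms in `tokens` that start with `stem`.

    Matching ignores case and the forms come back lowercased, because the
    concordance index lowercases every query ("Arma" and "arma" give the
    same concordance).
    """
    stem = stem.lower()
    return sorted({t.lower() for t in tokens if t.lower().startswith(stem)})


def concordances_by_stem(text, stem, width=79, lines=25):
    """Map each word form beginning with `stem` to its concordance text.

    `text` is assumed to be an nltk.Text. Its concordance() method prints
    its result and returns None, so stdout is captured once per form.
    """
    my_inputs = matching_words(text.tokens, stem)
    my_concordances = {}
    for w in my_inputs:
        buf = io.StringIO()
        with redirect_stdout(buf):
            text.concordance(w, width=width, lines=lines)  # one string per call
        my_concordances[w] = buf.getvalue()
    return my_concordances


# Usage (assumes NLTK is installed and Book1.txt is the file from the question):
#   import nltk
#   text1 = nltk.Text(nltk.word_tokenize(open("Book1.txt").read()))
#   for form, kwic in concordances_by_stem(text1, "arm").items():
#       print(kwic)
```

Keying the results by word form, rather than keeping a bare list, keeps each concordance paired with the form that produced it; on NLTK versions that have concordance_list(), that call could replace the stdout capture.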