Just a quick beginners question here with NLTK.
I am trying to tokenize and generate bigrams, trigrams and quadgrams from a corpus.
I need to add <s> to the beginning of my lists and </s> to the end in place of a period if there is one.
The list is taken from the brown corpus in nltk. (and a specific section at that)
so.. I have
from nltk.corpus import brown
news = brown.sents(categories = 'editorial')
Am I making this too difficult?
Thanks a lot.
yields