I understand that NLTK can split text into sentences and print them out using the following code, but how do I put the sentences into a list instead of outputting them to the screen?

    import nltk.data
    from nltk.tokenize import sent_tokenize
    import os, sys, re, glob

    cwd = './extract_en'  # os.getcwd()
    for infile in glob.glob(os.path.join(cwd, 'fileX.txt')):
        (PATH, FILENAME) = os.path.split(infile)
        read = open(infile)
        for line in read:
            sent_tokenize(line)

The sent_tokenize(line) call prints the sentences out. How do I put them into a list?
Here’s a simplified version that I used to test the code:
When called like so, it prints the following:
When doing something like this, a list comprehension is more concise and IMO more pleasant to read:
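The original snippet is not shown here, so the following is only a minimal sketch of the list-comprehension form. The names `lines`, `sentences_per_line`, and `naive_sent_tokenize` are hypothetical; the last one is a crude regex stand-in so the example runs without NLTK's tokenizer data, and in real code you would call `sent_tokenize(line)` in its place.

```python
import re

def naive_sent_tokenize(text):
    # Assumption for this sketch: a crude stand-in for
    # nltk.tokenize.sent_tokenize that splits on whitespace
    # following '.', '!' or '?'.
    return [s for s in re.split(r'(?<=[.!?])\s+', text.strip()) if s]

# Stand-in for the lines you would get by iterating over an open file.
lines = ["Hello there. How are you?\n", "Fine, thanks.\n"]

# One list of sentences per input line, collected into an outer list.
sentences_per_line = [naive_sent_tokenize(line) for line in lines]
print(sentences_per_line)
```

The key point is that the tokenizer returns a list, so the comprehension collects those return values rather than discarding them as the bare call in the loop does.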
To clarify, the above returns a list of lists of sentences, one list of sentences for each line. If you want a flat list of sentences, do this instead, as eyquem suggests:
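Since the snippet itself is missing here, this is one possible sketch of the flat version, using a nested list comprehension. As before, `lines`, `flat_sentences`, and `naive_sent_tokenize` are hypothetical names, and the regex helper is only a stand-in for `sent_tokenize` so that the example is runnable on its own.

```python
import re

def naive_sent_tokenize(text):
    # Assumption for this sketch: a crude stand-in for
    # nltk.tokenize.sent_tokenize that splits on whitespace
    # following '.', '!' or '?'.
    return [s for s in re.split(r'(?<=[.!?])\s+', text.strip()) if s]

# Stand-in for the lines you would get by iterating over an open file.
lines = ["Hello there. How are you?\n", "Fine, thanks.\n"]

# The outer loop walks the lines, the inner loop walks each line's
# sentences, so every sentence lands directly in one flat list.
flat_sentences = [
    sentence
    for line in lines
    for sentence in naive_sent_tokenize(line)
]
print(flat_sentences)
```

An equivalent approach would be to start with an empty list and call `extend` on it with each line's sentences inside an ordinary for loop.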