I’m using Python to split the sentences in a text file into individual word tokens, collected in a single list, so that I can count word frequencies. I’m having trouble getting the tokens from the different sentences into one flat list. Here’s what I do:
f = open('music.txt', 'r')
sent = [word.lower().split() for word in f]
That gives me the following list:
[['party', 'rock', 'is', 'in', 'the', 'house', 'tonight'],
['everybody', 'just', 'have', 'a', 'good', 'time'],...]
Since the sentences in the file are on separate lines, I get this list of lists, and defaultdict can’t identify the individual tokens to count up.
I tried the following list comprehension to pull the tokens out of the different lists and put them into a single list, but it returns an empty list instead:
sent2 = [[w for w in word] for word in sent]
Is there a way to do this using list comprehensions? Or perhaps another easier way?
Just use a nested loop inside the list comprehension:
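A minimal sketch of such a nested comprehension, assuming whitespace-separated words as in the question. The io.StringIO object is only a stand-in for open('music.txt', 'r') so the snippet runs on its own, and the names line and tokens are illustrative:

```python
import io

# Stand-in for open('music.txt', 'r'); the two lines mirror the
# sample output shown in the question.
f = io.StringIO(
    "Party rock is in the house tonight\n"
    "Everybody just have a good time\n"
)

# The outer loop walks over the lines of the file, the inner loop over
# the words of each line, so every word lands in one flat list.
tokens = [word.lower() for line in f for word in line.split()]

print(tokens)
```

The for clauses read left to right in the same order as the equivalent nested for statements would be written.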
There are some alternatives to this approach, for example using
itertools.chain.from_iterable(), but I think the nested loop is much easier in this case.
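For comparison, here is a rough sketch of the itertools.chain.from_iterable() route. It assumes sent is the list of lists from the question (only the two sublists shown there are used), and the counting loop with defaultdict(int) is just one plausible way to get the frequencies:

```python
from collections import defaultdict
from itertools import chain

# The list of lists produced by the code in the question
# (only the two sublists shown there).
sent = [
    ['party', 'rock', 'is', 'in', 'the', 'house', 'tonight'],
    ['everybody', 'just', 'have', 'a', 'good', 'time'],
]

# chain.from_iterable() yields the items of each sublist in turn;
# list() collects them into one flat list.
tokens = list(chain.from_iterable(sent))

# Count how often each token occurs.
counts = defaultdict(int)
for token in tokens:
    counts[token] += 1

print(tokens)
print(dict(counts))
```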