I have written a MATLAB program to find word bigrams and their frequencies in a text file. To do this, I build a cell array of strings with the `textread` function:
```matlab
unigrams = textread('file.txt', '%s');
```
However, I also want to omit a set of common words such as 'to', 'the', 'is' and 'or', as well as the special characters '#', '$', '&' and '%', from my cell array. Is there a way to exclude these while reading the words from the raw file?
Thanks.
You can use `setdiff` after reading the text to remove the unwanted words:
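Below is a minimal sketch of that approach, with one caveat. `setdiff` returns the sorted, unique set difference (even with the `'stable'` option, repeated entries are removed), so it discards the repeated words and the word order that a bigram count depends on. The sketch therefore also deletes the unwanted tokens with `ismember`, which keeps every remaining token in its original position. The sample token list, the `stopwords` list, the lowercasing step, and stripping `#`, `$`, `&` and `%` wherever they appear inside a token are assumptions for illustration. In practice `unigrams` would come from the `textread` call in the question.

```matlab
% Sample tokens standing in for: unigrams = textread('file.txt', '%s');
unigrams = {'Go'; 'to'; 'the'; '#shop'; '$'; 'or'; 'go'; 'to'; 'the'; 'shop'};

% Assumed list of words to drop; extend as needed.
stopwords = {'to', 'the', 'is', 'or'};

% Assumption: matching is case-insensitive, so normalise case first.
unigrams = lower(unigrams);

% Strip the special characters wherever they occur ('#shop' -> 'shop'),
% then remove tokens that became empty (a standalone '$' -> '').
unigrams = regexprep(unigrams, '[#$&%]', '');
unigrams(cellfun(@isempty, unigrams)) = [];

% setdiff gives only the sorted unique survivors: {'go'; 'shop'}.
% Repeats and order are lost, so this is unsuitable for bigram counting.
vocab = setdiff(unigrams, stopwords);

% ismember-based deletion keeps every remaining token in text order:
% {'go'; 'shop'; 'go'; 'shop'}.
unigrams(ismember(unigrams, stopwords)) = [];

% Pair each token with its successor. strcat keeps the ' ' separator
% because the inputs are cell arrays.
bigrams = strcat(unigrams(1:end-1), {' '}, unigrams(2:end));

% Count how often each distinct bigram occurs.
[uniqueBigrams, ~, idx] = unique(bigrams);
counts = accumarray(idx, 1);

for k = 1:numel(uniqueBigrams)
    fprintf('%s: %d\n', uniqueBigrams{k}, counts(k));
end
% Prints:
% go shop: 2
% shop go: 1
```

Removing the unwanted words before pairing produces bigrams that span the removed words (here "go to the shop" becomes "go shop"). If that is not what you want, build the bigrams from the full token list first and then discard any bigram that contains a stop word.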