I have a UITextView containing text of arbitrary length (up to 10,000 characters). I need to parse this text, extract all keywords, and list them by frequency of use, with the most frequently used word at the top, the next one below it, and so on. I will most likely present a modal UITableView once the operation completes.
I’m looking for an efficient way to do this. I can try to split the string on delimiters such as whitespace and punctuation marks.
This gets me an array of character sequences.
I can add each sequence as an NSMutableDictionary key and increment its count whenever I see another instance of that word. However, this may result in a list of 300-400 words, most of them with a frequency of 1.
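The split-and-count idea above can be sketched roughly as follows. This is only an illustration under a few assumptions: the helper names `CountWords` and `SortedByFrequency` are hypothetical, words are lowercased before counting, and `NSCountedSet` stands in for the `NSMutableDictionary` of counts. Because `punctuationCharacterSet` includes the apostrophe, contractions such as "don't" are split into two pieces.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: counts lowercased words split on whitespace and punctuation.
static NSCountedSet *CountWords(NSString *text) {
    NSMutableCharacterSet *separators =
        [NSMutableCharacterSet whitespaceAndNewlineCharacterSet];
    [separators formUnionWithCharacterSet:[NSCharacterSet punctuationCharacterSet]];

    NSCountedSet *counts = [NSCountedSet set];
    NSArray<NSString *> *pieces =
        [text.lowercaseString componentsSeparatedByCharactersInSet:separators];
    for (NSString *piece in pieces) {
        if (piece.length > 0) {  // adjacent separators produce empty strings
            [counts addObject:piece];
        }
    }
    return counts;
}

// Hypothetical helper: distinct words, most frequent first; ties are alphabetical.
static NSArray<NSString *> *SortedByFrequency(NSCountedSet *counts) {
    return [counts.allObjects sortedArrayUsingComparator:^NSComparisonResult(id a, id b) {
        NSUInteger countA = [counts countForObject:a];
        NSUInteger countB = [counts countForObject:b];
        if (countA != countB) {
            return countA > countB ? NSOrderedAscending : NSOrderedDescending;
        }
        return [(NSString *)a compare:(NSString *)b];
    }];
}
```

The sorted array can back the table view directly, with `countForObject:` supplying the number shown next to each word.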
Is there a good way to implement the logic that I’m describing? Should I try to sort the array in alphabetical order and try some kind of “fuzzy” logic match? Are there any NSDataDetector or NSString methods that can do this kind of work for me?
An additional question: how would I filter out words like a, at, to, for, etc., so that they do not appear in my keyword list?
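One possible way to drop such words is to check each candidate against a stop-word list before counting it. The sketch below assumes the words are already lowercased; the helper name `RemoveStopWords` is hypothetical and the list shown is deliberately short and incomplete.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: removes common function words from a list of lowercase words.
static NSArray<NSString *> *RemoveStopWords(NSArray<NSString *> *words) {
    // Illustrative list only; a usable one would be considerably longer.
    NSSet<NSString *> *stopWords = [NSSet setWithObjects:
        @"a", @"an", @"the", @"at", @"to", @"for", @"of", @"in", @"on", @"and", @"or", nil];

    NSPredicate *keep = [NSPredicate predicateWithBlock:^BOOL(id word, NSDictionary *bindings) {
        return ![stopWords containsObject:word];
    }];
    // Order of the remaining words is preserved.
    return [words filteredArrayUsingPredicate:keep];
}
```

Keeping the list in an `NSSet` makes each lookup cheap, so the filter adds little cost even at the 10,000-character limit.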
It would be great if I can take a look at a sample project that has already accomplished this task.
Thank you!
I ended up going with CFStringTokenizer. I’m not sure if the bridged casts below are correct, but it seems to work.
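A minimal sketch of a CFStringTokenizer loop under ARC, offered as an illustration and not as the exact code referred to above. The helper name `CountWordTokens` is hypothetical, `text` is assumed to be non-nil, and tokens are lowercased before counting. A plain `__bridge` cast is enough here because ownership of the string is not transferred; the tokenizer and the locale come from Create/Copy functions, so they are released manually.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: counts word-unit tokens of `text` using CFStringTokenizer.
static NSCountedSet *CountWordTokens(NSString *text) {
    NSCountedSet *counts = [NSCountedSet set];

    // __bridge: no ownership transfer, ARC keeps managing `text`.
    CFStringRef string = (__bridge CFStringRef)text;
    CFLocaleRef locale = CFLocaleCopyCurrent();
    CFStringTokenizerRef tokenizer =
        CFStringTokenizerCreate(kCFAllocatorDefault,
                                string,
                                CFRangeMake(0, CFStringGetLength(string)),
                                kCFStringTokenizerUnitWord,
                                locale);

    // Advance until the tokenizer reports that no tokens remain.
    while (CFStringTokenizerAdvanceToNextToken(tokenizer) != kCFStringTokenizerTokenNone) {
        CFRange range = CFStringTokenizerGetCurrentTokenRange(tokenizer);
        NSString *token =
            [text substringWithRange:NSMakeRange((NSUInteger)range.location,
                                                 (NSUInteger)range.length)];
        [counts addObject:token.lowercaseString];
    }

    // Core Foundation objects from Create/Copy functions are not managed by ARC.
    CFRelease(tokenizer);
    CFRelease(locale);
    return counts;
}
```

Sorting `counts.allObjects` by `countForObject:` in descending order then gives the list for the table view.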