I have been trying to use a regular expression to separate full sentences in a big block of text. I can’t use componentsSeparatedByCharactersInSet because it will obviously fail with sentences ending in ?!, !!, … I have seen some external classes that provide componentSeparateByRegEx, but I would prefer to do it without adding an external library.
Here is a sample input
Hi, I am testing. How are you? Wow!! this is the best, and I am happy.
The output should be an array
first element: Hi, I am testing.
second element: How are you?
third element: Wow!!
fourth element: this is the best, and I am happy.
This is what I have, but as I mentioned, it doesn’t do what I intend. A regular expression would probably do a much better job here.
- (NSArray *)getArrayOfFullSentencesFromBlockOfText:(NSString *)textBlock {
    NSMutableCharacterSet *characterSet = [[NSMutableCharacterSet alloc] init];
    [characterSet addCharactersInString:@".?!"];
    NSArray *sentenceArray = [textBlock componentsSeparatedByCharactersInSet:characterSet];
    return sentenceArray;
}
Thanks for your help,
You want to use -[NSString enumerateSubstringsInRange:options:usingBlock:] with the NSStringEnumerationBySentences option. This will give you every sentence, and it does so in a language-aware manner.
Note: in testing, each substring appears to contain the trailing spaces after the punctuation. You may want to strip those out.
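For illustration, here is a minimal sketch of that approach, including the whitespace trimming mentioned above. The function name SentencesFromText and the main() wrapper are hypothetical and only there to make the example self-contained. The rest uses standard Foundation calls.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper (the name is an assumption, not from the question):
// collects each sentence reported by the system's sentence enumeration.
static NSArray<NSString *> *SentencesFromText(NSString *textBlock) {
    NSMutableArray<NSString *> *sentences = [NSMutableArray array];
    NSCharacterSet *whitespace = [NSCharacterSet whitespaceAndNewlineCharacterSet];

    [textBlock enumerateSubstringsInRange:NSMakeRange(0, textBlock.length)
                                  options:NSStringEnumerationBySentences
                               usingBlock:^(NSString *substring,
                                            NSRange substringRange,
                                            NSRange enclosingRange,
                                            BOOL *stop) {
        // Each substring may carry trailing spaces after the punctuation,
        // so trim them before storing.
        NSString *trimmed = [substring stringByTrimmingCharactersInSet:whitespace];
        if (trimmed.length > 0) {
            [sentences addObject:trimmed];
        }
    }];

    return [sentences copy];
}

int main(void) {
    @autoreleasepool {
        NSString *text = @"Hi, I am testing. How are you? "
                         @"Wow!! this is the best, and I am happy.";
        for (NSString *sentence in SentencesFromText(text)) {
            NSLog(@"%@", sentence);
        }
    }
    return 0;
}
```

Where the boundaries fall is decided by the system's sentence segmentation, so it is worth checking the output against input like "Wow!! this is ..." (a lowercase word after the punctuation) rather than assuming it matches the expected array exactly.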