I downloaded this code from Ryan Nystrom’s GitHub; it’s an Objective-C port of the PHP Text Statistics project on Dave Child’s GitHub. There are things in it that I don’t recognize as Objective-C, but I’m a newbie programmer, so I wanted to run it by Stack Overflow to see whether I was making some obvious mistake before I got in touch with the programmer.
My issue is that there’s an NSDictionary of words that are exceptions in the syllable-counting method, but when I use it to count syllables in a text that includes those words, they aren’t counted as exceptions. For example, the dictionary contains the word “twelve” and indicates that it should be counted as one syllable, but when I analyze the word “twelve” it comes up as two syllables.
Now there’s also a list of endings/patterns that need to be counted as one syllable that might otherwise be counted as two (-cious, -cial, etc.). When I add “twelve” to that list, it gets counted as one syllable. So that list seems to be functioning fine. It’s just the dictionary of exceptions that doesn’t seem to be working.
Am I missing something incredibly obvious? Or is this a get-in-touch-with-the-coder-and-let-him-know situation?
Thanks in advance for the help.
    - (NSInteger)syllableCount {
        if ([self isEqualToString:@""]) {
            return 0;
        }

        // remove non-alpha chars
        NSString *strippedString = [self stringByReplacingRegularExpression:@"[^A-Za-z]" withString:@"" options:kNilOptions];

        // use lowercase for brevity w/ options + patterns
        NSString *lowercase = [strippedString lowercaseString];

        // altered in enumerate blocks
        __block NSInteger syllableCount = 0;

        //***It's this dictionary whose items seem not to be registering as exceptions:
        // special rules that don't follow syllable matching patterns
        NSDictionary *exceptions = @{
            @"you" : @1,
            @"simile" : @3,
            @"forever" : @3,
            @"shoreline" : @2,
            @"poetry" : @3,
            @"twelve" : @1,
            @"delete" : @2,
        };

        // if one of the preceding words, return special case value
        NSNumber *caught = exceptions[self];
        if (caught) {
            return caught.integerValue;
        }

        //***If I put those words in the appropriate places in the following lists, however, they end up being counted correctly.
        // These syllables would be counted as two but should be one
        NSArray *subSyllables = @[
            @"cial",
            //...various other things...
            @"[aeiouy]rse$",
        ];

        // These syllables would be counted as one but should be two
        NSArray *addSyllables = @[
            @"ia",
            //...various other things...
            @"ie(r|st)$"
        ];

        // Single syllable prefixes and suffixes
        NSArray *prefixSuffix = @[
            @"^un",
            //...various other things...
            @"ings?$",
        ];

        // remove prefix & suffix, count how many are removed
        NSInteger prefixesSuffixesCount = 0;
        NSString *strippedPrefixesSuffixes = [NSRegularExpression stringByReplacingOccurenceOfPatterns:prefixSuffix inString:lowercase options:kNilOptions withTemplate:@"" count:&prefixesSuffixesCount];

        // removed non-word chars from word
        NSString *strippedNonWord = [strippedPrefixesSuffixes stringByReplacingRegularExpression:@"[^a-z]" withString:@"" options:kNilOptions];
        NSString *nonVowelPattern = @"[aeiouy]+";
        NSError *vowelError = nil;
        NSRegularExpression *nonVowelRegex = [[NSRegularExpression alloc] initWithPattern:nonVowelPattern options:kNilOptions error:&vowelError];
        NSArray *wordPartsResults = [nonVowelRegex matchesInString:strippedNonWord options:kNilOptions range:NSMakeRange(0, [strippedNonWord length])];

        NSMutableArray *wordParts = [NSMutableArray array];
        [wordPartsResults enumerateObjectsUsingBlock:^(NSTextCheckingResult *match, NSUInteger idx, BOOL *stop) {
            NSString *substr = [strippedNonWord substringWithRange:match.range];
            if (substr) {
                [wordParts addObject:substr];
            }
        }];

        __block NSInteger wordPartCount = 0;
        [wordParts enumerateObjectsUsingBlock:^(NSString *part, NSUInteger idx, BOOL *stop) {
            if (! [part isEqualToString:@""]) {
                wordPartCount++;
            }
        }];

        syllableCount = wordPartCount + prefixesSuffixesCount;

        // Some syllables do not follow normal rules - check for them
        [subSyllables enumerateObjectsUsingBlock:^(NSString *subSyllable, NSUInteger idx, BOOL *stop) {
            NSError *error = nil;
            NSRegularExpression *regex = [[NSRegularExpression alloc] initWithPattern:subSyllable options:kNilOptions error:&error];
            syllableCount -= [regex numberOfMatchesInString:strippedNonWord options:kNilOptions range:NSMakeRange(0, [strippedNonWord length])];
        }];

        [addSyllables enumerateObjectsUsingBlock:^(NSString *addSyllable, NSUInteger idx, BOOL *stop) {
            NSError *error = nil;
            NSRegularExpression *regex = [[NSRegularExpression alloc] initWithPattern:addSyllable options:kNilOptions error:&error];
            syllableCount += [regex numberOfMatchesInString:strippedNonWord options:kNilOptions range:NSMakeRange(0, [strippedNonWord length])];
        }];

        syllableCount = syllableCount <= 0 ? 1 : syllableCount;
        return syllableCount;
    }
The rest of the method uses the processed form of the string (that is, after stripping non-letter characters and lowercasing), but the exceptions dictionary lookup uses the original form, self. So unless your string is exactly @"twelve", and not @"Twelve", @" twelve ", or @"twelve\t", it won’t be found in there.
Fix: use the processed string (lowercase) as the lookup key instead of self.
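Here is a minimal sketch of that change. It assumes the lookup is pulled out into a standalone helper purely for illustration: ExceptionSyllableCount is a hypothetical name, not part of the original category, and the normalization uses plain NSRegularExpression rather than the project's own string helpers. In the original method, the equivalent change should be to index exceptions with lowercase rather than self.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not in the original project): returns the exception
// syllable count for a word, or nil if the word is not in the table.
static NSNumber *ExceptionSyllableCount(NSString *word) {
    NSDictionary *exceptions = @{
        @"you" : @1,
        @"simile" : @3,
        @"forever" : @3,
        @"shoreline" : @2,
        @"poetry" : @3,
        @"twelve" : @1,
        @"delete" : @2
    };

    // Normalize the same way the rest of the method does: drop everything
    // that is not an ASCII letter, then lowercase the result.
    NSRegularExpression *nonAlpha =
        [NSRegularExpression regularExpressionWithPattern:@"[^A-Za-z]"
                                                  options:0
                                                    error:NULL];
    NSString *stripped =
        [nonAlpha stringByReplacingMatchesInString:word
                                           options:0
                                             range:NSMakeRange(0, word.length)
                                      withTemplate:@""];
    NSString *lowercase = stripped.lowercaseString;

    // Dictionary keys are matched by exact string equality, so look up the
    // normalized key rather than the raw input.
    return exceptions[lowercase];
}

int main(void) {
    @autoreleasepool {
        NSArray *inputs = @[ @"twelve", @"Twelve", @" twelve ", @"twelve\t", @"thirteen" ];
        for (NSString *input in inputs) {
            // Logs the exception count, or (null) when the word is not listed.
            NSLog(@"'%@' -> %@", input, ExceptionSyllableCount(input));
        }
    }
    return 0;
}
```

With the normalized key, the capitalized and padded variants of "twelve" all resolve to the same dictionary entry, while words outside the table fall through to the regular pattern-based counting.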
You should probably submit this as a bug to the author.