This is a question about Objective-C. I wrote a program that downloads the whole HTML of a web page and searches it with a regular expression. I have uploaded the program to GitHub. However, an exception occurs.
The purpose of this program is to get the “og:image” value with a regular expression match. This is the image that Facebook displays when you post a URL. To set this image, you write the following in the HTML:
<meta property="og:image"
content="http://business.nikkeibp.co.jp/article/NBD/20120727/235043/zu1.jpg">
So I wrote a program that gets the whole HTML and finds the og:image part. The code is below:
    // Web page address
    NSURL *url = [NSURL URLWithString:textField.text];
    // Get the web page HTML
    NSString *string =
        [NSString stringWithContentsOfURL:url encoding:NSUTF8StringEncoding error:nil];
    // prepare regular expression to find text
    NSError *error = nil;
    NSRegularExpression *regexp =
        [NSRegularExpression regularExpressionWithPattern:
            @"<meta property=\"og:image\" content=\".+\""
                                                  options:0
                                                    error:&error];
    @try {
        // find by regular expression
        NSTextCheckingResult *match =
            [regexp firstMatchInString:string options:0 range:NSMakeRange(0, string.length)];
        // get the first result
        NSRange resultRange = [match rangeAtIndex:0];
        NSLog(@"match=%@", [string substringWithRange:resultRange]);
        if (match) {
            // get the og:image URL from the find result
            NSRange urlRange = NSMakeRange(resultRange.location + 35, resultRange.length - 35 - 1);
            NSURL *urlOgImage = [NSURL URLWithString:[string substringWithRange:urlRange]];
            imageView.image = [UIImage imageWithData:[NSData dataWithContentsOfURL:urlOgImage]];
        }
    }
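For comparison, here is a minimal sketch of the same extraction using a capture group instead of the fixed offset of 35. It also checks for a nil string and a nil match before taking any substring. The function name `OGImageURLStringFromHTML` is hypothetical and is not part of the program above. The pattern is my own assumption: `\s+` allows the line break between `property` and `content` shown in the sample tag, and `[^"]+` stops at the first closing quote, whereas the greedy `.+` runs to the last quote on the line.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the original program): returns the og:image
// URL string found in `html`, or nil if `html` is nil or nothing matches.
static NSString *OGImageURLStringFromHTML(NSString *html) {
    if (html == nil) {
        return nil; // for example, the download or the decoding failed
    }
    NSError *error = nil;
    // Capture group 1 holds the value of the content attribute.
    NSRegularExpression *regexp =
        [NSRegularExpression regularExpressionWithPattern:
            @"<meta\\s+property=\"og:image\"\\s+content=\"([^\"]+)\""
                                                  options:NSRegularExpressionCaseInsensitive
                                                    error:&error];
    if (regexp == nil) {
        return nil; // the pattern failed to compile; details are in `error`
    }
    NSTextCheckingResult *match =
        [regexp firstMatchInString:html options:0 range:NSMakeRange(0, html.length)];
    if (match == nil) {
        return nil; // no og:image tag in the expected attribute order
    }
    return [html substringWithRange:[match rangeAtIndex:1]];
}
```

This sketch only handles tags where `property` comes before `content` and both use double quotes. A caller would check the result for nil before passing it to `[NSURL URLWithString:]`.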
The whole code is on GitHub:
https://github.com/weed/p120728_GetOgImage/blob/master/GetOgImage/ViewController.m
However, this program sometimes throws an exception.
- success case: http://www.nicovideo.jp/watch/1343369790
- failure case: http://business.nikkeibp.co.jp/article/NBD/20120727/235043/?ST=pc
Screenshots are here: https://github.com/weed/p120728_GetOgImage/blob/master/readme.md
Why does the exception occur? Please let me know. Thank you for your help.
A friend kindly pointed out that I should consider the character encoding. The page at the first URL is encoded in UTF-8, and the page at the second one is encoded in EUC-JP.
With the code below, I was able to get the og:image of the second URL shown above.
I made a library for checking the character encoding, named NSString+Encode. The whole code is on GitHub: https://github.com/weed/p120728_OgImageLibrary
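To illustrate the encoding point: `stringWithContentsOfURL:encoding:error:` returns nil when the page cannot be read in the given encoding, so an EUC-JP page read as UTF-8 leaves the string nil. The following is only a rough sketch of one way to fall back to other encodings, not the code from NSString+Encode. The function name `StringByTryingEncodings` and the candidate list are my own assumptions.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not the NSString+Encode library): decodes `data` by
// trying each candidate encoding in order; returns nil if none of them fits.
static NSString *StringByTryingEncodings(NSData *data) {
    if (data == nil) {
        return nil;
    }
    // Assumed candidate list: UTF-8 first, then two common Japanese encodings.
    const NSStringEncoding candidates[] = {
        NSUTF8StringEncoding,
        NSJapaneseEUCStringEncoding,
        NSShiftJISStringEncoding,
    };
    const size_t count = sizeof(candidates) / sizeof(candidates[0]);
    for (size_t i = 0; i < count; i++) {
        // initWithData:encoding: returns nil when the bytes are not valid
        // in the requested encoding, so we move on to the next candidate.
        NSString *decoded = [[NSString alloc] initWithData:data
                                                  encoding:candidates[i]];
        if (decoded != nil) {
            return decoded;
        }
    }
    return nil;
}
```

The raw bytes could come from `[NSData dataWithContentsOfURL:url]`. Trying encodings in a fixed order is a heuristic: some byte sequences are valid in more than one encoding, so reading the charset from the HTTP header or the page's meta tag would be more reliable when it is available.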