I'm making a simple application to pull recipe info from websites like allrecipes.com. I'm using preg_match, but something is not working.
$geturl = file_get_contents("http://allrecipes.com/Recipe/Brown-Sugar-Smokies/Detail.aspx?src=rotd");
preg_match('#<title>(.*) - Allrecipes.com</title>#', $geturl, $match);
$name = $match[1];
echo $name;
I’m just trying to take the title of the page (minus the - Allrecipes.com part) and put it into a variable, but all that turns up is blank.
There were two problems with this pattern. First, there was a newline after the `<title>` tag, which wasn't matched by `.` (without the `/s` modifier, `.` means literally 'any symbol but a newline'). Second, the `Allrecipes.com` text was actually NOT followed directly by the `</title>` substring; there was a newline separating them.
Since `\s` covers both ordinary whitespace and line separators, you can fix the regex by adding `\s*` after `<title>` and before `</title>`, so that the surrounding whitespace is consumed instead of breaking the match.
The `/s` modifier is not actually relevant here (kudos to minitech for noticing that), as the title in this recipe is on one line, and all "\n" symbols will be covered by the `\s*` subexpressions. But I'd still suggest leaving it there, so that multi-line titles won't catch you off guard.
I've also replaced `.*` with `.*?` for efficiency's sake: as the string you're looking for is quite short, it makes sense to use a non-greedy quantifier here.
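To make the changes above concrete, here is a minimal sketch of one pattern that follows that description. The `$html` string is a hypothetical stand-in for the fetched page (it only mimics the newlines around the title text, not the real markup), and the variable names `$original` and `$fixed` are mine. The escaped dot in `Allrecipes\.com` is an extra tightening that is not strictly required.

```php
<?php
// Hypothetical sample mimicking the page: a newline follows <title>,
// and another newline precedes </title>.
$html = "<html><head><title>\n\tBrown Sugar Smokies - Allrecipes.com\n</title></head></html>";

// The pattern from the question: "." cannot cross the newline, so nothing matches.
$original = '#<title>(.*) - Allrecipes.com</title>#';
var_dump(preg_match($original, $html)); // int(0)

// Adjusted pattern: \s* absorbs whitespace (including newlines) on both sides,
// .*? is non-greedy, and /s lets "." span lines for multi-line titles.
$fixed = '#<title>\s*(.*?)\s*- Allrecipes\.com\s*</title>#s';

// Only read $match[1] when the match actually succeeded.
$name = preg_match($fixed, $html, $match) ? $match[1] : null;

echo $name, "\n";
```

Checking the return value of preg_match before reading `$match[1]` also avoids the silent blank output when a page's title does not fit the pattern.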