I’m a beginning programmer trying to make a simple application that just scrapes a website and returns values.
I’m trying to do something I thought would be simple, but after searching and experimenting I’ve given up and decided to just ask.
My scraper returns three variables: $title1, $title2, and $title3. Each $title comes from a different method I tried for finding the article’s name. Ideally I’d only have to look in one place and be done, but websites store this data differently (in meta tags, hidden divs, other elements, etc.).
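Since every site stores the title somewhere different, one option is to gather all candidate titles into a single array first and choose among them afterwards. The sketch below assumes the same three locations the code in this question checks (`og:title` meta tag, `<h1 itemprop="name">`, `<h1 class="fn">`) and uses PHP’s built-in `DOMXPath` to query them; `collectTitleCandidates` is a hypothetical helper name, and the query list is only an illustration you would extend per site.

```php
<?php
// Hypothetical helper: collect every candidate title a page offers.
// The three XPath queries mirror the lookups used in this question.
function collectTitleCandidates(string $html): array
{
    $doc = new DOMDocument();
    // Buffer libxml parse warnings instead of silencing them with @
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $queries = [
        '//meta[@property="og:title"]/@content', // Open Graph meta tag
        '//h1[@itemprop="name"]',                // microdata heading
        '//h1[@class="fn"]',                     // heading with class "fn"
    ];

    $candidates = [];
    foreach ($queries as $query) {
        foreach ($xpath->query($query) as $node) {
            // Attribute nodes and element nodes both expose nodeValue
            $candidates[] = trim($node->nodeValue);
        }
    }
    return $candidates;
}
```

Note that `loadHTML()` rejects an empty string in PHP 8, so a failed `file_get_contents()` should be checked before calling a helper like this.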
I need a way to do the following (pseudocode):
for each of $title1, $title2, $title3 that is not null {   // don't count a $title if it is null
    $titleN_stringlength = string_length($titleN)          // find the string length of each remaining $title
}
$realtitle = the $title with the lowest string length      // shortest wins; null $titles never count
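The pseudocode above maps fairly directly onto a plain loop in PHP. This is a minimal sketch, not a definitive implementation: `shortestTitle` is a hypothetical name, it assumes that `null`, an empty or whitespace-only string, and the literal string `"null"` should all be skipped, and it trims surrounding whitespace before measuring. On a tie, the first title passed in wins.

```php
<?php
// Hypothetical helper: return the shortest usable title, or null if none.
// Skips null, empty/whitespace-only strings, and the literal string "null".
function shortestTitle(?string ...$titles): ?string
{
    $best = null;
    foreach ($titles as $title) {
        if ($title === null) {
            continue; // skip missing values
        }
        $title = trim($title);
        if ($title === '' || $title === 'null') {
            continue; // skip empty strings and the literal "null"
        }
        // strlen() counts bytes; strict < keeps the earlier title on a tie
        if ($best === null || strlen($title) < strlen($best)) {
            $best = $title;
        }
    }
    return $best;
}

$title1 = "Exercise Daily";
$title2 = null;
$title3 = "Exercise Daily - And More advice on SaveTheTwinkie.org";
$realtitle = shortestTitle($title1, $title2, $title3);
echo $realtitle; // Exercise Daily
```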
Here’s an example of why I need this:
echo $title1; // echoes "Exercise Daily"
echo $title2; // echoes "null"
echo $title3; // echoes "Exercise Daily - And More advice on SaveTheTwinkie.org"
$realtitle = $title1; // should be $title1, because it is the shortest one that isn't null
// or a different example from another site:
echo $title1; // echoes "Wow look at this Article Title!"
echo $title2; // echoes "null"
echo $title3; // echoes "Wow look at this Article Title! - from StupidArticles.tv"
$realtitle = $title1; // should be $title1, because it is the shortest one that isn't null
So my code should find the shortest $title (ignoring any that are null) and assign its value to $realtitle.
Thanks for any and all help! If you need more details, just ask!
EDIT
Here’s my complete code. It works until one of the $titles is "" (an empty string); then $realtitle becomes "" as well.
<?php
// Download the page and parse it; @ suppresses warnings about malformed HTML
$sites_html = file_get_contents($url);
$html = new DOMDocument();
@$html->loadHTML($sites_html);

$title1 = null; // reset
$title2 = null; // reset
$title3 = null; // reset

// Title source 1: <meta property="og:title" content="...">
foreach ($html->getElementsByTagName('meta') as $meta) {
    if ($meta->getAttribute('property') == 'og:title') {
        // Assign the value from the content attribute to $title1
        $title1 = $meta->getAttribute('content');
    }
}

// Title sources 2 and 3: <h1 itemprop="name"> and <h1 class="fn">
foreach ($html->getElementsByTagName('h1') as $h1) {
    if ($h1->getAttribute('itemprop') == 'name') {
        $title2 = $h1->nodeValue;
    }
    if ($h1->getAttribute('class') == 'fn') {
        $title3 = $h1->nodeValue;
    }
}

// Intended: keep the shortest title that isn't empty or "null"
$realtitle = array_reduce(array($title2, $title1, $title3), function($a, $b) {
    return strlen($a) && $a != 'null' && strlen($a) < strlen($b) ? $a : $b;
}, null);

echo 'metaogtitle: ' . $title1 . '<br/><br/><br/><br/><br/>';
echo 'name: ' . $title2 . '<br/><br/><br/><br/><br/>';
echo 'name2: ' . $title3 . '<br/><br/><br/><br/><br/>';
echo 'realtitle: ' . $realtitle . '<br/><br/><br/><br/><br/>';
?>
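The `array_reduce` callback only validates the running result `$a`, never the next item `$b`. Tracing titles `(null, "Exercise Daily", "")`: the carry becomes `"Exercise Daily"`, then for `$b = ""` the test `strlen($a) < strlen($b)` is `14 < 0`, which is false, so the callback returns `$b` and the empty string replaces the good title. A minimal sketch of one fix, keeping the `array_reduce` approach but checking the incoming item too, follows; `isUsableTitle` and `pickShortestTitle` are hypothetical names, and treating whitespace-only strings as unusable is an assumption.

```php
<?php
// Hypothetical check: a title counts only if it is not null, not blank,
// and not the literal string "null".
function isUsableTitle(?string $t): bool
{
    return $t !== null && trim($t) !== '' && $t !== 'null';
}

// Hypothetical helper: same array_reduce idea, but the callback validates
// the NEXT candidate, so an unusable one can never replace a usable carry.
function pickShortestTitle(array $titles): ?string
{
    return array_reduce($titles, function (?string $carry, ?string $title): ?string {
        if (!isUsableTitle($title)) {
            return $carry; // ignore null, "" and "null"
        }
        if ($carry === null || strlen($title) < strlen($carry)) {
            return $title; // first usable title, or a shorter one
        }
        return $carry;
    }, null);
}
```

In the script above, the `array_reduce(...)` call would then become `$realtitle = pickShortestTitle(array($title2, $title1, $title3));`. Because the carry only ever holds `null` or a usable title, this also avoids calling `strlen()` on `null`, which PHP 8.1+ reports as deprecated.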