Basically, I scrape data and place it into simple POCO classes. At the end of each loop iteration I want to add the $newItem object to the $parsedNews array. I'm new to PHP; could this be a scoping issue?
Here's the code I'm running:
```php
<h1>Scraper Noticias</h1>
<?php
include('simple_html_dom.php');

class News {
    var $image;
    var $fechanoticia;
    var $title;
    var $description;
    var $sourceurl;

    function get_image() {
        return $this->image;
    }
    function set_image($new_image) {
        $this->image = $new_image;
    }
    function get_fechanoticia() {
        return $this->fechanoticia;
    }
    function set_fechanoticia($new_fechanoticia) {
        $this->fechanoticia = $new_fechanoticia;
    }
    function get_title() {
        return $this->title;
    }
    function set_title($new_title) {
        $this->title = $new_title;
    }
    function get_description() {
        return $this->description;
    }
    function set_description($new_description) {
        $this->description = $new_description;
    }
    function get_sourceurl() {
        return $this->sourceurl;
    }
    function set_sourceurl($new_sourceurl) {
        $this->sourceurl = $new_sourceurl;
    }
}

// Create DOM from URL or file
$initialPage = file_get_html('http://www.uvm.cl/noticias_mas.shtml');

// Declare variable to hold all parsed news items.
$parsedNews = array();

// The university news listing has 262 pages; for now, only pages 2 to 5 are fetched.
for ($i = 2; $i <= 5; $i++) {
    $url = "http://www.uvm.cl/noticias_mas.shtml?AA_SL_Session=34499aef1fc7a296fb666dcc7b9d8d05&scrl=1&scr_scr_Go=" . $i;
    $page = file_get_html($url);
    parse_page_for_news($page);
}

echo "<h1>Final Count:" . count($parsedNews) . "</h1>";

// Function receives an HTML DOM object, and the library works against that single HTML object.
function parse_page_for_news($page) {
    foreach ($page->find('#cont2 p') as $element) {
        $newItem = new News;

        // Parse the news item's thumbnail image.
        foreach ($element->find('img') as $image) {
            $newItem->set_image($image->src);
            //echo $newItem->get_image() . "<br />";
        }

        // Parse the news item's post date.
        foreach ($element->find('span.fechanoticia') as $fecha) {
            $newItem->set_fechanoticia($fecha->innertext);
            //echo $newItem->get_fechanoticia() . "<br />";
        }

        // Parse the news item's title.
        foreach ($element->find('a') as $title) {
            $newItem->set_title($title->innertext);
            //echo $newItem->get_title() . "<br />";
        }

        // Parse the news item's source URL link.
        foreach ($element->find('a') as $sourceurl) {
            $newItem->set_sourceurl("http://www.uvm.cl/" . $sourceurl->href);
        }

        // Parse the news item's description text: strip links, spans and
        // images, then keep whatever text remains in the paragraph.
        foreach ($element->find('a') as $link) {
            $link->outertext = '';
        }
        foreach ($element->find('span') as $link) {
            $link->outertext = '';
        }
        foreach ($element->find('img') as $link) {
            $link->outertext = '';
        }
        $newItem->set_description($element->innertext);

        // Add the newly formed News item to the $parsedNews array.
        $parsedNews[] = $newItem;
        print_r($newItem);
        echo "<br /><br /><br />";
    }
}
?>
```
As I understand the language, since the $parsedNews array is declared outside the function, shouldn't the new items be added to it correctly?
Why does my count() call return 0, as if the array had no objects in it?
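The behavior can be reproduced without any scraping. The following is a minimal sketch; the function name add_news_item and the string items are hypothetical stand-ins for parse_page_for_news and the News objects.

```php
<?php
// Minimal reproduction of the scoping problem (hypothetical names, no scraping).
$parsedNews = array();

function add_news_item($item) {
    // This $parsedNews is a brand-new local variable, not the global one.
    // Appending to an undefined variable silently creates a local array.
    $parsedNews[] = $item;
    echo "Inside the function: " . count($parsedNews) . PHP_EOL;
}

add_news_item("first");
add_news_item("second");

// The global array was never touched.
echo "Outside the function: " . count($parsedNews) . PHP_EOL;
```

Each call prints "Inside the function: 1", because the local array is created fresh and thrown away every time, and the final line prints "Outside the function: 0".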
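In PHP, a variable defined in the global scope is not visible inside a function unless it is explicitly imported. Inside parse_page_for_news, the statement $parsedNews[] = $newItem; therefore creates a new local array, which is discarded when the function returns, so the global $parsedNews stays empty and count() returns 0.
Though you could just add a global declaration for $parsedNews inside your function, I think it's better coding practice to pass the array to the function by reference if you need to modify it and have the modified value reflected in the global scope. So you could simply change your function signature so that it takes the array as a by-reference parameter (declared with &) and pass $parsedNews in at the call site.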
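A sketch of both fixes follows, assuming a heavily simplified scraper: each "page" is just a hypothetical array of titles, so the example runs without simple_html_dom or network access. The by-reference signature shown is one way to write what the answer describes, and parse_page_for_news_global is a hypothetical name for the global-keyword variant.

```php
<?php
// Simplified stand-in for the News class from the question.
class News {
    public $title;
}

// Option 1: take the array by reference.
// The & makes $parsedNews an alias for the caller's variable.
function parse_page_for_news($page, &$parsedNews) {
    foreach ($page as $title) {
        $newItem = new News;
        $newItem->title = $title;
        $parsedNews[] = $newItem; // appends to the caller's array
    }
}

// Option 2: import the global variable with the global keyword.
function parse_page_for_news_global($page) {
    global $parsedNews; // the local name now refers to the global $parsedNews
    foreach ($page as $title) {
        $newItem = new News;
        $newItem->title = $title;
        $parsedNews[] = $newItem;
    }
}

// Hypothetical pages standing in for the results of file_get_html().
$pages = array(
    array('Noticia 1', 'Noticia 2'),
    array('Noticia 3'),
);

$parsedNews = array();
foreach ($pages as $page) {
    parse_page_for_news($page, $parsedNews);
}
$countByReference = count($parsedNews);

$parsedNews = array();
foreach ($pages as $page) {
    parse_page_for_news_global($page);
}
$countGlobal = count($parsedNews);

echo "By reference: " . $countByReference . PHP_EOL; // By reference: 3
echo "Via global: " . $countGlobal . PHP_EOL;        // Via global: 3
```

Both variants collect all three items. The by-reference version makes the dependency visible in the function signature, whereas the global version hides it inside the function body and only works with a variable that happens to be named $parsedNews.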