Hi, I am trying to scrape the UFC events schedule with Simple HTML DOM Parser.
I am struggling to select the right data.
I want the title, image, date, time and location.
So far I have tried:
function scraping_ufc() {
    // collect results here so the function always returns an array
    $ret = array();
    // create HTML DOM
    $html = file_get_html('http://uk.ufc.com/schedule/event/');
    // loop over every table row on the page
    foreach ($html->find('table tr') as $event) {
        $item = array();
        // get title
        $item['title'] = trim($event->find('div[class="event-tagline"]', 0)->innertext);
        // get details
        $item['date'] = trim($event->find('div[class="date"]', 0)->innertext);
        $item['time'] = trim($event->find('div[class="time"]', 0)->innertext);
        $ret[] = $item;
    }
    // clean up memory
    $html->clear();
    unset($html);
    return $ret;
}
A lot of unneeded table rows are selected. I do manage to get the title, but not the date or time.
Please help me select the data I need efficiently.
First of all, stop using Simple HTML DOM, because it’s less reliable than PHP’s built-in DOM library. It was useful some years ago, but these days it causes more problems than it solves.
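To give an idea of what the built-in DOM library looks like in practice, here is a minimal sketch using DOMDocument and DOMXPath. The sample markup and the function name event_titles are made up for illustration; only the event-tagline class is taken from the code in the question, and I have not checked it against the live page.

```php
<?php
// Hypothetical stand-in for the schedule page; the real markup may differ.
$sample = <<<HTML
<table>
  <tr><td><div class="event-tagline"> Fighter A vs Fighter B </div></td></tr>
  <tr><td><div class="event-tagline">Fighter C vs Fighter D</div></td></tr>
</table>
HTML;

// Hypothetical helper: returns the trimmed text of every event-tagline div.
function event_titles(string $htmlSource): array
{
    $doc = new DOMDocument();

    // Real-world HTML is rarely valid, so collect parser warnings
    // internally instead of letting them be printed.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($htmlSource);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $titles = [];

    // Exact match on the class attribute, like div[class="event-tagline"].
    foreach ($xpath->query('//div[@class="event-tagline"]') as $div) {
        $titles[] = trim($div->textContent);
    }

    return $titles;
}

print_r(event_titles($sample));
```

For the live page you could pass in the result of file_get_contents('http://uk.ufc.com/schedule/event/') instead of the sample string, assuming the request succeeds and the class name is still the same.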
Next, you need a better way to identify the rows you want.
The selector table tr will select every tr on the page, and you don’t want that. It would be nice if the tr elements had classes or ids to target, but they don’t, so I came up with this: