I use Simple HTML DOM to scrape a page for the latest news, and

Editorial Team

Asked: May 11, 2026 (2026-05-11T05:42:31+00:00)

I use Simple HTML DOM to scrape a page for the latest news, and then generate an RSS feed using this PHP class.

This is what I have now:

<?php
// This is a minimum example of using the class
include('FeedWriter.php');
include('simple_html_dom.php');

$html = file_get_html('http://www.website.com');

// Collect one array per scraped article
foreach ($html->find('td[width="380"] p table') as $article) {
    $item['title'] = $article->find('span.title', 0)->innertext;
    $item['description'] = $article->find('.ingress', 0)->innertext;
    $item['link'] = $article->find('.lesMer', 0)->href;
    $item['pubDate'] = $article->find('span.presseDato', 0)->plaintext;
    $articles[] = $item;
}

// Creating an instance of FeedWriter class.
$TestFeed = new FeedWriter(RSS2);

// Use wrapper functions for common channel elements
$TestFeed->setTitle('Testing & Checking the RSS writer class');
$TestFeed->setLink('http://www.ajaxray.com/projects/rss');
$TestFeed->setDescription('This is test of creating a RSS 2.0 feed Universal Feed Writer');

// Image title and link must match with the 'title' and 'link' channel elements for valid RSS 2.0
$TestFeed->setImage('Testing the RSS writer class', 'http://www.ajaxray.com/projects/rss', 'http://www.rightbrainsolution.com/images/logo.gif');

foreach ($articles as $row) {
    // Create an empty FeedItem
    $newItem = $TestFeed->createNewItem();

    // Add elements to the feed item
    $newItem->setTitle($row['title']);
    $newItem->setLink($row['link']);
    $newItem->setDate($row['pubDate']);
    $newItem->setDescription($row['description']);

    // Now add the feed item
    $TestFeed->addItem($newItem);
}

// OK. Everything is done. Now generate the feed.
$TestFeed->genarateFeed();
?>

How can I make this code simpler? Right now there are two foreach statements; how can I combine them?

Because the scraped news is in Norwegian, I need to apply html_entity_decode() to the title. I tried it here, but I couldn’t get it to work:

foreach ($html->find('td[width="380"] p table') as $article) {
    $item['title'] = html_entity_decode($article->find('span.title', 0)->innertext, ENT_NOQUOTES, 'UTF-8');
    $item['description'] = '<img src="' . $article->find('img[width="100"]', 0)->src . '"><p>' . $article->find('.ingress', 0)->innertext . '</p>';
    $item['link'] = $article->find('.lesMer', 0)->href;
    $item['pubDate'] = unix2rssdate(strtotime($article->find('span.presseDato', 0)->plaintext));
    $articles[] = $item;
}

Thanks 🙂

1 Answer

score 0 · Answer 1 · 2026-05-11T05:42:31+00:00

Well, for just a simple combination of the two loops, you could create the feed as you parse through the HTML:

<?php
include('FeedWriter.php');
include('simple_html_dom.php');

$html = file_get_html('http://www.website.com');

// Creating an instance of FeedWriter class.
$TestFeed = new FeedWriter(RSS2);
$TestFeed->setTitle('Testing & Checking the RSS writer class');
$TestFeed->setLink('http://www.ajaxray.com/projects/rss');
$TestFeed->setDescription(
    'This is test of creating a RSS 2.0 feed Universal Feed Writer');

$TestFeed->setImage('Testing the RSS writer class',
                    'http://www.ajaxray.com/projects/rss',
                    'http://www.rightbrainsolution.com/images/logo.gif');

// Parse through the HTML and build up the RSS feed as we go along
foreach ($html->find('td[width="380"] p table') as $article) {
    // Create an empty FeedItem
    $newItem = $TestFeed->createNewItem();

    // Look up and add elements to the feed item
    $newItem->setTitle($article->find('span.title', 0)->innertext);
    $newItem->setDescription($article->find('.ingress', 0)->innertext);
    $newItem->setLink($article->find('.lesMer', 0)->href);
    $newItem->setDate($article->find('span.presseDato', 0)->plaintext);

    // Now add the feed item
    $TestFeed->addItem($newItem);
}

$TestFeed->genarateFeed();
?>
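
The second snippet in the question passes the scraped date through strtotime() and a unix2rssdate() function that is not shown. As a rough sketch of that step using only built-in PHP functions, something like the following might work. The helper name toRssDate() is hypothetical, the sample date is made up, and the real format of the text in span.presseDato is not known, so strtotime() may or may not be able to parse it. It assumes PHP 7.1 or later for the nullable return type.

```php
<?php
// Hypothetical helper, not part of FeedWriter or Simple HTML DOM.
// Converts free-form date text to an RSS 2.0 (RFC 822 style) date string,
// or returns null when strtotime() cannot parse the text.
function toRssDate(string $text): ?string
{
    $timestamp = strtotime($text);
    if ($timestamp === false) {
        return null;
    }

    // DATE_RSS is PHP's built-in "D, d M Y H:i:s O" format.
    // gmdate() formats in UTC regardless of the server's default time zone.
    return gmdate(DATE_RSS, $timestamp);
}

// Made-up sample input; the scraped page may use a different format.
echo toRssDate('2026-05-11 05:42:31 UTC'), PHP_EOL;
// prints "Mon, 11 May 2026 05:42:31 +0000"
```

Whether the result should then be handed to setDate() as a string or as a timestamp depends on how your copy of FeedWriter treats that argument, so it is worth checking the class source before relying on this.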

What issue are you seeing with html_entity_decode()? If you give us a link to a page it doesn’t work on, that might help.
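
For comparison, here is a minimal sketch of what html_entity_decode() does with Norwegian character entities when called with the same arguments as in the question. The sample title is invented and only stands in for the scraped markup, which may be encoded differently (for example, already in UTF-8 or in ISO-8859-1 with no entities at all).

```php
<?php
// Invented sample text standing in for a scraped title.
$rawTitle = 'Bl&aring;b&aelig;rsyltet&oslash;y p&aring; &quot;tilbud&quot;';

// ENT_NOQUOTES decodes named entities such as &aring; but leaves &quot; as it is.
$noQuotes = html_entity_decode($rawTitle, ENT_NOQUOTES, 'UTF-8');

// ENT_QUOTES additionally decodes double and single quote entities.
$allQuotes = html_entity_decode($rawTitle, ENT_QUOTES, 'UTF-8');

echo $noQuotes, PHP_EOL;   // prints: Blåbærsyltetøy på &quot;tilbud&quot;
echo $allQuotes, PHP_EOL;  // prints: Blåbærsyltetøy på "tilbud"
```

If the decoded output still looks wrong in the feed, one possible cause is a mismatch between the encoding passed to html_entity_decode() and the encoding the feed is served in, but that is only a guess without seeing the page.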
