The Archive Base

Editorial Team
Asked: May 24, 2026 (2026-05-24T15:48:47+00:00)

I have a PHP scraping script that fetches HTML table content from another website. The script does not preserve the HTML special characters (tags) inside the cells, which makes the content look unformatted.

How can I modify the following code so that it keeps the HTML special characters, including all tags?

Complete code:

<?php
error_reporting(E_ERROR);
set_time_limit(0);

function createRSSFile($tag,$value,$data)
{
    # Returns the value wrapped in an element named after the normalized tag.
    $tag=strtolower(str_replace(" ","_",$tag));
    $tag=strtolower(str_replace(":","",$tag));
    $tag=strtolower(str_replace("&","and",$tag));
    //  $returnITEM = "<".$tag.">".htmlspecialchars(str_replace(" 00:00:00","",$value))."</".$tag.">";
    $returnITEM = "<".$tag.">".htmlspecialchars(str_replace("â¢","<br/><br/> ",$value))."</".$tag.">";
    return $returnITEM;
} 

// function extraFields($data){
//print_r($data);

//  $returnITEM = "<".strtolower(str_replace(" ","_",$data[18][0])).">".htmlspecialchars($data[18][1])."</".strtolower(str_replace(" ","_",$data[18][0])).">";
//  $returnITEM = "<".strtolower(str_replace("&","or",$data[19][0])).">".htmlspecialchars($data[19][1])."</".strtolower(str_replace("&","or",$data[19][0])).">";
//  $returnITEM .= "<".strtolower(str_replace(" ","_",$data[20][0])).">".htmlspecialchars($data[20][1])."</".strtolower(str_replace(" ","_",$data[20][0])).">";
//  $returnITEM .= "<".strtolower(str_replace(" ","_",$data[22][0])).">".htmlspecialchars($data[23][0])."</".strtolower(str_replace(" ","_",$data[22][0])).">";
//  $returnITEM .= "<".strtolower(str_replace(" ","_",$data[24][0])).">".htmlspecialchars($data[25][0])."</".strtolower(str_replace(" ","_",$data[24][0])).">";
//  $returnITEM .= "<".strtolower(str_replace(" ","_",$data[26][0])).">".htmlspecialchars($data[26][1])."</".strtolower(str_replace(" ","_",$data[26][0])).">";
//  preg_match('/[a-z0-9]+([_\\.-][a-z0-9]+)*@([a-z0-9]+([\.-][a-z0-9]+)*)+\\.[a-z]{2,}/i',$data[25][0],$email);
//  $email=$email[0];
//  $returnITEM .= "<email>".$email."</email>";
//  return $returnITEM;

// }

function fileRead(){
    $filename = "count.txt";
    $handle = fopen($filename, "r");
    $contents = fread($handle, filesize($filename));
    fclose($handle);
    return $contents;
}

function fileWrite ($val) {
    $filename = 'count.txt';
    $somecontent = $val;

    if (is_writable($filename)) {
        if (!$handle = fopen($filename, 'w')) {
            echo "Cannot open file ($filename)";
            exit;
        }
        if (fwrite($handle, $somecontent) === FALSE) {
            echo "Cannot write to file ($filename)";
            exit;
        }
        fclose($handle);
    } else {
        echo "The file $filename is not writable";
    } 
}

function fetchData($jobid) {
    $html=file_get_contents('http://acbar.org/JobDetail.aspx?id='.$jobid);
    $html=str_replace("<td></td>", "",$html);
    $html=str_replace("<td style=\"font-size:8pt;font-weight:bold;\"></td>","<td style=\"font-size:8pt;font-weight:bold;\">Null</td>",$html);
    $html=str_replace("<td style=\"font-size:8pt;font-weight:bold;\" colspan=\"2\" ></td>","<td style=\"font-size:8pt;font-weight:bold;\" colspan=\"2\" >Null</td>",$html);

    $html=str_replace("&nbsp;", " ",$html);
    $html=str_replace("", "<br>",$html);
    $html=str_replace("<br>", "_br_",$html);
    // $html=str_replace("\â\u","'",$html);


    $dom = new DOMDocument;
    $dom->loadHTML( $html );

   //echo $dom->saveHTML();
   //exit;
    $rows = array();
    foreach( $dom->getElementsByTagName( 'tr' ) as $tr ) {
        $cells = array();
        foreach( $tr->getElementsByTagName( 'td' ) as $td ) {
            if(trim($td->nodeValue)!='')
                $cells[] = str_replace("br","<br>",trim($td->nodeValue));
        }
        if(sizeof($cells)>0)
            $rows[] = $cells;
    }
    for($i=0;$i<0;$i++) 
        array_shift ($rows);
    // echo "<pre>"; print_r($rows); echo "</pre>";
    // exit;

    if($rows[0][1]=="")
        return false;  
    else
        return $rows;  
}

// Let's build the page
$latestBuild = date("r");

// Let's define the type of doc we're creating.
$createXML ="<?xml version=\"1.0\" encoding=\"UTF-8\" ?>";
$createXML .= "<rss version=\"0.92\">";
$createXML .= "<channel>
    <title>Job List</title>
    <link>http://acbar.org</link>
    <description>Job List</description>
    <lastBuildDate>$latestBuild</lastBuildDate>
    <language>en</language>";
 $startFrom=fileRead();
 $startFrom=$startFrom+1;
 $endWith=$startFrom+3;


for($jid=$startFrom;$jid<$endWith;$jid++) {
    $data=fetchData($jid);

    if(!$data) 
        break;

    $srcurl='http://acbar.org/JobDetail.aspx?id='.$jid;

    $createXML .= '<item><sourceurl>'.htmlspecialchars($srcurl).'</sourceurl>';
    for($i=0;$i<23;$i++) 
    {
        $tag=$data[$i][0];
        $value=$data[$i][1];
        $createXML .= createRSSFile($tag,$value,$data);
    }
    // $extra=extraFields($data);
    // $createXML .= $extra;
    $createXML .= "</item>";
    // fileWrite($jid);
}
// preg_match('/[a-z0-9]+([_\\.-][a-z0-9]+)*@([a-z0-9]+([\.-][a-z0-9]+)*)+\\.[a-z]{2,}/i',$data[26][1],$email);
// $email=$email[0];

header("content-type: text/xml");
echo $createXML .= "</channel></rss>";

?> 
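
For context on why the tags disappear: `nodeValue` returns only the text content of a node, so a `<br>` or `<b>` inside a cell is dropped before the script ever sees it. The following is a minimal sketch of one way to keep the inner markup, assuming a DOMDocument-based approach like the one in the script. The sample HTML and the `innerHtml()` helper are hypothetical and not part of the original code.

```php
<?php
// Hypothetical sample markup standing in for one row of the scraped page.
$html = '<table><tr><td>Duties</td><td>Plan the work<br>Report <b>weekly</b></td></tr></table>';

$dom = new DOMDocument;
$dom->loadHTML($html);

// Hypothetical helper: returns the markup inside a node by serializing
// each of its child nodes in turn with DOMDocument::saveHTML().
function innerHtml(DOMNode $node): string
{
    $out = '';
    foreach ($node->childNodes as $child) {
        $out .= $node->ownerDocument->saveHTML($child);
    }
    return $out;
}

$td = $dom->getElementsByTagName('td')->item(1);

echo $td->nodeValue, "\n";   // text only: the <br> and <b> tags are gone
echo innerHtml($td), "\n";   // the cell's markup is preserved
```

To carry that markup inside an RSS element, it would still need to be escaped once with `htmlspecialchars()` or wrapped in a CDATA section, otherwise the feed may not be well-formed XML.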
1 Answer

  1. Editorial Team
    Added an answer on May 24, 2026 at 3:48 pm (2026-05-24T15:48:48+00:00)

    This works for me:

    function fetchData($jobid) {
        $html = file_get_contents('http://acbar.org/JobDetail.aspx?id='.$jobid);
        $html = str_replace("<td></td>", "", $html);
        $html = str_replace("&nbsp;", " ", $html);
        // Swap line breaks for a placeholder so they survive nodeValue, which drops tags.
        $html = str_replace(array("<br/>", "<br />", "<br>"), "_br_", $html);
    
        $dom = new DOMDocument;
        // Encode non-ASCII characters as entities so loadHTML() does not garble UTF-8.
        // Note: the "HTML-ENTITIES" encoding is deprecated as of PHP 8.2.
        $html = mb_convert_encoding($html, "HTML-ENTITIES", "UTF-8");
        $dom->loadHTML($html);
        $rows = array();
        foreach ($dom->getElementsByTagName('tr') as $tr) {
            $cells = array();
            foreach ($tr->getElementsByTagName('td') as $td) {
                // Keep the raw text here; it is escaped once when the item is built.
                if (trim($td->nodeValue) != '')
                    $cells[] = trim($td->nodeValue);
            }
            if (sizeof($cells) > 0)
                $rows[] = $cells;
        }
    
        // Build one element per row: the first cell is the tag name, the second the value.
        $returnITEM = '';
        foreach ($rows as $ntag) {
            $tag = strtolower(str_replace(" ", "_", $ntag[0]));
            $tag = str_replace(":", "", $tag);
            $tag = str_replace("&", "and", $tag);
            $value = isset($ntag[1]) ? $ntag[1] : '';
            // Escape the value, then turn the placeholders back into real line breaks.
            $returnITEM .= "<".$tag.">".str_replace('_br_', '<br />', htmlspecialchars(str_replace(" 00:00:00", "", $value)))."</".$tag.">";
        }
    
        return $returnITEM;
    } 
    
    echo fetchData(3350);
    

    EDIT: added $html = mb_convert_encoding($html, "HTML-ENTITIES", "UTF-8"); before the loadHTML() call so that UTF-8 characters are not garbled.
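
    On newer PHP versions the "HTML-ENTITIES" encoding for mb_convert_encoding() is deprecated (as of PHP 8.2). A minimal sketch of one possible substitute, assuming the mbstring extension is available and the page is UTF-8: `mb_encode_numericentity()` turns every non-ASCII character into a numeric entity before parsing. The sample HTML below is hypothetical.

    ```php
    <?php
    // Hypothetical UTF-8 input containing a bullet character (U+2022).
    $html = "<table><tr><td>Skills</td><td>\u{2022} PHP</td></tr></table>";

    // Convert every character from U+0080 upward to a numeric entity, so that
    // loadHTML(), which treats input without a declared charset as ISO-8859-1,
    // does not garble it.
    $safe = mb_encode_numericentity($html, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');

    $dom = new DOMDocument;
    $dom->loadHTML($safe);

    // nodeValue is returned as UTF-8, with the bullet intact.
    $td = $dom->getElementsByTagName('td')->item(1);
    echo $td->nodeValue, "\n";
    ```

    The rest of fetchData() would stay the same; only the line that prepares $html for loadHTML() changes.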
