The Archive Base

Editorial Team
Asked: May 27, 2026

My script is a spider that checks if a page is a links page

My script is a spider that checks whether a page is a “links page” or an “information page”.
If the page is a “links page”, it continues recursively (or as a tree, if you will)
until it finds an “information page”.

I tried to make the script recursive, which was easy, but I kept getting this error:

Fatal error: Allowed memory size of 33554432 bytes exhausted (tried to
allocate 39 bytes) in /srv/www/loko/simple_html_dom.php on line 1316

I was told I would have to use the for-loop method, because the script won’t free memory no matter whether I use unset(), and since I only have three levels to loop through, that makes sense. But after I changed the script the error occurs again. Maybe I can free memory now?

Something needs to die here, please help me destruct someone!

set_time_limit(0);
ini_set('memory_limit', '256M');
require("simple_html_dom.php");
$thelink = "http://www.somelink.com";
$html1 = file_get_html($thelink);
$ret1 = $html1->find('#idTabResults2');

// first inception level, we know page has only links
if (!$ret1){
    $es1 = $html1->find('table.litab a');
    //unset($html1);
    $countlinks1 = 0;
    foreach ($es1 as $aa1) {
        $links1[$countlinks1] = $aa1->href;
        $countlinks1++;
    }
    //unset($es1);

    //for every link in array do the same
    for ($i = 0; $i < $countlinks1; $i++) {
        $html2 = file_get_html($links1[$i]);
        $ret2 = $html2->find('#idTabResults2');
        // if got information then send to DB
        if ($ret2){
            pullInfo($html2);
            //unset($html2);
        } else {
        // continue inception
            $es2 = $html2->find('table.litab a');
            $html2 = null;

            $countlinks2 = 0;
            foreach ($es2 as $aa2) {
                $links2[$countlinks2] = $aa2->href;
                $countlinks2++;
            }
            //unset($es2);

            for ($j = 0; $j < $countlinks2; $j++) {
                $html3 = file_get_html($links2[$j]);
                $ret3 = $html3->find('#idTabResults2');
                // if got information then send to DB       
                if ($ret3){
                    pullInfo($html3);

                } else {
                // inception level three
                    $es3 = $html3->find('table.litab a');
                    $html3 = null;
                    $countlinks3 = 0;
                    foreach ($es3 as $aa3) {
                        $links3[$countlinks3] = $aa3->href;
                        $countlinks3++;
                    }
                    for ($k = 0; $k < $countlinks3; $k++) {
                        echo memory_get_usage() ;
                        echo "\n";
                        $html4 = file_get_html($links3[$k]);
                        $ret4 = $html4->find('#idTabResults2');
                        // if got information then send to DB       
                        if ($ret4){
                            pullInfo($html4);

                        }
                        unset($html4);                  
                    }
                    unset($html3);
                }

            }
        }
    }
}



function pullInfo($html)
{
    $tds = $html->find('td');
    $count = 0;
    foreach ($tds as $td) {
        $count++;
        if ($count == 1) {
            $name = html_entity_decode($td->innertext);
        }
        if ($count == 2) {
            $address = addslashes(html_entity_decode($td->innertext));
        }
        if ($count == 3) {
            $number = addslashes(preg_replace('/(\d+) - (\d+)/i', '$2$1', $td->innertext));
        }
    }
    unset($tds, $td);

    $name = mysql_real_escape_string($name);
    $address = mysql_real_escape_string($address);
    $number = mysql_real_escape_string($number);
    $inAlready = mysql_query("SELECT * FROM people WHERE phone=$number");
    while ($e = mysql_fetch_assoc($inAlready))
        $output[] = $e;
    if (json_encode($output) != "null") {
        //print(json_encode($output));
    } else {
        mysql_query("INSERT INTO people (name, area, phone)
        VALUES ('$name', '$address', '$number')");
    }
}

And here is a picture of the growth in memory size:
[image: graph of memory usage growth, not preserved]

1 Answer

  1. Editorial Team
    Added an answer on May 27, 2026 at 4:14 pm

    I modified the code a little to free as much memory as I could see being freed.
    I’ve added a comment above each modification. The added comments start with “#” so you can find them more easily.
    This is not related to the question, but it is worth mentioning that your database insertion code is vulnerable to SQL injection.
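
On the SQL injection point, here is a minimal sketch of a parameterised insert with PDO. It is an illustration under assumptions, not a drop-in replacement: the helper name `savePerson` is hypothetical, and an in-memory SQLite database stands in for the real MySQL server so the example runs on its own (this needs the pdo_sqlite extension). The table and column names are taken from the INSERT statement in the question.

```php
<?php
// Assumption: in-memory SQLite stands in for the MySQL server used in the question.
// For MySQL the DSN would look like 'mysql:host=...;dbname=...' plus credentials.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE people (name TEXT, area TEXT, phone TEXT)');

// Hypothetical helper: stores a person unless the phone number is already present.
// Values are bound as parameters, so no addslashes()/escaping is needed.
function savePerson(PDO $pdo, string $name, string $area, string $phone): bool
{
    $check = $pdo->prepare('SELECT COUNT(*) FROM people WHERE phone = ?');
    $check->execute([$phone]);
    if ((int) $check->fetchColumn() > 0) {
        return false; // already stored, skip the insert
    }

    $insert = $pdo->prepare('INSERT INTO people (name, area, phone) VALUES (?, ?, ?)');
    $insert->execute([$name, $area, $phone]);
    return true;
}

// A name containing a quote is handled safely by the bound parameter.
$first  = savePerson($pdo, "O'Brien", '1 Main St', '5551234');
$second = savePerson($pdo, "O'Brien", '1 Main St', '5551234');
$rows   = (int) $pdo->query('SELECT COUNT(*) FROM people')->fetchColumn();
```

The duplicate check and the insert each use their own prepared statement, so the phone number never gets concatenated into the SQL text. The modified spider code follows below.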

    <?php
    require("simple_html_dom.php");
    $thelink = "http://www.somelink.co.uk";
    
    # do not keep the raw contents of the file in memory
    #$data1 = file_get_contents($thelink);
    #$html1 = str_get_html($data1);
    $html1 = str_get_html(file_get_contents($thelink));
    
    $ret1 = $html1->find('#idResults2');
    
    // first inception level, we know page has only links
    if (!$ret1){
        $es1 = $html1->find('table.litab a');
    
        # free $html1, not used anymore
        unset($html1);
    
        $countlinks1 = 0;
        foreach ($es1 as $aa1) {
            $links1[$countlinks1] = $aa1->href;
            $countlinks1++;
            // echo (addslashes($aa->href));
        }
    
        # free memory used by the $es1 value, not used anymore
        unset($es1);
    
        //for every link in array do the same
    
        for ($i = 0; $i < $countlinks1; $i++) {
            # do not keep the raw contents of the file in memory
            #$data2 = file_get_contents($links1[$i]);
            #$html2 = str_get_html($data2);
            $html2 = str_get_html(file_get_contents($links1[$i]));
    
            $ret2 = $html2->find('#idResults2');
    
            // if got information then send to DB
            if ($ret2){
                pullInfo($html2);
            } else {
            // continue inception
    
                $es2 = $html2->find('table.litab a');
    
                # free memory used by $html2, not used anymore.
                # we would unset it at the end of the loop.
                $html2 = null;
    
                $countlinks2 = 0;
                foreach ($es2 as $aa2) {
                    $links2[$countlinks2] = $aa2->href;
                    $countlinks2++;
                }
    
                # free memory used by $es2
                unset($es2);
    
                for ($j = 0; $j < $countlinks2; $j++) {
                    # do not keep the raw contents of the file in memory
                    #$data3 = file_get_contents($links2[$j]);
                    #$html3 = str_get_html($data3);
                    $html3 = str_get_html(file_get_contents($links2[$j]));
                    $ret3 = $html3->find('#idResults2');
                    // if got information then send to DB   
                    if ($ret3){
                        pullInfo($html3);
                    }
    
                    # free memory used by $html3, or on the last iteration the memory would not get freed
                    unset($html3);
                }
            }
    
            # free memory used by $html2, or on the last iteration the memory would not get freed
            unset($html2);
        }
    }
    
    
    
    function pullInfo($html)
    {
        $tds = $html->find('td');
        $count =0; 
        foreach ($tds as $td) {
          $count++;
          if ($count==1){
            $name = addslashes($td->innertext);
           }
          if ($count==2){
                $address = addslashes($td->innertext);
           }
          if ($count==3){
            $number = addslashes(preg_replace('/(\d+) - (\d+)/i', '$2$1', $td->innertext));
           }
    
        }
    
        # check for available data:
        if ($count) {
            # free $tds and $td
            unset($tds, $td);
    
            mysql_query("INSERT INTO people (name, area, phone)
            VALUES ('$name', '$address', '$number')");
        }
    
    }
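
A note on why unset() alone may not release a parsed page: a DOM object whose nodes point back at their document forms reference cycles, and a cycle keeps its reference counts above zero until the garbage collector runs (older PHP versions had no cycle collector at all). simple_html_dom provides a clear() method for this, so calling $html->clear() before unset($html) is worth trying. The sketch below shows the effect with plain PHP only. `DemoDocument` and `DemoNode` are hypothetical stand-ins, not part of simple_html_dom.

```php
<?php
// Hypothetical stand-in for a parsed DOM node: it points back at its document.
final class DemoNode
{
    public $doc;

    public function __construct(DemoDocument $doc)
    {
        $this->doc = $doc;
    }
}

// Hypothetical stand-in for a parsed document that owns its nodes.
final class DemoDocument
{
    public $nodes = [];

    public function __construct(int $count)
    {
        for ($i = 0; $i < $count; $i++) {
            $this->nodes[] = new DemoNode($this); // document <-> node cycle
        }
    }

    // Break the cycles by hand so plain reference counting can free everything.
    public function clear(): void
    {
        foreach ($this->nodes as $node) {
            $node->doc = null;
        }
        $this->nodes = [];
    }
}

gc_enable();
gc_collect_cycles(); // start from a clean state

// Case 1: unset() only. The cycles survive until the collector runs.
$doc = new DemoDocument(100);
unset($doc);
$collectedAfterUnset = gc_collect_cycles();

// Case 2: clear() first. Nothing should be left for the collector to find.
$doc = new DemoDocument(100);
$doc->clear();
unset($doc);
$collectedAfterClear = gc_collect_cycles();

echo "collected after unset only: {$collectedAfterUnset}\n";
echo "collected after clear + unset: {$collectedAfterClear}\n";
```

In the spider, the equivalent step would be to call clear() on each $html object once its links or data have been copied out, and only then unset it.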
    

    Update:

    You could trace your memory usage to see how much memory is being used in each section of your code. This can be done by calling memory_get_usage() and saving the result to a file, for example by placing the code below at the end of each of your loops, or before creating objects or calling heavy methods:

    file_put_contents('memory.log', 'memory used in line ' . __LINE__ . ' is: ' . memory_get_usage() . PHP_EOL, FILE_APPEND);
    

    That way you can trace the memory usage of each part of your code.
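
If the same logging line ends up in many places, it could be wrapped in a small helper. This is only a sketch: the function name `logMemory`, the labels, and the temp-file location are assumptions for illustration. It also records memory_get_peak_usage(), which can show a spike that has already been freed by the time the log call runs.

```php
<?php
// Hypothetical helper: appends one line with current and peak memory usage.
function logMemory(string $label, string $logFile): void
{
    $line = sprintf(
        '%s: current=%d bytes, peak=%d bytes',
        $label,
        memory_get_usage(),
        memory_get_peak_usage()
    );
    file_put_contents($logFile, $line . PHP_EOL, FILE_APPEND);
}

// Assumption: a temp file is used here so the demo does not touch the project directory.
$logFile = tempnam(sys_get_temp_dir(), 'mem');

logMemory('before allocation', $logFile);
$payload = str_repeat('x', 1024 * 1024); // simulate a heavy step (about 1 MB)
logMemory('after allocation', $logFile);
unset($payload);
logMemory('after unset', $logFile);

// Read the log back, one entry per line.
$lines = file($logFile, FILE_IGNORE_NEW_LINES);
echo implode(PHP_EOL, $lines), PHP_EOL;
```

Calling it with a distinct label at the end of each loop level makes it easier to see which level the growth comes from.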

    In the end, remember that all this tracing and optimization might not be enough, since your application might really need more than 32 MB of memory. I’ve developed a system that analyzes several data sources to detect spammers and then blocks their SMTP connections. Since the number of connected users is sometimes over 30000, even after a lot of code optimization I had to increase the PHP memory limit to 768 MB on the server, which is not a common thing to do.
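
One more thing to check: the error in the question reports a limit of 33554432 bytes (32 MB) even though the script calls ini_set('memory_limit', '256M'), which suggests the call may not be taking effect on that host. A minimal sketch for verifying this is shown below. The 256M value is taken from the question, and the variable names are only illustrative.

```php
<?php
// Value requested in the question's script.
$requested = '256M';

// ini_set() returns the previous value on success and false on failure.
$previous  = ini_set('memory_limit', $requested);
$effective = ini_get('memory_limit');

if ($previous === false || $effective !== $requested) {
    // The host may forbid changing the limit at runtime.
    echo "memory_limit could not be changed, still {$effective}\n";
} else {
    echo "memory_limit changed from {$previous} to {$effective}\n";
}
```

If the limit turns out to be unchanged, it would have to be raised in php.ini or the server configuration instead, assuming the host allows that.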

© 2021 The Archive Base. All Rights Reserved