Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 938435
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: May 15, 20262026-05-15T21:38:37+00:00 2026-05-15T21:38:37+00:00

I want to create a PHP function that goes through a website’s homepage, finds

  • 0

I want to create a PHP function that goes through a website’s homepage, finds all the links in the homepage, goes through the links that it finds and keeps going until all the links on said website are final. I really need to build something like this so I can spider my network of sites and supply a “one stop” for searching.

Here’s what I got so far –

function spider($urltospider, $current_array = array(), $ignore_array = array('')) {
    if(empty($current_array)) {
        // Make the request to the original URL
        $session = curl_init($urltospider);
        curl_setopt($session, CURLOPT_RETURNTRANSFER, true);
        $html = curl_exec($session);
        curl_close($session);
        if($html != '') {
            $dom = new DOMDocument();
            @$dom->loadHTML($html);
            $xpath = new DOMXPath($dom);
            $hrefs = $xpath->evaluate("/html/body//a");
            for($i = 0; $i < $hrefs->length; $i++) {
                $href = $hrefs->item($i);
                $url = $href->getAttribute('href');
                if(!in_array($url, $ignore_array) && !in_array($url, $current_array)) {
                    // Add this URL to the current spider array
                    $current_array[] = $url;
                }
            }               
        } else {
            die('Failed connection to the URL');
        }
    } else {
        // There are already URLs in the current array
        foreach($current_array as $url) {
            // Connect to this URL

            // Find all the links in this URL

            // Go through each URL and get more links
        }
    }
}

The only problem is, I can’t seem to get my head around how to proceed. Can anyone help me out? Basically, this function will repeat itself until everything has been found.

  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-05-15T21:38:38+00:00Added an answer on May 15, 2026 at 9:38 pm

    I’m not PHP expert, but you seem to be over-complicating it.

    function spider($urltospider, $current_array = array(), $ignore_array = array('')) {
        if(empty($current_array)) {
            $current_array[] =  $urltospider;
        $cur_crawl = 0;
        while ($cur_crawl < len($current_array)) { //don't use foreach because that can get messed up if you change the array while inside the loop.
            $links_found = crawl($current_array($cur_crawl)); //crawl should return all links found in the given page
            //Now keep adding $links_found to $current_array. Maybe you can check if any of the links found are already in $current_array so you don't crawl them multiple times
            $current_array = array_merge($current_array, $links_found);
            $cur_crawl += 1;
        }
    return $current_array;
    }
    
    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

I'm working in PHP and I want to create a function that, given a
I want to create a PHP script that grabs the content of a website.
I want to create a PHP app that can send text messages ; There
I want to create a php page on the website (CMS Wordpress), but I
I want to create a site in PHP with cakephp that users can insert
I want to create some table through PHP but it fails everytime, but the
Possible Duplicate: PHP and Enums I want to create a function with several parameters
I want to create a function which receives a single argument that holds the
Using this function I want to create a PHP navigation: function loadPage($pagename) { if
I'm writing a PHP function that will delete all the files in the directory

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.