The Archive Base

Editorial Team
Asked: May 23, 2026 (2026-05-23T14:33:11+00:00)


I have cobbled together a class that checks links. It works but it is slow:

The class basically parses an HTML string and returns all invalid links found in href and src attributes. Here is how I use it:

$class = new Validurl(array('html' => file_get_contents('http://google.com')));
$invalid_links = $class->check_links();
print_r($invalid_links);

With HTML that contains a lot of links it becomes really slow. I know it has to follow every link, but maybe someone with more experience can give me a few pointers on how to speed it up.

Here’s the code:

class Validurl{

    private $html = '';

    public function __construct($params){
        $this->html = $params['html'];
    }

    public function check_links(){
        $invalid_links = array();
        $all_links = $this->get_links();

        foreach($all_links as $link){
            if(!$this->is_valid_url($link['url'])){
                array_push($invalid_links, $link);
            }
        }

        return $invalid_links;
    }

    private function get_links() {
        $xml = new DOMDocument();
        @$xml->loadHTML($this->html);
        $links = array();

        foreach($xml->getElementsByTagName('a') as $link) {
            $links[] = array('type' => 'url', 'url' => $link->getAttribute('href'), 'text' => $link->nodeValue);
        }

        foreach($xml->getElementsByTagName('img') as $link) {
            $links[] = array('type' => 'img', 'url' => $link->getAttribute('src'));
        }

        return $links;
    }

    private function is_valid_url($url){
        if ((strpos($url, "http")) === false) $url = "http://" . $url;

        if (is_array(@get_headers($url))){
            return true;
        }else{
            return false;
        }
    }

}
1 Answer

Editorial Team, added an answer on May 23, 2026 at 2:33 pm (2026-05-23T14:33:11+00:00)

    First of all, I would not push the links and images into an array and then iterate over that array when you can iterate over the results of getElementsByTagName() directly. You would have to do it twice, once for <a> tags and once for <img> tags, but if you move the checking logic into a function, you just call that function in each loop.
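
A minimal sketch of that idea, assuming a hypothetical helper named find_invalid() that takes the check as a callable. The check used here is only a stand-in so the example runs without network access; in the real class it would be the URL request.

```php
<?php
// Hypothetical helper: iterates the DOMNodeList directly and collects the
// attribute values that fail the supplied check. No intermediate array of
// all links is built first.
function find_invalid(DOMDocument $doc, string $tag, string $attr, callable $isValid): array
{
    $invalid = [];
    foreach ($doc->getElementsByTagName($tag) as $node) {
        $url = $node->getAttribute($attr);
        if (!$isValid($url)) {
            $invalid[] = ['tag' => $tag, 'url' => $url];
        }
    }
    return $invalid;
}

$doc = new DOMDocument();
// Suppress warnings from malformed real-world HTML, as the original class does.
@$doc->loadHTML('<a href="http://example.com/ok">ok</a><img src="broken.png">');

// Stand-in check for illustration only; a real check would request the URL.
$check = function (string $url): bool {
    return strpos($url, 'http') === 0;
};

// One call per tag type, sharing the same checking logic.
$invalid = array_merge(
    find_invalid($doc, 'a', 'href', $check),
    find_invalid($doc, 'img', 'src', $check)
);
print_r($invalid);
```

This mainly saves memory and one pass over the data; the network requests are still likely to dominate the running time.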

    Second, get_headers() is slow, according to comments on its PHP manual page. You should rather use cURL, for example like this (based on a comment on the same page):

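    function get_headers_curl($url)
    {
        $ch = curl_init();

        curl_setopt($ch, CURLOPT_URL,            $url);
        curl_setopt($ch, CURLOPT_HEADER,         true);  // include response headers in the output
        curl_setopt($ch, CURLOPT_NOBODY,         true);  // send a HEAD request and skip the body
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the output instead of printing it
        curl_setopt($ch, CURLOPT_TIMEOUT,        15);    // give up after 15 seconds

        $r = curl_exec($ch);

        // curl_exec() returns false on failure (e.g. DNS error or timeout)
        if ($r === false) {
            return false;
        }

        // split() was removed in PHP 7; explode() does the same job here
        return explode("\n", $r);
    }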
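
One thing to keep in mind when using the result: a server that answers with 404 still sends headers, so checking only that an array came back confirms the host responded, not that the link works. Below is a rough sketch of reading the status line instead. The function name status_is_ok() is hypothetical, and it assumes the first element of the array is the status line, which holds as long as redirects are not followed.

```php
<?php
// Hypothetical helper: decides validity from header lines such as those
// returned by get_headers_curl(). Assumes the first line is the status line.
function status_is_ok(array $headerLines): bool
{
    if (count($headerLines) === 0) {
        return false;
    }
    // Match e.g. "HTTP/1.1 200 OK" or "HTTP/2 404".
    if (!preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#', trim($headerLines[0]), $m)) {
        return false;
    }
    $code = (int) $m[1];
    // Treat 2xx and 3xx as reachable; 4xx and 5xx as invalid.
    return $code >= 200 && $code < 400;
}

var_dump(status_is_ok(["HTTP/1.1 200 OK\r", "Content-Type: text/html\r"])); // bool(true)
var_dump(status_is_ok(["HTTP/1.1 404 Not Found\r"]));                       // bool(false)
```

Whether a redirect (3xx) should count as valid is a judgment call; the range in the last line is the place to change it.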
    

    UPDATE: and yes, some kind of caching could also help, e.g. an SQLite database with one table for the link and the result, which you could purge once a day or so.
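
A minimal sketch of such a cache, assuming the pdo_sqlite extension is available. The class name LinkCache, the table name links and its columns are all made up for illustration. The example uses an in-memory database so it runs anywhere; a real cache would pass a file path instead.

```php
<?php
// Hypothetical cache: one table mapping a URL to its last check result.
class LinkCache
{
    private $db;

    public function __construct(string $path)
    {
        $this->db = new PDO('sqlite:' . $path);
        $this->db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
        $this->db->exec(
            'CREATE TABLE IF NOT EXISTS links (
                url TEXT PRIMARY KEY,
                valid INTEGER NOT NULL,
                checked_at INTEGER NOT NULL
            )'
        );
    }

    // Returns true/false for a cached result, or null when the URL is unknown.
    public function get(string $url): ?bool
    {
        $stmt = $this->db->prepare('SELECT valid FROM links WHERE url = ?');
        $stmt->execute([$url]);
        $row = $stmt->fetch(PDO::FETCH_ASSOC);
        return $row === false ? null : (bool) $row['valid'];
    }

    // Stores or overwrites the result for a URL with the current timestamp.
    public function set(string $url, bool $valid): void
    {
        $stmt = $this->db->prepare(
            'INSERT OR REPLACE INTO links (url, valid, checked_at) VALUES (?, ?, ?)'
        );
        $stmt->execute([$url, $valid ? 1 : 0, time()]);
    }

    // Deletes entries older than $maxAge seconds (one day by default).
    public function purge(int $maxAge = 86400): int
    {
        $stmt = $this->db->prepare('DELETE FROM links WHERE checked_at < ?');
        $stmt->execute([time() - $maxAge]);
        return $stmt->rowCount();
    }
}

$cache = new LinkCache(':memory:');
$cache->set('http://example.com/', true);
var_dump($cache->get('http://example.com/'));        // bool(true)
var_dump($cache->get('http://example.org/missing')); // NULL
```

The link checker would call get() first and only request the URL when it returns null, then store the outcome with set(). Calling purge() from a daily job keeps stale results from lingering.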
