I wrote a crawler for a specific dynamic website. All crawl jobs taking over 3

Question

Asked: June 17, 2026 (2026-06-17T00:06:05+00:00)

I wrote a crawler for a specific dynamic website. Each crawl job takes over 3 hours.
I want to check whether a page has already been crawled or has changed since the last crawl.
If I can do this, the script will finish in a very short time.

For example:

    foreach ($urls as $url) {
        // thereAreChanges() is the check I am looking for
        if (thereAreChanges($url)) {
            crawl($url);
        }
    }

Information: The web page doesn't provide Content-Length or a CRC. These are the response headers:

Array ( [0] => HTTP/1.1 200 OK 
        [Date] => Tue, 08 Jan 2013 07:47:03 GMT 
        [Server] => Apache 
        [Set-Cookie] => Array ( 
                [0] => PHPSESSID=eisb6qjme9b0ouoga9su9fgok4; path=/  
                [1] => j12011=a%3A3%3A%7Bs%3A3%3A%22sid%22%3Bs%3A26%3A%22eisb6qjme9b0ouoga9su9fgok4%22%3Bs%3A2%3A%22ip%22%3Bs%3A12%3A%2294.103.47.65%22%3Bs%3A4%3A%22time%22%3Bi%3A1357631223%3B%7D; expires=Sat, 09-Mar-2013 07:47:03 GMT; path=/  
        ) 
        [Expires] => Thu, 19 Nov 1981 08:52:00 GMT 
        [Cache-Control] => no-store, no-cache, must-revalidate, post-check=0, pre-check=0 
        [Pragma] => no-cache 
        [Vary] => Accept-Encoding 
        [Connection] => close 
        [Content-Type] => text/html 
)

The site provides Content-Type but doesn't provide Content-Length. How can I ask Apache for the Content-Length?

Update: http://urivalet.com/ can get the Content-Length. I need this.

If I could get a CRC of the page in the headers, that would be perfect, but I guess this is a long shot.

1 Answer

Editorial Team · Answer 1 · 2026-06-17T00:06:07+00:00

The solution is to send 'header'=>"Accept-Encoding: gzip" with the request.

Without this header, the response does not include Content-Length; with it, the page returns Content-Length.
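As a rough sketch of how this could fit the loop in the question: the code below assumes the headers are fetched with get_headers() and a stream context carrying the Accept-Encoding header. The helper names fetchHeaders(), contentLength() and lengthChanged() are made up for illustration and do not come from the question or the answer. Note that with gzip the reported length is the compressed size, so an unchanged length suggests, but does not prove, that the page is unchanged.

```php
<?php
// Sketch only: all function names here are hypothetical helpers.

/** Fetch response headers, sending the Accept-Encoding header from the answer. */
function fetchHeaders(string $url): array
{
    $context = stream_context_create([
        'http' => ['header' => "Accept-Encoding: gzip\r\n"],
    ]);
    $headers = get_headers($url, true, $context);

    return $headers === false ? [] : $headers;
}

/** Return Content-Length as an int, or null when the server did not send it. */
function contentLength(array $headers): ?int
{
    foreach ($headers as $name => $value) {
        // Header names are case-insensitive; key 0 is the status line.
        if (is_string($name) && strcasecmp($name, 'Content-Length') === 0) {
            // After redirects get_headers() may hold one value per response.
            $last = is_array($value) ? end($value) : $value;

            return ctype_digit((string) $last) ? (int) $last : null;
        }
    }

    return null;
}

/** Treat a missing or different length as "changed" so the page gets crawled. */
function lengthChanged(?int $previous, ?int $current): bool
{
    return $previous === null || $current === null || $previous !== $current;
}
```

Inside the foreach loop, thereAreChanges($url) could then call lengthChanged() with the length stored from the previous run and contentLength(fetchHeaders($url)), and save the new length afterwards. Where the stored lengths are kept (file, database) is left open here.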
