I’m writing a simple crawler, and ideally to save bandwidth, I’d only like to

Asked: May 16, 2026

I’m writing a simple crawler, and ideally, to save bandwidth, I’d like to download only the text and links on the page. Can I do that using HTTP headers? I’m confused about how they work.

1 Answer

Editorial Team · Answer 1 · 2026-05-16T02:37:08+00:00

You’re on the right track to solving the problem.

I’m not sure how much you already know about HTTP headers, but in HTTP/1.1 they are just lines of formatted text that you exchange with a web server. They follow a protocol, and the format is pretty straightforward: you write a request and receive a response. The requests look like the ones you see in the Firefox plugin LiveHTTPHeaders at https://addons.mozilla.org/en-US/firefox/addon/3829/.
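To make the request/response idea concrete, here is a minimal Python sketch of the raw text an HTTP/1.1 client sends and how the head of a response can be split apart. Python is my choice here, not something from the original answer, and the helper names build_request and parse_response_head, the host, and the User-Agent string are hypothetical. The sketch only builds and parses bytes; it does not open a network connection.

```python
# Minimal sketch of what an HTTP/1.1 exchange looks like on the wire.
# build_request/parse_response_head and the User-Agent value are hypothetical.

def build_request(host: str, path: str = "/") -> bytes:
    """Build a raw HTTP/1.1 GET request that says we prefer HTML."""
    lines = [
        f"GET {path} HTTP/1.1",              # request line
        f"Host: {host}",                     # required in HTTP/1.1
        "Accept: text/html",                 # preference, not a guarantee
        "User-Agent: simple-crawler/0.1",    # hypothetical crawler name
        "Connection: close",
        "",                                  # these two empty strings produce
        "",                                  # the blank line ending the headers
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_response_head(raw: bytes):
    """Split a raw response into (status_code, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    # The status line looks like "HTTP/1.1 200 OK".
    status_code = int(status_line.split(" ", 2)[1])
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so normalize them to lowercase.
        headers[name.strip().lower()] = value.strip()
    return status_code, headers, body


if __name__ == "__main__":
    print(build_request("example.com", "/index.html").decode("ascii"))
    sample = (b"HTTP/1.1 200 OK\r\n"
              b"Content-Type: text/html; charset=utf-8\r\n"
              b"\r\n"
              b"<p>Hello</p>")
    status, headers, body = parse_response_head(sample)
    print(status, headers["content-type"])  # 200 text/html; charset=utf-8
```

Everything before the first blank line is headers; everything after it is the body, which is where the bandwidth actually goes.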

I wrote a small post at my site http://blog.gnucom.cc/2010/write-http-request-to-web-server-with-php/ that shows how you can write a request to a web server and then read the response. If you send an Accept: text/html header, you’re asking only for HTML, which is a subset of what is available on the web (so yes, it will “optimize” your script to an extent). The server is free to ignore that header, though, so it’s worth checking the Content-Type of the response as well. Note that this example is really low level; if you’re going to write a spider, you may want to use an existing library like cURL or whatever other tools your implementation language offers.
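Along the lines of “use an existing library”, here is a minimal sketch using only Python’s standard library (urllib.request and html.parser) instead of the PHP or cURL approaches mentioned above. It sends Accept: text/html, skips the body when the response is not actually HTML, and otherwise keeps just the visible text and the link targets. The names fetch_text_and_links and TextAndLinks, and the User-Agent string, are hypothetical; treat this as a starting point rather than a complete crawler (no robots.txt handling, retries, or rate limiting).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import Request, urlopen


class TextAndLinks(HTMLParser):
    """Collect visible text and absolute href targets from an HTML document."""

    def __init__(self, base_url: str = ""):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.text = []
        self._skip = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page URL.
                    self.links.append(urljoin(self.base_url, value))

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.text.append(data.strip())


def fetch_text_and_links(url: str, timeout: float = 10.0):
    """Return (text, links) if the server answers with HTML, else None."""
    req = Request(url, headers={
        "Accept": "text/html",
        "User-Agent": "simple-crawler/0.1",  # hypothetical crawler name
    })
    with urlopen(req, timeout=timeout) as resp:
        # Accept is only a preference, so check what actually came back.
        if resp.headers.get_content_type() != "text/html":
            # Return without reading the body; leaving the with-block
            # closes the connection.
            return None
        charset = resp.headers.get_content_charset() or "utf-8"
        html = resp.read().decode(charset, errors="replace")
    parser = TextAndLinks(base_url=url)
    parser.feed(html)
    parser.close()
    return " ".join(parser.text), parser.links
```

Calling fetch_text_and_links("http://example.com/") would return a (text, links) pair for an HTML page, or None for anything else. Checking Content-Type before calling read() is what lets the crawler avoid downloading bodies it does not want.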
