Hello I am trying to make a little spider. While I was building it

Question

Asked: May 27, 2026 (2026-05-27T13:07:07+00:00)

Hello, I am trying to make a little spider.
While building it, I came across a problem: I need to check whether a link points to the root domain or to a subdomain.

For example:

http://www.domain.com
http://domain.com
http://domain.com/index.php
http://domain.com/default.php
http://domain.com/index.html
http://domain.com/default.html

and so on are all the same.

So I need a function that takes the URL string as input and checks whether it is the root (or homepage, whatever you like to call it) of a site.

1 Answer

Editorial Team · Answer 1 · 2026-05-27T13:07:08+00:00

As noted in the comments, this is really a basic aspect of coding the spider. If you intend to write a general-purpose spider, you'll need a way to resolve URLs and detect whether they point to the same content, how they do so (through a redirect or simply through duplicate content), and what kind of content they point to.

At a minimum, you need to handle:

Relative paths.
GET variables that are significant to the web page in one way or another but do not change the content.
Malformed URLs.
JavaScript-related information in the href attribute.
Links to non-HTML material: direct download links to PDFs, images, etc. (detecting these by extension isn't always enough, given that PHP scripts can deliver images).
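As a rough illustration of the first four points, here is a minimal Python sketch using only the standard library. The function name normalize_link and the IGNORED_PARAMS set are assumptions made up for this example; which GET variables are safe to ignore depends entirely on the sites being crawled. Detecting non-HTML material is left out, since it generally requires looking at the Content-Type header of the response rather than the URL.

```python
from urllib.parse import parse_qsl, urlencode, urljoin, urlsplit, urlunsplit

# Assumption: query parameters that do not change the page content.
# This set is purely illustrative; adjust it per site.
IGNORED_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "sessionid"}

# Only these schemes are followed; javascript:, mailto: etc. are skipped.
CRAWLABLE_SCHEMES = {"http", "https"}


def normalize_link(base_url, href):
    """Return an absolute, cleaned-up URL for href, or None to skip it."""
    href = href.strip()
    if not href:
        return None
    try:
        # Resolve relative paths such as "../page.php" against the page URL.
        parts = urlsplit(urljoin(base_url, href))
    except ValueError:
        # Malformed URL (for example an unterminated IPv6 host).
        return None
    if parts.scheme.lower() not in CRAWLABLE_SCHEMES or not parts.netloc:
        return None
    # Drop the parameters assumed to be insignificant, keep the rest in order.
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k.lower() not in IGNORED_PARAMS]
    # The fragment never reaches the server, so it is discarded.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                       parts.path or "/", urlencode(kept), ""))
```

Called as normalize_link("http://domain.com/a/b.html", "../c.php?utm_source=x&id=3"), this resolves the relative path and strips the ignored parameter, while an href such as "javascript:void(0)" yields None.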

These are just some of the aspects, but it all comes down to this: the kind of detection you're after has to be a fundamental part of the spider if you intend to use it in any generic manner.
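For the narrower check described in the question, a simple heuristic might look like the sketch below. The names is_homepage, site_key and DEFAULT_DOCUMENTS are hypothetical, and the list of default file names is an assumption; a server can be configured to serve any file as its homepage, so the URL alone can never settle this for certain. Likewise, site_key only strips a leading "www."; telling a real subdomain from a registered domain in general needs the Public Suffix List.

```python
from urllib.parse import urlsplit

# Assumption: file names commonly served as a site's default document.
DEFAULT_DOCUMENTS = {
    "index.php", "index.html", "index.htm",
    "default.php", "default.html", "default.htm",
}


def is_homepage(url):
    """Guess whether url points at the root page of its host."""
    try:
        parts = urlsplit(url)
    except ValueError:
        return False
    if parts.scheme.lower() not in ("http", "https") or not parts.hostname:
        return False
    # Assumption: a query string may select different content, so reject it.
    if parts.query:
        return False
    path = parts.path.strip("/").lower()
    return path == "" or path in DEFAULT_DOCUMENTS


def site_key(url):
    """Return the host without a leading 'www.', for comparing two links."""
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host
```

With this, all six example URLs from the question are reported as homepages, and site_key gives "domain.com" for each of them, whereas http://domain.com/about.html is not treated as a homepage.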
