I’m building a web crawler and I’m trying to figure out which country a web page is from. I can check the domain (for example, .com.ar is from Argentina), but there are Argentinean sites on other domains (.com, .net) too. An example is http://www.taringa.net: it is an Argentinean site, but with a .net domain.
So how can I do it?
Thanks.
Geo-location by IP. Resolve the site's hostname to an IP address and look that address up in a geolocation database to get a geographical location. These services often cost money, and they will only tell you where the server is physically hosted.
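A minimal sketch of the IP approach in Python, using only the standard library. The names `resolve_host`, `country_for_ip` and `SAMPLE_RANGES` are made up for this example, and the table is placeholder data: the ranges are reserved documentation networks and the country codes attached to them are invented. A real crawler would swap the table for an actual geolocation database or service.

```python
import ipaddress
import socket

# Placeholder table: these are reserved documentation ranges (RFC 5737),
# and the country codes attached to them are invented for illustration.
SAMPLE_RANGES = [
    ("192.0.2.0/24", "AR"),
    ("198.51.100.0/24", "US"),
]


def resolve_host(hostname):
    """Return one IPv4 address for hostname (performs a DNS lookup)."""
    return socket.gethostbyname(hostname)


def country_for_ip(ip, ranges=SAMPLE_RANGES):
    """Return the country code of the first range containing ip, or None."""
    addr = ipaddress.ip_address(ip)
    for cidr, country in ranges:
        if addr in ipaddress.ip_network(cidr):
            return country
    return None
```

With a real table in place, `country_for_ip(resolve_host("www.taringa.net"))` would give the country of the hosting server, which is not necessarily the country of the audience.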
Do a whois on the domain. It will tell you where the domain is registered.
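The whois lookup can be sketched like this, again with the standard library only. WHOIS (RFC 3912) is a plain TCP exchange on port 43: send the domain followed by CRLF and read until the server closes the connection. The function names are made up for this example, and the `Registrant Country` field name is an assumption: it is common in registrar output, but the format is not standardized, and many records omit or redact it. Which WHOIS server to ask depends on the TLD, so the caller has to supply it.

```python
import socket


def whois_query(domain, server, port=43, timeout=10):
    """Send a WHOIS query (RFC 3912) and return the raw text response."""
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall(domain.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closes the connection when done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def registrant_country(whois_text):
    """Return the value of the first 'Registrant Country:' line, or None.

    The field name is an assumption; WHOIS output varies by registry.
    """
    for line in whois_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "registrant country":
            return value.strip() or None
    return None
```

Keeping the parsing separate from the network call makes it easy to adapt `registrant_country` to whatever field names a particular registry actually returns.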
But remember: there is no real meaning to “where is a web page from”. The web has no geographic boundaries. I can run a Spanish-language site out of San Jose, California, and register the domain contacts in Canada. You would have no way of knowing my site is aimed at Chilean users.