Over the last few days I have written a web crawler. The only question I have left is: do "standard" web crawlers crawl links that contain a query string, like this one:
https://www.google.se/?q=stackoverflow
or do they skip the query string and fetch the page like this:
https://www.google.se
In case you are referring to crawling for some sort of indexing of web resources:
The full answer is very long, but in short, my opinion is this:
If the "page/resource" https://www.google.se/?q=stackoverflow is pointed to by many other pages (i.e. it has a large in-link degree), then leaving it out of your index might mean missing a very important node in the web graph. On the other hand, imagine how many links of the form google.com/?q="query" there are on the web. Probably a huge number, so crawling all of them would certainly be a huge overhead for your crawler/indexer system.
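One way to act on this trade-off is sketched below in Python. It is only an illustration, not how any particular "standard" crawler works. The function names (`strip_query`, `urls_to_crawl`) and the in-link threshold are hypothetical choices made for this example. The idea is to keep a query URL as-is only when enough distinct pages link to it, and otherwise fall back to the same URL without its query string.

```python
from collections import Counter
from urllib.parse import urlsplit, urlunsplit

# Hypothetical threshold (an assumption, tune for your own crawler):
# a query URL is crawled as-is only if at least this many distinct
# pages link to it.
MIN_INLINKS_FOR_QUERY_URL = 3


def strip_query(url: str) -> str:
    """Return the URL without its query string and fragment."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def urls_to_crawl(discovered_links, min_inlinks=MIN_INLINKS_FOR_QUERY_URL):
    """Pick the URLs to crawl from (source_page, target_url) pairs.

    URLs without a query string are always kept. URLs with a query
    string are kept only when their in-link degree reaches min_inlinks;
    otherwise they are replaced by the URL without the query string.
    """
    inlinks = Counter()
    seen_edges = set()
    for source, target in discovered_links:
        # Count each (source, target) edge once, so repeated links on
        # the same page do not inflate the in-link degree.
        if (source, target) not in seen_edges:
            seen_edges.add((source, target))
            inlinks[target] += 1

    selected = set()
    for url, count in inlinks.items():
        if urlsplit(url).query and count < min_inlinks:
            selected.add(strip_query(url))
        else:
            selected.add(url)
    return selected
```

With this policy a rarely linked search URL collapses to its base page, while a heavily linked one stays in the crawl frontier as its own node. In-link counts are only known for the part of the web already crawled, so a real crawler would have to revisit this decision as more links are discovered.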