How can a scanning tool discover all available pages automatically?
One way I can think of is to crawl recursively from the home page.
But that won't find the back-end CMS, since the public pages don't link to it.
So how do these scanning tools work?
Stupid web crawler:
Start by creating an array to store links, and putting one URL in there yourself. Create a second empty array to store visited URLs. Now start a program which does the following.
If you assume that every page on the web is reachable by following some number of random links (possibly billions), then simply repeating steps 1 through 4 will eventually result in downloading the entire web. Since the web is not actually a fully connected graph, you have to start the process from different points to eventually reach every page.
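Here is a minimal sketch of that kind of crawler in Python, as one plausible reading of the loop described above rather than a definitive implementation. The names `crawl`, `fetch`, `LinkParser`, and the `max_pages` cap are my own assumptions for illustration. It uses only the standard library, ignores robots.txt and rate limiting, and lets the caller pass several seed URLs so the crawl can start from different points.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def fetch(url):
    """Download a page and return its HTML as text (hypothetical helper)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(seed_urls, fetch_page=fetch, max_pages=100):
    """Breadth-first crawl from the seeds; returns pages in visit order."""
    to_visit = deque(seed_urls)  # the "links" array, seeded by hand
    seen = set(seed_urls)        # every URL ever queued, to avoid repeats
    visited = []                 # the "visited" array

    while to_visit and len(visited) < max_pages:
        url = to_visit.popleft()
        try:
            html = fetch_page(url)
        except Exception:
            continue  # skip pages that fail to download
        visited.append(url)

        parser = LinkParser()
        parser.feed(html)
        for href in parser.hrefs:
            # Resolve relative links and drop "#fragment" suffixes.
            link = urldefrag(urljoin(url, href)).url
            if link.startswith(("http://", "https://")) and link not in seen:
                seen.add(link)
                to_visit.append(link)
    return visited
```

Calling `crawl(["https://example.com/"])` walks only what is linked from that page. A page with no inbound links, such as a CMS login, shows up only if you add its URL to `seed_urls` yourself.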