I’m creating a web crawler. I’m going to give it a URL and it will

Question

Asked: June 1, 2026 (2026-06-01T09:32:53+00:00)

I’m creating a web crawler. I’m going to give it a URL, and it will scan through the directory and subdirectories for .html files. I’ve been looking at two alternatives:

1. scandir($url). This works on local files but not on HTTP sites. Is this because of file permissions? I’m guessing it shouldn’t work, since it would be dangerous for everyone to have access to your website’s files.
2. Searching for links and following them. I can call file_get_contents on the index file, find the links, and then follow them to their .html files.

Does either of these two work, or is there a third alternative?

1 Answer

Editorial Team · Answer 1 · 2026-06-01T09:32:54+00:00

The only way to find HTML files is to parse the content the server returns. Unless the site happens to have directory browsing enabled, which is usually one of the first things to be disabled, you cannot browse directory listings. You only have access to the content the site is prepared to show you and let you use.
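As a rough illustration of parsing the returned content for links, here is a minimal sketch using PHP's DOMDocument. The function name extractHtmlLinks and its URL handling are assumptions made for this example, not part of the original answer. The resolution of relative links is deliberately simplified: it does not normalise "../" segments and ignores any <base> tag.

```php
<?php
// Hypothetical helper (not from the original answer): list the .html links on one page.
// Simplified URL resolution: no "../" normalisation, no <base> tag support.
function extractHtmlLinks(string $html, string $pageUrl): array
{
    if (trim($html) === '') {
        return [];
    }
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // real-world HTML is rarely valid
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $base = parse_url($pageUrl);
    $origin = $base['scheme'] . '://' . $base['host']
        . (isset($base['port']) ? ':' . $base['port'] : '');
    $path = $base['path'] ?? '/';
    $dir = substr($path, 0, strrpos($path, '/') + 1); // directory of the current page

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        // Drop any #fragment; it points inside a page, not at another file.
        $href = explode('#', trim($anchor->getAttribute('href')), 2)[0];
        if ($href === '') {
            continue;
        }
        if (preg_match('#^https?://#i', $href)) {
            $url = $href; // already absolute
        } elseif (preg_match('#^[a-z][a-z0-9+.-]*:#i', $href) || strpos($href, '//') === 0) {
            continue; // mailto:, javascript:, protocol-relative: skipped in this sketch
        } else {
            $url = $origin . ($href[0] === '/' ? $href : $dir . $href);
        }
        // Keep only URLs whose path ends in .html (or .htm).
        $urlPath = (string) parse_url($url, PHP_URL_PATH);
        if (preg_match('/\.html?$/i', $urlPath)) {
            $links[] = $url;
        }
    }
    return array_values(array_unique($links));
}
```

The $html argument could come from file_get_contents on the page URL, as the question suggests, provided allow_url_fopen is enabled on your PHP installation.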

You would have to start at http://www.mysite.com and work onwards from there, scanning for links to HTML files. Also consider what happens if the site has ASP, PHP, or other files that return HTML content.
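One way to handle both points is a breadth-first crawl that stays on the starting host and decides what counts as HTML from the response's Content-Type rather than the file extension. The sketch below is only an illustration under stated assumptions: crawlSite, linksIn, and the shape of the $fetch callable (it returns an array with 'type' and 'body' keys, or null on failure) are all hypothetical names invented here. Relative links are resolved in a simplified way, with no "../" normalisation.

```php
<?php
// Hypothetical sketch: breadth-first crawl of a single host.
// $fetch is an assumed callable: fn(string $url): ?array, returning
// ['type' => Content-Type header value, 'body' => response body] or null on failure.
function crawlSite(string $startUrl, callable $fetch, int $maxPages = 100): array
{
    $host = parse_url($startUrl, PHP_URL_HOST);
    $queue = [$startUrl];
    $seen = [$startUrl => true]; // avoids fetching the same URL twice
    $htmlPages = [];
    $fetched = 0;

    while ($queue !== [] && $fetched < $maxPages) {
        $url = array_shift($queue);
        $fetched++;
        $response = $fetch($url);
        // Judge by Content-Type, not extension: .php or .asp URLs often return HTML.
        if ($response === null || stripos($response['type'], 'text/html') !== 0) {
            continue;
        }
        $htmlPages[] = $url;
        foreach (linksIn($response['body'], $url) as $link) {
            // Stay on the starting host and skip anything already queued.
            if (parse_url($link, PHP_URL_HOST) === $host && !isset($seen[$link])) {
                $seen[$link] = true;
                $queue[] = $link;
            }
        }
    }
    return $htmlPages;
}

// Hypothetical helper: absolute http(s) link targets found in one page.
function linksIn(string $html, string $pageUrl): array
{
    if (trim($html) === '') {
        return [];
    }
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // tolerate invalid markup
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $base = parse_url($pageUrl);
    $origin = $base['scheme'] . '://' . $base['host']
        . (isset($base['port']) ? ':' . $base['port'] : '');
    $path = $base['path'] ?? '/';
    $dir = substr($path, 0, strrpos($path, '/') + 1);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $href = explode('#', trim($anchor->getAttribute('href')), 2)[0];
        if ($href === '') {
            continue;
        }
        if (preg_match('#^https?://#i', $href)) {
            $links[] = $href;
        } elseif (!preg_match('#^[a-z][a-z0-9+.-]*:#i', $href) && strpos($href, '//') !== 0) {
            $links[] = $origin . ($href[0] === '/' ? $href : $dir . $href);
        }
    }
    return $links;
}
```

Passing the fetcher in as a callable keeps the crawl logic testable without network access. A real fetcher could wrap file_get_contents or cURL and read the Content-Type header from the response. The $maxPages cap is a simple guard against crawling an unexpectedly large site.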
