I would like to check whether a remote website contains certain files, e.g. robots.txt or favicon.ico. Of course, the files should be accessible (readable).
So if the website is http://www.example.com/, I would like to check whether http://www.example.com/robots.txt exists.
I tried fetching a URL like http://www.example.com/robots.txt. Sometimes you can tell whether the file is there, because you get a "page not found" error in the response headers when it is missing.
But some websites handle this error themselves, and all you get is some HTML saying that the page cannot be found.
The headers then carry status code 200.
So does anybody have an idea how to check whether the file really exists or not?
Thanks,
Granit
If they serve an error page with HTTP 200, I doubt you have a reliable way of detecting this. Needless to say, it’s extremely stupid to serve error pages that way …
You could try checking the Content-Type header of the response: if it is text/html, you can assume that it’s a custom error page instead of a robots.txt (which should be served as text/plain). For favicons likewise. But I think simply checking for text/html would be the most reliable way here.
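A rough sketch of that check in Python, using only the standard library. The function names `looks_like_real_file` and `remote_file_exists` are made up for illustration, and the text/html test is only a heuristic: a site could legitimately serve an HTML file at the path you ask for, and some servers do not answer HEAD requests properly, in which case a GET would be needed instead.

```python
import urllib.request


def looks_like_real_file(status, content_type):
    """Heuristic: status 200 and a Content-Type other than text/html."""
    if status != 200:
        return False
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = (content_type or "").split(";")[0].strip().lower()
    # A missing Content-Type is treated as "not an HTML error page".
    return media_type != "text/html"


def remote_file_exists(url, timeout=10):
    """Send a HEAD request and apply the heuristic to the response headers."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return looks_like_real_file(
                response.status, response.headers.get("Content-Type")
            )
    except OSError:
        # Covers HTTP errors such as 404 (urllib.error.HTTPError),
        # connection failures (urllib.error.URLError) and timeouts.
        return False
```

You would then call something like `remote_file_exists("http://www.example.com/robots.txt")` and treat a `False` result as "missing, or answered with an HTML error page".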