Good day,
I’ve done some research looking for this answer, but haven’t had much luck. I’m hoping someone can help.
The situation is that a site I’m working on (built in ASP.NET), which forces SSL on most of its pages, has some folders (e.g. site.com/dontindex) containing files that definitely shouldn’t be indexed by search engines. Google has links to these files in its index (e.g. https://www.site.com/dontindex/file.pdf).
My issue is that I have created a robots.txt file to disallow those folders from indexing, but from what I’ve read, that isn’t going to prevent those files being indexed – as some of them might be referenced through secure pages. I’m thinking that only the non-secure pages are disallowed in this way. Q1) Is that even correct?
When I tested http://www.site.com/dontindex/file.pdf against the new robots file in Google Webmaster Tools, it came back as “Blocked by line 5: Disallow: /dontindex/”, but when I tried https://www.site.com/dontindex/file.pdf it came back as “Not in domain”.
From what I can gather, I should have a second robots.txt file somewhere for the secure files/folders. I’ve read that if the site were running PHP, I could use some sort of rewrite rule to cover this, but what should I do in my ASP.NET situation? Q2) If I do need a second robots file (given that it’s an ASP.NET site), where should I put it?
Thank you for any help!
I think the problem has more to do with Google Webmaster Tools than with your robots.txt, as http://site.com/robots.txt and https://site.com/robots.txt are the same file. The “Not in domain” error is, I think, because Google classes the two as separate sites. You need to add both the http and the https versions of the site in Webmaster Tools and check the robots file against each one.
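To make the "separate sites" point concrete, here is a rough Python sketch (not ASP.NET, and not anything Google itself runs). It assumes a crawler looks for robots.txt at the root of the same scheme and host as the page it wants to fetch, and it checks the Disallow rule from the question with the standard library parser. The `User-agent: *` line and the helper names `robots_url` and `is_allowed` are made up for illustration; only the `Disallow: /dontindex/` line comes from the post.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body. Only the Disallow line is taken from the
# question; the User-agent line is an assumption.
ROBOTS_TXT = """\
User-agent: *
Disallow: /dontindex/
"""


def robots_url(page_url: str) -> str:
    """Return the robots.txt location for the scheme and host of page_url."""
    parts = urlsplit(page_url)
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


def is_allowed(page_url: str, robots_body: str = ROBOTS_TXT, agent: str = "*") -> bool:
    """Check page_url against a robots.txt body using the stdlib parser.

    Note: the parser only matches on the URL path, so the scheme makes no
    difference here. Which file gets consulted is decided by robots_url().
    """
    parser = RobotFileParser()
    parser.parse(robots_body.splitlines())
    return parser.can_fetch(agent, page_url)


if __name__ == "__main__":
    for url in (
        "http://www.site.com/dontindex/file.pdf",
        "https://www.site.com/dontindex/file.pdf",
    ):
        print(url, "->", robots_url(url), "allowed:", is_allowed(url))
```

If the same file really is served at both addresses, the rule applies to the http and https URLs alike, which would fit with the tool's "Not in domain" message being about how the site was registered rather than about the robots file.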