If I have a web service/application in Tomcat, do I have to worry about a mechanism like “robots.txt” to keep search engines from crawling Tomcat? I just want to prevent Google and others from indexing anything in Tomcat. It’s currently a small project that I’m working on: the server is accessible via a static IP and domain name, and I currently have no authentication in place for the Tomcat stuff, though that will be added shortly. If you simply type in the domain name or IP address of the server, you get a simple “blank” page from IIS. Only when you know and type in the Tomcat sub-directory name (which is wired into IIS using Jakarta) does the Tomcat app become visible in the browser.
Do I have to worry about any of that? I would think that since Google can’t reach the initial Tomcat URL unless it knows it beforehand, it has nothing to “feed off of”?
Unless something on the web links to your Tomcat pages, I wouldn’t worry about it.
Think about it from Google’s perspective: how would you implement such a thing? Start at any given URL and brute-force all possible paths? Crawlers discover new pages by following links from pages they already know, so I would be very surprised if Google did that.
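To illustrate why unlinked pages stay hidden, here is a minimal sketch of link-following discovery as a breadth-first search over a toy link graph. It is a simplified model, not how Google actually works (real crawlers also pick up URLs from sitemaps and other sources), and every URL in it, including the `/myapp/` Tomcat path, is hypothetical.

```python
from collections import deque


def crawl(link_graph, seeds):
    """Breadth-first crawl: only pages reachable by following links from the seeds are found."""
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        page = queue.popleft()
        for target in link_graph.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


# Hypothetical link graph: nothing anywhere links to the Tomcat app under /myapp/.
link_graph = {
    "http://example.com/": [],  # the blank IIS root page links nowhere
    "http://other-site.example/": ["http://other-site.example/about"],
    "http://other-site.example/about": [],
    "http://example.com/myapp/": ["http://example.com/myapp/page2"],
    "http://example.com/myapp/page2": [],
}

found = crawl(link_graph, ["http://example.com/", "http://other-site.example/"])
print(sorted(found))
# ['http://example.com/', 'http://other-site.example/', 'http://other-site.example/about']
```

The `/myapp/` pages exist and even link to each other, but because no known page points at them, the crawl never reaches them.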
This type of content even has a name: the Invisible Web (also called the Deep Web).
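If you do want a robots.txt anyway, note that crawlers only request `/robots.txt` from the root of the host, so in this setup it would have to be served by IIS, not from inside the Tomcat app. The sketch below uses Python’s standard `urllib.robotparser` to show how a well-behaved crawler would interpret a rule excluding a hypothetical `/myapp/` sub-directory (substitute your real one). Keep in mind that robots.txt is only advisory and publicly readable, so listing the path there also advertises it; authentication is the real protection.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt served from the IIS site root; /myapp/ stands in for the Tomcat path.
robots_txt = """\
User-agent: *
Disallow: /myapp/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# A compliant crawler checks each URL against the rules before fetching it.
print(parser.can_fetch("Googlebot", "http://example.com/myapp/index.jsp"))  # False
print(parser.can_fetch("Googlebot", "http://example.com/"))  # True
```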