We want to keep user agents away from JavaScript files, CSS files and pictures, correct? Classes, modules and other folders of that kind should be protected with .htaccess. Am I right? If not, please let me know.
As a result, a typical robots.txt (assuming we also remember to password-protect the other folders) could contain just a few lines:
User-agent: *
Disallow: /cssfiles/
Disallow: /jsfiles/
Disallow: /pics/
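One way to sanity-check what these rules block is Python's standard-library urllib.robotparser, which parses robots.txt text and answers can_fetch() queries. The sketch below assumes the three folder rules above and uses hypothetical file names under mysite.com. It also shows one of this parser's conventions: an empty `Disallow:` value means "allow everything", and because the parser returns the first rule that matches, an empty `Disallow:` placed ahead of the other rules in the same group lets every URL through. Leaving that line out avoids the ambiguity.

```python
from urllib.robotparser import RobotFileParser

# The folder rules from the robots.txt above (no empty "Disallow:" line).
ROBOTS_TXT = """\
User-agent: *
Disallow: /cssfiles/
Disallow: /jsfiles/
Disallow: /pics/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rules = build_parser(ROBOTS_TXT)
    # Hypothetical URLs on the example site.
    for url in (
        "https://mysite.com/cssfiles/site.css",
        "https://mysite.com/jsfiles/app.js",
        "https://mysite.com/pics/logo.png",
        "https://mysite.com/index.php?page=login",
    ):
        verdict = "allowed" if rules.can_fetch("*", url) else "blocked"
        print(f"{verdict:8} {url}")

    # Pitfall in this parser: an empty "Disallow:" ahead of other rules matches
    # every path first and allows it, so nothing in the group ends up blocked.
    with_empty = build_parser("User-agent: *\nDisallow:\nDisallow: /cssfiles/\n")
    print(with_empty.can_fetch("*", "https://mysite.com/cssfiles/site.css"))  # True
```

Other crawlers apply their own matching rules, so this only tells you how one well-known parser reads the file.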
Does it make sense to disallow both mysite.com/index.php?page=registration and mysite.com/index.php?page=login? If so, why, and how?
Also, did I forget something?
Folders that have a basic HTTP authentication requirement applied by an .htaccess file don’t have to be in your robots.txt file because spiders will not be able to access them.
I typically do not exclude CSS or JavaScript when building sites. I don't think the major search engines are interested in listing those files in their results, since they are not useful to most searchers. Blocking them is not entirely harmless, though: Google renders pages much like a browser does and advises against blocking the CSS and JavaScript a page needs, because that can affect how the page is understood and indexed.
As for images, if you don’t want them appearing in places like Google Images then you can add your image folder to robots.txt.
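If the goal is only to keep pictures out of image search, a narrower option is a group aimed at the image crawler's own user-agent token. Google documents `Googlebot-Image` as the token for its image crawler; the sketch below assumes that token and the /pics/ folder from the question, and uses urllib.robotparser to show that the image crawler is blocked from /pics/ while other crawlers fall through to the `*` group.

```python
from urllib.robotparser import RobotFileParser

# Block /pics/ only for Google's image crawler; everyone else uses the "*" group.
ROBOTS_TXT = """\
User-agent: Googlebot-Image
Disallow: /pics/

User-agent: *
Disallow: /cssfiles/
Disallow: /jsfiles/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

PICTURE = "https://mysite.com/pics/logo.png"  # hypothetical image URL
for agent in ("Googlebot-Image", "Googlebot"):
    verdict = "allowed" if parser.can_fetch(agent, PICTURE) else "blocked"
    print(f"{agent}: {verdict}")
```

Disallowing /pics/ under `User-agent: *` instead would hide the pictures from every compliant crawler, not just image search.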
I would not attempt to disallow your registration or login pages. They are legitimate areas of your site and should be indexed.
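Since the question also asked how such a rule would be written: if you did decide to keep those pages out of crawls despite the advice above, `Disallow` values are path prefixes that include the query string. The sketch assumes the pages are served at /index.php?page=registration and /index.php?page=login, and checks hypothetical rules for them with urllib.robotparser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: block only the registration and login views of index.php.
ROBOTS_TXT = """\
User-agent: *
Disallow: /index.php?page=registration
Disallow: /index.php?page=login
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for url in (
    "https://mysite.com/index.php?page=registration",
    "https://mysite.com/index.php?page=login",
    "https://mysite.com/index.php?page=about",
):
    verdict = "allowed" if parser.can_fetch("*", url) else "blocked"
    print(f"{verdict:8} {url}")
```

Because matching is by prefix, a URL that puts another parameter first (for example /index.php?lang=en&page=login) would not match these rules. Google supports `*` wildcards for cases like that, but Python's parser does not.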
A very important thing to remember about robots.txt files is that they cannot enforce their directives. They can only ask spiders not to crawl certain things. While most major search engines respect these requests, some homemade or malicious spiders will not. If there's something you want to protect from spiders, make sure it is either protected by an authentication mechanism or not web-accessible at all.
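To make the "recommendation, not enforcement" point concrete: the check happens entirely inside the crawler's own code. The sketch below contrasts a hypothetical polite crawler, which consults urllib.robotparser before each request, with a hypothetical impolite one that never reads the file. The function names, the /private/ folder and the URL list are made up for illustration, and nothing is actually downloaded.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt asking crawlers to stay out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

# Hypothetical URLs a crawler has discovered on the site.
DISCOVERED = [
    "https://mysite.com/index.php",
    "https://mysite.com/private/report.html",
]


def polite_crawl_plan(urls, robots_txt, agent="ExampleBot"):
    """Return the URLs a well-behaved crawler would choose to request."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if parser.can_fetch(agent, url)]


def impolite_crawl_plan(urls):
    """A crawler that never reads robots.txt simply requests everything."""
    return list(urls)


print("polite:  ", polite_crawl_plan(DISCOVERED, ROBOTS_TXT))
print("impolite:", impolite_crawl_plan(DISCOVERED))
```

The server serves both crawlers the same way; only authentication or keeping the files outside the web root actually stops the second one.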