I have a little problem with google bot, I have a server working on windows server 2009, the system called Workcube and it works on coldfusion, there is an error reporter built-in, thus i recieve every message of error, especially it concerned with google bot, that trying to go to a false link, which doesn’t exist! the links looks like this:
- http://www.bilgiteknolojileri.net/index.cfm?fuseaction=objects2.view_product_list&product_catid=282&HIERARCHY=215.005&brand_id=hoyrrolmwdgldah
- http://www.bilgiteknolojileri.net/index.cfm?fuseaction=objects2.view_product_list&product_catid=145&HIERARCHY=200.003&brand_id=hoyrrolmwdgldah
- http://www.bilgiteknolojileri.net/index.cfm?fuseaction=objects2.view_product_list&product_catid=123&HIERARCHY=110.006&brand_id=xxblpflyevlitojg
- http://www.bilgiteknolojileri.net/index.cfm?fuseaction=objects2.view_product_list&product_catid=1&HIERARCHY=100&brand_id=xxblpflyevlitojg
of course with definition like brand_id=hoyrrolmwdgldah or brand_id=xxblpflyevlitojg is false, i don’t have any idea what can be the problem?! need advice! thank you all for help! 😉
You might want to verify your site with Google Webmaster Tools which will provide URLs that it finds that error out.
Your logs are also valid, but you need to verify that it really is Googlebot hitting your site and not someone spoofing their User Agent.
Here are instructions to do just that: http://googlewebmastercentral.blogspot.com/2006/09/how-to-verify-googlebot.html
Essentially you need to do a reverse DNS lookup and then a forward DNS lookup after you receive the host from the reverse lookup.
Once you’ve verified it’s the real Googlebot you can start troubleshooting. You see Googlebot won’t request URLs that it hasn’t naturally seen before, meaning Googlebot shouldn’t be making direct object reference requests. I suspect it’s a rogue bot with a User Agent of Googlebot, but if it’s not you might want to look through your site to see if you’re accidentally linking to those pages.
Unfortunately you posted the full URLs, so even if you clean up your site, Googelbot will see the links from Stack Overflow and continue to crawl them because it’ll be in their crawl queue.
I’d suggest 301 redirecting these URLs to someplace that make sense to your users. Otherwise I would 404 or 410 these pages so Google know to remove these pages from their index.
In addition, if these are pages you don’t want indexed, I would suggest adding the path to your robots.txt file so Googlebot can’t continue to request more of these pages.
Unfortunately there’s no real good way of telling Googlebot to never ever crawl these URLs again. You can always go into Google Webmaster Tools and request the URLs to be removed from their index which may stop Googlebot from crawling them again, but that doesn’t guarantee it.