I am writing a proxy server that maps youtube.com to another domain (so users in countries like Germany can easily access YouTube without search results and videos being censored).
Unfortunately, there was a bug in my robots.txt. It's fixed now, but Baiduspider fetched my old robots.txt and has been trying to index the whole website for a couple of days.
Because YouTube is quite a big website, I don't think this process will end soon 🙂
I already tried redirecting Baiduspider to another page and sending it a 404, but it has already parsed too many paths.
What can I do about this?
Stop processing requests from Baiduspider.
With lighttpd, append a rule to lighttpd.conf that denies requests whose User-Agent matches Baiduspider.
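A minimal sketch of a lighttpd rule that rejects Baiduspider requests. It assumes mod_access is already listed in server.modules. The match pattern and the robots.txt exemption are my assumptions, not necessarily the configuration originally posted here:

```
# Assumes "mod_access" is already loaded via server.modules.
# Match any request whose User-Agent header contains "Baiduspider".
$HTTP["useragent"] =~ "Baiduspider" {
    # Assumption: keep /robots.txt reachable so the crawler can still
    # pick up the corrected file.
    $HTTP["url"] != "/robots.txt" {
        # An empty suffix matches every URL, so all other requests
        # from this user agent are denied with 403 Forbidden.
        url.access-deny = ( "" )
    }
}
```

lighttpd has to be restarted for the change to take effect. If the crawler should be denied everything, drop the inner `$HTTP["url"]` condition.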
Sooner or later, Baiduspider should refetch the robots.txt
(see http://blog.bauani.org/2008/10/baiduspider-spider-english-faq.html)