This question grew out of an answer here.
My question, therefore, is: what steps can one take to ward off standard scrapers?
In addition to everything already mentioned (robots.txt, the robots meta tag, and relying more on JavaScript), one of the surest methods I know of is to put restricted content behind a user login. This keeps out all but purpose-built bots. Add a strong CAPTCHA (such as reCAPTCHA) to the login form and most purpose-built bots will be blocked as well.
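Why robots.txt alone is not enough: it is purely advisory. A crawler only respects it if the crawler's own code chooses to check it, as in the sketch below using Python's standard `urllib.robotparser`. The robots.txt contents, the `/members/` path, the `ExampleBot` agent name and the `polite_crawler_may_fetch` helper are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt asking every crawler to stay out of /members/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /members/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def polite_crawler_may_fetch(url: str, agent: str = "ExampleBot") -> bool:
    """Return whether a crawler that honours robots.txt would fetch `url`."""
    return parser.can_fetch(agent, url)


print(polite_crawler_may_fetch("https://example.com/members/profile"))  # False
print(polite_crawler_may_fetch("https://example.com/about"))            # True
```

A scraper that simply never calls `can_fetch` will request `/members/profile` anyway, which is why a login gate is the stronger measure.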
If a site needs to verify the identity of a client (i.e., including whether it’s a bot), that’s what user logins are for. 🙂
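As a rough illustration of gating content behind a login, here is a minimal stdlib-only sketch. It assumes the password check and CAPTCHA verification have already succeeded before `issue_session` is called. `SECRET_KEY`, `issue_session`, `session_user` and `serve_restricted` are hypothetical names, and the tokens never expire, which a real site would not accept.

```python
import hashlib
import hmac
import secrets
from typing import Optional, Tuple

# Hypothetical server-side secret; a real site would load it from configuration.
SECRET_KEY = secrets.token_bytes(32)


def _sign(username: str) -> str:
    return hmac.new(SECRET_KEY, username.encode(), hashlib.sha256).hexdigest()


def issue_session(username: str) -> str:
    """Create a signed session token after a successful (CAPTCHA-checked) login."""
    return f"{username}:{_sign(username)}"


def session_user(token: Optional[str]) -> Optional[str]:
    """Return the username if the token's signature is valid, else None."""
    if not token or ":" not in token:
        return None
    username, sig = token.rsplit(":", 1)
    # Constant-time comparison so the signature cannot be guessed byte by byte.
    return username if hmac.compare_digest(sig, _sign(username)) else None


def serve_restricted(token: Optional[str]) -> Tuple[int, str]:
    """Serve members-only content; anonymous clients (including bots) get 401."""
    user = session_user(token)
    if user is None:
        return 401, "Please log in."
    return 200, f"Members-only content for {user}"


token = issue_session("alice")
print(serve_restricted(token))           # (200, 'Members-only content for alice')
print(serve_restricted(None))            # (401, 'Please log in.')
print(serve_restricted("alice:forged"))  # (401, 'Please log in.')
```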
User logins can also be disabled if suspicious activity is detected.
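One simple way to define "suspicious activity" is a per-account request rate. The sketch below uses a sliding window and disables an account that exceeds it. The thresholds (`MAX_REQUESTS`, `WINDOW_SECONDS`) and the `ActivityMonitor` class are hypothetical; real sites typically combine several signals rather than relying on rate alone.

```python
import time
from collections import defaultdict, deque

# Hypothetical thresholds: more than MAX_REQUESTS requests within
# WINDOW_SECONDS is treated as suspicious for a logged-in account.
MAX_REQUESTS = 100
WINDOW_SECONDS = 60.0


class ActivityMonitor:
    def __init__(self, max_requests=MAX_REQUESTS, window=WINDOW_SECONDS):
        self.max_requests = max_requests
        self.window = window
        self.history = defaultdict(deque)  # user -> timestamps of recent requests
        self.disabled = set()              # accounts that have been switched off

    def record(self, user, now=None):
        """Record one request; return False if the account is (now) disabled."""
        if user in self.disabled:
            return False
        now = time.monotonic() if now is None else now
        hits = self.history[user]
        hits.append(now)
        # Drop timestamps that have fallen out of the sliding window.
        while hits and now - hits[0] > self.window:
            hits.popleft()
        if len(hits) > self.max_requests:
            self.disabled.add(user)
            return False
        return True


monitor = ActivityMonitor(max_requests=3, window=10)
print([monitor.record("bot", now=t) for t in (0, 1, 2, 3)])  # [True, True, True, False]
print(monitor.record("bot", now=100))                        # False: stays disabled
```

Passing `now` explicitly is only for demonstration; in production the monitor would use the default `time.monotonic()` clock.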