I have a website that I built using Django. Using the settings.py file, I send myself error messages that are generated from the site, partly so that I can see if I made any errors.
From time to time I get rather strange errors, and most of them seem to come from the same area of the site (where I wrote a little tutorial explaining how I set up a Django blog engine).
The errors all look like something I could have caused with a typo.
For example, these two errors are very close together. I never had an ‘x’ or ‘post’ as a variable on those pages.
‘/blog_engine/page/step-10-sub-templates/{{+x.get_absolute_url+}}/’
‘/blog_engine/page/step-10-sub-templates/{{+post.get_absolute_url+}}/’
The user agent is:
‘HTTP_USER_AGENT’: ‘Mozilla/5.0 (compatible; Purebot/1.1; +http://www.puritysearch.net/)’,
I take it this is a scraper bot, but I can’t figure out what it would be able to get with this kind of attack.
At the risk of sounding stupid, what should I do? Is it a hack attempt or are they simply trying to copy my site?
Edit: I’ll follow the advice already given, but I’m really curious as to why someone would run a script like this. Are they just trying to copy the site? It isn’t hitting admin pages or even any of the forms. These seem like harmless (aside from potential plagiarism) attempts to dig in and find content.
From your USER_AGENT info it looks like this is a web spider from puritysearch.net. You can add a robots.txt file, which most crawlers honor, and state your rules in it. For example, you can tell crawlers to keep off certain busy sections of your site. This way you will not be completely blocking crawlers (which are needed for your website to become popular), and at the same time you are making sure that your users get a fast experience on your site.
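As a rough sketch of what such rules could look like, and how to check them before deploying, the snippet below parses a sample robots.txt with Python's standard urllib.robotparser. The rules themselves are only an assumption for illustration: the Purebot token is taken from the user agent in the question, /blog_engine/ is the section that produced the errors, and /admin/ is a hypothetical path. The helper name is_allowed is also made up for this example.

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules only (an assumption, not the site's real file):
# keep Purebot out of the tutorial section, and keep all other
# crawlers out of a hypothetical /admin/ path.
ROBOTS_TXT = """\
User-agent: Purebot
Disallow: /blog_engine/

User-agent: *
Disallow: /admin/
"""


def is_allowed(user_agent: str, path: str) -> bool:
    """Return True if the sample rules let `user_agent` fetch `path`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    page = "/blog_engine/page/step-10-sub-templates/"
    print("Purebot may fetch the tutorial page:", is_allowed("Purebot", page))
    print("Googlebot may fetch the tutorial page:", is_allowed("Googlebot", page))
```

Pass the crawler's product token ("Purebot") rather than the full header value, since the parser only looks at the text before the first slash. Keep in mind that robots.txt is advisory: it only helps with crawlers that choose to honor it.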