I want to throttle requests to my web server to thwart web scraping and denial-of-service attacks against my site. I’m willing to be relatively lax; the key thing is that no one makes so many requests that the site slows down.
I was thinking of setting up throttling by IP address, so that requests from a given IP would be slowed if too many requests were made in a short period of time.
Some questions I have:
- Is this the right way to go about dealing with web scrapers and DoS attacks at the web server level?
- What’s a good limit so that I don’t inconvenience regular users who may be working on shared IP networks?
- How specifically should I set up the throttling? I’m using Apache/2.2.
“Is this the right way … at the web server level?” It’s probably the best option you have. It might be good to have different thresholds on different parts of your site: you may be more willing to throttle certain kinds of traffic than others. But ideally these kinds of settings would be managed at the network level.
“What’s a good limit … ?” It depends entirely on your traffic: how much you expect, where your real users come from, and so on.
How to do it? You can write rules for this in ModSecurity, which also defends against a range of other web attacks. As with the mod_evasive answer, this won’t fully protect you against attackers with a lot of resources at their disposal, but it would force them to step up their game.
I don’t think there’s anything “built into” Apache httpd that will facilitate this. The expectation would be that issues with an abusive IP address (i.e., network traffic issues) are managed at the network level.
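To make the idea concrete, here is a rough sketch of the per-IP logic with different thresholds for different parts of a site. It is plain Python, not Apache or ModSecurity configuration, and everything in it is made up for illustration: the class name `SlidingWindowLimiter`, the path prefixes, and the limits are assumptions, not recommendations. It counts each IP’s requests inside a sliding time window and refuses a request once the count reaches the limit for that part of the site.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Hypothetical per-IP limiter; names and numbers are illustrative only."""

    def __init__(self, limits, default_limit, window_seconds=60.0):
        self.limits = limits                # path prefix -> max requests per window
        self.default_limit = default_limit  # used when no prefix matches
        self.window = window_seconds
        # (ip, prefix) -> timestamps of recent requests.
        # Note: idle IPs are never evicted here; a real deployment would need that.
        self.hits = defaultdict(deque)

    def _match(self, path):
        # Pick the longest configured prefix that the path starts with.
        best = ""
        for prefix in self.limits:
            if path.startswith(prefix) and len(prefix) > len(best):
                best = prefix
        return best

    def allow(self, ip, path, now=None):
        # `now` can be injected for testing; otherwise use a monotonic clock.
        now = time.monotonic() if now is None else now
        prefix = self._match(path)
        limit = self.limits.get(prefix, self.default_limit)
        recent = self.hits[(ip, prefix)]
        # Drop timestamps that have fallen out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= limit:
            return False  # over the limit: throttle or reject this request
        recent.append(now)
        return True


# Example: a stricter limit on an assumed expensive "/search" area.
limiter = SlidingWindowLimiter({"/search": 2}, default_limit=5, window_seconds=10)
results = [limiter.allow("203.0.113.7", "/search", now=t) for t in (0, 1, 2)]
print(results)
```

Because the count is kept per IP, many users behind one shared address draw from the same budget, which is why the limit has to be generous enough for your real traffic.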
EDIT:
Since you comment elsewhere that you are using Rackspace for hosting, you might want to check out their load balancer API.