We’re using Django to build a JSON web service front-end for MySQL. We have Apache and Django running on an EC2 instance and MySQL running on an RDS instance. We’ve started benchmarking with ApacheBench (ab) and got some really poor performance numbers. We also noticed that while the tests are running, our Apache/Django instance goes to 100% CPU usage at very low load, while the MySQL instance never gets above 2% CPU usage.
We’re trying to make sense of this and isolate the problem, so we did several ab tests:
1. A request for a static HTML page from Apache: ~2000 requests/second.
2. A request that executes a small Python function in Django, with no db interaction: ~1000 requests/second.
3. A request that executes one of our Django web service functions, which calls authenticate and then does a very simple query to fetch one record from a table: 11 requests/second.
4. Same as 3, but with the call to authenticate commented out: 95 requests/second.
Why is authenticate so slow? Is it writing data to the db, finding a billion digits of pi, what?
We would like to keep the call to authenticate in these functions, because we don’t want to leave them open to anyone who can guess the URL, etc. Has anyone here noticed that authenticate is slow, and can anyone suggest a way to remedy it?
Thank you very much!
I am no expert in authentication and security but the following are some ideas as to why this might be happening and possibly how you can increase the performance somewhat.
Passwords are stored in the db, so to keep that storage secure, plaintext passwords are not stored; their hashes are stored instead. You can still validate a user logging in by computing the hash of the typed password and comparing it to the one stored in the db. This increases security: if a malicious party gets a copy of the db, the only way to recover the plaintext passwords is by using rainbow tables or doing a brute-force attack.
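To make the store-a-hash idea concrete, here is a minimal sketch using only the Python standard library. It is not Django's implementation; the function names `hash_password` and `check_password` are made up for illustration, and the random salt is my own addition (it is the usual defence against rainbow tables). It also uses a single fast hash on purpose, which is exactly the weakness discussed next.

```python
import hashlib
import hmac
import os


def hash_password(password, salt=None):
    """Return (salt, hex digest) for a password. Illustrative only."""
    if salt is None:
        # A fresh random salt per user makes precomputed tables useless.
        salt = os.urandom(16)
    digest = hashlib.sha256(salt + password.encode("utf-8")).hexdigest()
    return salt, digest


def check_password(password, salt, stored_digest):
    """Recompute the hash of the typed password and compare to the stored one."""
    _, candidate = hash_password(password, salt)
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(candidate, stored_digest)


# What would be saved in the db is (salt, stored), never the plaintext.
salt, stored = hash_password("s3cret")
print(check_password("s3cret", salt, stored))
print(check_password("wrong", salt, stored))
```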
This is where things get interesting. According to Moore’s Law, computers are becoming exponentially faster, so computing hash functions becomes much cheaper in terms of time, especially quick hash functions like md5 or sha1. This poses a problem: with the computing power available today and a fast hash function, attackers can brute-force hashed passwords relatively easily. To combat this, two things can be done. One is to loop the hash function multiple times (the output of the hash is fed back into the hash). Done naively with a small fixed count, this is not very effective, because it only increases the cost of the hash by a constant factor. That’s why the second approach is preferred, which is to use a function that is deliberately computationally expensive, with a work factor that can be raised as hardware improves. With such a function, each hash takes more time to compute. Even if it takes a second, that is not a big deal for an end user logging in once, but it is a big deal for a brute-force attack, because millions of hashes have to be computed. That’s why, starting with Django 1.4, the default password hasher is a pretty computationally expensive function called PBKDF2 (which, as far as I understand, gets its cost by applying a keyed hash many thousands of times).
To get back to your question: it’s because of this function that, when you enable authentication, your benchmark numbers go down drastically and your CPU usage goes up.
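If you want to see the cost difference for yourself, a rough sketch like the following compares one fast hash against one PBKDF2 call using Python's standard `hashlib`. The iteration count of 10000 is my assumption of what Django 1.4 used by default (newer versions use more), and `time_call` is just a helper I made up; absolute numbers will vary by machine.

```python
import hashlib
import os
import time


def time_call(fn, repeat=5):
    """Average wall-clock seconds per call over a few repetitions."""
    start = time.perf_counter()
    for _ in range(repeat):
        fn()
    return (time.perf_counter() - start) / repeat


password = b"s3cret"
salt = os.urandom(16)
ITERATIONS = 10000  # assumed Django 1.4 default; check your own settings

# One quick hash, the kind that is cheap to brute-force.
fast = time_call(lambda: hashlib.sha1(salt + password).digest())

# One PBKDF2 derivation, the deliberately expensive kind.
slow = time_call(
    lambda: hashlib.pbkdf2_hmac("sha256", password, salt, ITERATIONS)
)

print("single sha1: %.6f s per call" % fast)
print("pbkdf2:      %.6f s per call" % slow)
```

Every request that calls authenticate pays roughly the PBKDF2 cost once, which is CPU-bound work on the web server and would be consistent with the db sitting idle while Apache is pegged.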
Here are some ways you can increase the performance.