On our message board, we use password matching to help detect members with multiple registrations and enforce our rules against malicious puppet accounts. It worked well when we had SHA256 hashes and a per-site salt. But we recently had a humbling security breach in which a number of password hashes fell to a dictionary attack. So we forced a password change and switched to bcrypt with per-user salts.
Of course, now password matching doesn’t work anymore. I don’t have a formal education in cryptography or computer science, so I wanted to ask if there’s a secure way to overcome this problem. Somebody I work with suggested a second password field using a loose hashing algorithm that intentionally has lots of collisions, but it seems to me that this would either lead to tons of false positives or else reduce the search space too much to be secure. My idea was to stick with bcrypt, but store a second password hash that uses a per-site salt and an extremely high iteration count (say, 10+ seconds to generate on modern hardware). That way, users with the same password would have the same hash, but it couldn’t easily be deduced with a dictionary attack.
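A minimal sketch of the proposed scheme, assuming Python's standard-library PBKDF2 (`hashlib.pbkdf2_hmac`) as a stand-in for bcrypt, which the stdlib does not provide. The names `SITE_SALT`, `register`, and `find_shared_passwords` are hypothetical, and the iteration counts are kept tiny so the example runs quickly; the proposal itself calls for roughly 10 seconds per matching hash. Each user gets a randomly salted login hash plus a site-salted matching hash, and duplicates are found by grouping on the matching hash.

```python
import hashlib
import os
from collections import defaultdict

# Assumption: PBKDF2 from the stdlib stands in for bcrypt.
LOGIN_ITERATIONS = 10_000
# Hypothetical site-wide salt shared by every account.
SITE_SALT = b"hypothetical-site-wide-salt"
# Kept small for the sketch; the proposal wants ~10 seconds per hash.
MATCH_ITERATIONS = 10_000


def login_hash(password: str, salt: bytes) -> bytes:
    """Per-user salted hash used only to verify logins."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, LOGIN_ITERATIONS)


def matching_hash(password: str) -> bytes:
    """Site-salted hash: equal passwords always produce equal output."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), SITE_SALT, MATCH_ITERATIONS)


def register(db: dict, username: str, password: str) -> None:
    salt = os.urandom(16)  # fresh random salt for this user's login hash
    db[username] = {
        "salt": salt,
        "login": login_hash(password, salt),
        "match": matching_hash(password),
    }


def find_shared_passwords(db: dict) -> list:
    """Return sorted groups of usernames whose matching hashes are identical."""
    groups = defaultdict(list)
    for username, record in db.items():
        groups[record["match"]].append(username)
    return sorted(sorted(names) for names in groups.values() if len(names) > 1)


db = {}
register(db, "alice", "hunter2")
register(db, "bob", "hunter2")
register(db, "carol", "correct horse battery staple")
print(find_shared_passwords(db))  # [['alice', 'bob']]
```

The login hashes for alice and bob differ because their salts differ, while their matching hashes are identical, which is exactly the property the proposal relies on.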
I’m just wondering if there’s an obvious problem with this, or if someone more knowledgeable than me has any suggestions for a better way to approach things? It seems to me like it would work, but I’ve learned that there can be a lot of hidden gotchas when it comes to security. 😛 Thanks!
Short Answer
Any algorithm that lets you detect whether two users have the same password would also let an attacker detect whether two users have the same password. That capability is exactly what a precomputation attack exploits: the attacker does the expensive work once per guess and checks the result against every account. Therefore, your problem is not securely solvable.
Example
If I can apply your password transformation algorithm to "password" a single time and see which users use "password" as their password, then the system is vulnerable to a form of precomputation attack.
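To make the attack concrete, here is a minimal sketch in Python. It assumes the attacker has stolen both the site-salted matching hashes and the site-wide salt (which lives on the same server), and again uses stdlib PBKDF2 in place of the slow hash; `SITE_SALT`, `site_hash`, and `crack_with_shared_salt` are hypothetical names. Each guess is hashed once, and that single expensive computation is checked against every user at the same time.

```python
import hashlib

# Assumption: the attacker stole the matching hashes AND the site-wide salt.
SITE_SALT = b"hypothetical-site-wide-salt"
ITERATIONS = 10_000  # illustrative; PBKDF2 stands in for the slow hash


def site_hash(password: str) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password.encode(), SITE_SALT, ITERATIONS)


def crack_with_shared_salt(stolen: dict, wordlist: list) -> tuple:
    """Hash each guess once and compare it against every user at the same time."""
    users_by_hash = {}
    for user, digest in stolen.items():
        users_by_hash.setdefault(digest, []).append(user)
    cracked, hashes_computed = {}, 0
    for guess in wordlist:
        digest = site_hash(guess)  # the only expensive step
        hashes_computed += 1
        for user in users_by_hash.get(digest, []):
            cracked[user] = guess  # one hash can reveal many users
    return cracked, hashes_computed


stolen = {user: site_hash(pw) for user, pw in [
    ("alice", "password"), ("bob", "password"),
    ("carol", "letmein"), ("dave", "x9!kq-unguessed"),
]}
cracked, work = crack_with_shared_salt(stolen, ["123456", "password", "letmein"])
print(sorted(cracked.items()))  # [('alice', 'password'), ('bob', 'password'), ('carol', 'letmein')]
print(work)  # 3
```

Three expensive hashes were enough to test three guesses against all four accounts; adding more users would not increase that cost at all.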
If I must do an expensive calculation to determine the password for each individual user, and the work spent calculating User A’s password does not make calculating User B’s password any easier, then the system is secure (against this type of attack).
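For contrast, a sketch of the same attack against per-user salts, under the same assumptions (stdlib PBKDF2 as a stand-in for bcrypt; `make_record` and `crack_with_per_user_salts` are hypothetical names). Because every user's salt is different, a hash computed while attacking one user is useless for any other, so the expensive step has to be repeated per user.

```python
import hashlib
import hmac
import os

ITERATIONS = 10_000  # illustrative only


def make_record(password: str) -> tuple:
    """Store (salt, digest) with a fresh random salt for each user."""
    salt = os.urandom(16)
    return salt, hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)


def crack_with_per_user_salts(stolen: dict, wordlist: list) -> tuple:
    """Try each guess against each user separately; no work is shared."""
    cracked, hashes_computed = {}, 0
    for user, (salt, digest) in stolen.items():
        for guess in wordlist:
            candidate = hashlib.pbkdf2_hmac("sha256", guess.encode(), salt, ITERATIONS)
            hashes_computed += 1  # paid again for every user
            if hmac.compare_digest(candidate, digest):
                cracked[user] = guess
                break  # stop once this user's password is found
    return cracked, hashes_computed


stolen = {
    "alice": make_record("password"),
    "bob": make_record("password"),
    "carol": make_record("letmein"),
    "dave": make_record("x9!kq-unguessed"),
}
cracked, work = crack_with_per_user_salts(stolen, ["123456", "password", "letmein"])
print(sorted(cracked.items()))  # [('alice', 'password'), ('bob', 'password'), ('carol', 'letmein')]
print(work)  # 10: 2 each for alice and bob, 3 each for carol and dave
```

Even though alice and bob share a password, their stored digests differ, so cracking alice's record tells the attacker nothing about bob's.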
Further Consideration
Your idea of using a per-site salt with bcrypt and a high iteration count may seem attractive at first, but it doesn’t hold up against an attacker who can add hardware. Even at 10 seconds per hash, that’s 6 password guesses per minute, 360 per hour, 8,640 per day, or over 3M per year (that’s a lot). And that’s just one machine. Throw a botnet or a few GPUs at the problem and that number goes through the roof: just 300 machines, cores, or GPUs could knock out about 2.6M guesses in a day.
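The arithmetic can be checked with a few lines of Python. This assumes, as the paragraph does, that every machine, core, or GPU completes exactly one 10-second hash at a time with no further speedup; `guesses` is a hypothetical helper.

```python
SECONDS_PER_GUESS = 10  # the proposed cost of one matching hash
MINUTE, HOUR, DAY = 60, 60 * 60, 24 * 60 * 60
YEAR = 365 * DAY


def guesses(seconds: int, devices: int = 1) -> int:
    """Guesses completed in `seconds`, assuming each device runs one hash at a time."""
    return devices * seconds // SECONDS_PER_GUESS


print(guesses(MINUTE))            # 6
print(guesses(HOUR))              # 360
print(guesses(DAY))               # 8640
print(guesses(YEAR))              # 3153600
print(guesses(DAY, devices=300))  # 2592000
```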
Because you would be using the same salt for every user, you’d be letting the attacker crack all of your users’ passwords at once. If you stick with per-user salts only, the attacker can effectively attack just one user’s password at a time.
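The difference can be stated as a rough cost model. Let |D| be the number of candidate passwords the attacker tries, |U| the number of users, and t the time for one slow hash; these symbols are introduced here for illustration, and the per-user figure is the worst case in which every guess is tried against every user.

```latex
T_{\text{shared}} = |D| \cdot t,
\qquad
T_{\text{per-user}} = |U| \cdot |D| \cdot t,
\qquad
\frac{T_{\text{per-user}}}{T_{\text{shared}}} = |U|
```

For example, with t = 10 s and |D| = 8,640 (one machine-day of guesses), a hypothetical site with |U| = 10,000 users costs the attacker one machine-day under a shared salt but 10,000 machine-days under per-user salts.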