I’m planning a site which needs its users’ trust. To help build that trust, I was thinking of publishing the source code and the database schema, and even allowing users to download the raw data (so they can run their own queries on it to verify that the site gives the correct answers).
Under which circumstances is that a security risk? My thinking is that most of the data can be scraped off the site with a web spider anyway.
For sensitive information like IP addresses and passwords, I plan to store them hashed (with a salt). Exact age is not relevant, so maybe I’ll just store “adult yes/no”. Is there anything I’m missing?
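SQL injection is not an issue if you do your data access properly (for example, with parameterized queries), so publishing the schema does not change much there. You are not worried about keeping your model proprietary, so there is no issue there either. You have the green light to expose the model.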
Privacy is not an issue if you inform users that the data is public. People expose private info and photos on Facebook, so why not on your system? Green light.
Hashed data with a random salt is, in theory, safe to download. SHA512 with a random salt will probably never be broken. But who knows? At one time MD5 was the “right way” to hash, and now you can find a collision in under an hour.
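As a rough illustration of "hashed with a random salt", here is a minimal Python sketch. It is not taken from the question; it assumes passwords are the value being stored, and it swaps the single SHA512 round mentioned above for PBKDF2-HMAC-SHA512 from the standard library, which repeats the hash many times to slow down guessing. The function names and the iteration count are made up for the example.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

# Assumed work factor for this sketch; tune it for your own hardware.
ITERATIONS = 100_000


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest) using PBKDF2-HMAC-SHA512.

    A fresh 16-byte random salt is generated when none is supplied,
    so two users with the same password get different digests.
    """
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac(
        "sha512", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)
```

You would store both the salt and the digest per record, then call verify_password with the stored pair at login. Note that a salt only helps when the hashed value is hard to guess; it does little for values drawn from a small set.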
The only issue is private data that needs to be encrypted, not hashed. You can’t hash an IP address, because you will want to use the actual IP at some point in the future. You can’t hash credit card numbers, because you will need the real credit card number at some point. You will be forced to use a private key and deal with the weakness of keeping it private. By exposing the data you totally remove a layer of physical security. Encryption plus physical security is better than encryption alone.