If I receive user input, does it make any difference if I validate first and then sanitize before saving it to the database?
Or is there any risk in validating unsanitized input?
(when I say ‘sanitize’ I mainly mean stripping out any HTML tags)
UPDATE & CLARIFICATION:
I wouldn’t put HTML tags into the database. I would sanitize the input before saving it, but only after validating it against my model. Validation and sanitization are separate steps only because they are handled by separate libraries, so the only question is whether I should call ‘sanitize’ in the ‘before_validate’ callback or in the ‘before_save’ callback.
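The difference between the two hook points shows up as soon as a validation depends on the visible text. The sketch below is a minimal, hypothetical model (the `Comment` class, its `sanitize_hook` switch, and `strip_tags` are illustrative names, not part of any particular framework). It uses Python's standard `html.parser` to strip tags and a simple "body must not be blank" check. With sanitizing in `before_save`, the input `<b></b>` passes validation and an empty body is stored. With sanitizing in `before_validate`, the same input is rejected.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text content only; every tag is discarded.

    Note: the contents of <script>/<style> are kept as text here.
    A dedicated sanitizer library handles such cases properly.
    """

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def strip_tags(raw: str) -> str:
    parser = _TextExtractor()
    parser.feed(raw)
    parser.close()  # flush any buffered text
    return "".join(parser.parts)


class Comment:
    """Hypothetical model with 'before_validate' / 'before_save' hook points."""

    def __init__(self, body: str, sanitize_hook: str) -> None:
        self.body = body
        self.sanitize_hook = sanitize_hook  # "before_validate" or "before_save"
        self.saved_body = None  # stands in for the database column

    def _sanitize(self) -> None:
        self.body = strip_tags(self.body)

    def _validate(self) -> bool:
        # Presence check: the body must contain some non-whitespace text.
        return bool(self.body.strip())

    def save(self) -> bool:
        if self.sanitize_hook == "before_validate":
            self._sanitize()
        if not self._validate():
            return False
        if self.sanitize_hook == "before_save":
            self._sanitize()
        self.saved_body = self.body  # the "database write"
        return True


early = Comment("<b></b>", "before_validate")
late = Comment("<b></b>", "before_save")
print(early.save(), early.saved_body)        # False None
print(late.save(), repr(late.saved_body))    # True ''
```

In the second case, the value that was validated is not the value that was saved. That gap is the risk of validating unsanitized input.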
Why are you separating HTML stripping (or escaping) from other validation? Isn’t it all the same thing?
And why would you put HTML tags into the database only to strip them later? Doesn’t that mean that your database is temporarily incorrect?
I don’t see why you’re separating “validation” from “sanitization”. They’re two sides of the same coin. Do everything you can to make sure the data is perfect before committing it to the database.
“the only question is whether I should call ‘sanitize’ upon ‘before_validate’ or upon ‘before_save’.”
The distinction is subtle, but the order matters. You must do both, and generally you do not want to try to validate raw HTML.
Therefore, it’s only sensible if you (1) “sanitize” to strip HTML tags and then (2) validate what’s left.
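A minimal sketch of that order, assuming Python's standard `html.parser` for tag stripping and a simple maximum-length rule as the validation (`clean_then_validate` and `max_len` are hypothetical names used for illustration). Validating the raw input would count the markup: `<em>Hello</em>` is 14 characters and would fail a 5-character limit, even though the visible text is only 5 characters.

```python
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Keeps text content and drops every tag."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def clean_then_validate(raw: str, max_len: int) -> str:
    """(1) Strip HTML tags, then (2) validate what is left."""
    parser = _TextOnly()
    parser.feed(raw)
    parser.close()
    text = "".join(parser.parts).strip()

    # Validation runs on the cleaned text, never on the raw HTML.
    if not text:
        raise ValueError("text is empty after removing HTML")
    if len(text) > max_len:
        raise ValueError(f"text is longer than {max_len} characters")
    return text


print(clean_then_validate("<em>Hello</em>", 5))  # Hello
```

Because the function returns the cleaned text, the caller saves exactly the value that passed validation.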
I’m not sure how else you could do it.