We have a web application that saves survey data in XML format rather than in a standard database. It uses a third-party survey component (actually, this component, but modified to add some functionality).
Right now we’re using a very restrictive whitelist validation approach to prevent potentially malicious input from making it into the survey results XML file. However, the whitelist rules are turning into a support problem for our customers, and I’m wondering whether they are stricter than they need to be. I don’t know what happens to an XML document if someone enters angle brackets (HTML/XML tags, closed or unclosed) and similar characters.
If I store these values in CDATA sections and then sanitize the output, will that prevent corruption of the XML document in these cases? We do use the Microsoft.Anti-Xss and Web Protection libraries to sanitize all untrusted output.
Better yet, is there a guide to preventing XML injection, or XSS specific to XML data? Or am I just overthinking this?
The core question is, do I need to filter the input as long as I’m sanitizing the output properly? My paranoid nature says yes, but I’m not sure and thought I’d ask for some expert opinions.
Check the OWASP Data Validation Guide to understand which vulnerabilities you need to address.
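On the specific question of angle brackets and CDATA, a small experiment may help. The sketch below uses Python's standard xml.etree.ElementTree purely for illustration (the application in the question is .NET, where System.Xml.Linq.XElement or XmlWriter would be the natural counterparts). The element name "answer" and the helper functions are hypothetical, not taken from the survey component. It compares letting the XML writer escape the text with wrapping the text in a hand-built CDATA section.

```python
import xml.etree.ElementTree as ET


def store_answer(text: str) -> str:
    """Serialize one survey answer and let the XML writer escape markup characters."""
    answer = ET.Element("answer")  # hypothetical element name
    answer.text = text
    return ET.tostring(answer, encoding="unicode")


def naive_cdata(text: str) -> str:
    """Hand-built CDATA wrapper, shown only to illustrate its weakness."""
    return "<answer><![CDATA[" + text + "]]></answer>"


# Input containing a closing tag, a script tag, a CDATA terminator and an unclosed tag.
hostile = "</answer><script>alert(1)</script> ]]> <b>unclosed"

# Writer-escaped storage: the markup characters become entities.
safe_xml = store_answer(hostile)
round_trip = ET.fromstring(safe_xml).text

# Hand-built CDATA: the "]]>" inside the input ends the section early.
try:
    ET.fromstring(naive_cdata(hostile))
    cdata_survived = True
except ET.ParseError:
    cdata_survived = False
```

With the writer doing the escaping, the stored document stays well-formed and the original text comes back unchanged on read. The hand-built CDATA version stops being well-formed as soon as the input contains "]]>", so CDATA on its own is not a safeguard. Two caveats: characters that XML 1.0 forbids outright (most control characters) still have to be rejected or stripped on the way in, and escaping for XML storage does not replace encoding the value again for HTML when it is displayed.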