I donot know much about files and its related security. I have a LOT of data in XML files which i am planning on parsing to put in the database. I get these XML files from 3rd party people. I will be getting minimum around 1000 files per day. So i will write a script to parse them to enter in our database. Now i have many questions regarding this.
- I know how to parse a single file. And i can extend the logic to multiple files in a single loop. But.Is there a better way to do the same? How can i use multi threaded programming to parse the files simultaneously many of them. There will be a script which, given the file, parses the single file and outputs to database. How can i use this script to parse in multiple threads/parallel processing
- The File as i said, Comes from a 3rd party site. So how can i be sure that there are no security loop holes. I mean, i dono much about file security. But what are the MINIMUM common basic security checks i need to take.(like sql injection and XSS in web programing are VERY basic)
- Again security related: How to ensure that the incoming XML file is XML itself. I mean i can use the extension, But is there a possibility to inject scripts and make them run when i parse these files. And What steps should i take while parsing individual files
You want to validate the XML. This does two things:
In php5 the syntax for validating XML documents is:
$dom->validate('articles.dtd');$dom->relaxNGValidate('articles.rng');$dom->schemaValidate('articles.xsd');Of course you need an XSD (XML Schema) or DTD (Document Type Definition) to validate against.