I have a project that is done but needs better performance.
The gist of the project is that I’m taking XML and converting it to CSV files. The files represent data to be loaded into a database.
Right now I’m using PHP to unzip the zip file that contains the XML. Then I parse, convert to CSV, and rezip.
It’s been fine until now, but the XML files are getting HUGE, so much so that processing takes a little more than a day. I’m also doing some manipulation of the data along the way, like rearranging columns and trimming values.
What alternatives do you suggest that would help me improve performance?
I’ve thought about writing this parser in C++ but I’m not sure what route to take. Similar questions have been asked, but this is more of a performance issue, I suppose. Should I switch languages for performance, or stick with PHP and optimize that? Or should I try to make the parser parallel so more than one file can be processed at a time?
What would you suggest?
You could give Perl a try if PHP doesn’t deliver what you want, but I doubt the language is the problem; maybe you are doing something wrong there (logically).
What kind of XML parser are you using? (It had better be a SAX one…)
Also, it would be nice to see some code (how you parse the XML files…).
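To show what I mean by a SAX-style (streaming) parser, here is a rough sketch. It is in Python rather than PHP purely for illustration, and it is not your code: the function name `xml_to_csv`, the `record_tag` parameter, and the column names are all made up for the example. It assumes each record is one repeating element whose child elements hold the column values. The point is that records are written out and discarded one at a time, so memory use should stay roughly flat no matter how big the file gets.

```python
import csv
import io
import xml.etree.ElementTree as ET


def xml_to_csv(xml_stream, csv_stream, record_tag, columns):
    """Stream records from xml_stream into csv_stream; return rows written.

    record_tag and columns are hypothetical: they name the repeating
    record element and the child elements to emit, in output order.
    """
    writer = csv.writer(csv_stream)
    writer.writerow(columns)
    count = 0

    context = ET.iterparse(xml_stream, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element

    for event, elem in context:
        if event == "end" and elem.tag == record_tag:
            # Column reordering and trimming happen here, one record at a time.
            writer.writerow(
                [(elem.findtext(col) or "").strip() for col in columns]
            )
            count += 1
            root.clear()  # drop finished records so the tree never grows
    return count


if __name__ == "__main__":
    sample = io.BytesIO(
        b"<rows><row><id> 1 </id><name>Ann</name></row>"
        b"<row><id>2</id><name> Bob</name></row></rows>"
    )
    out = io.StringIO()
    xml_to_csv(sample, out, "row", ["name", "id"])
    print(out.getvalue())
```

In Python, `zipfile.ZipFile.open` returns a file-like object that could be passed straight in as `xml_stream`, which would avoid unzipping to disk first. If your current PHP code loads the whole document into memory (DOM-style), switching to a streaming approach like this is probably worth trying before you rewrite anything in C++.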