I’m trying to parse the DMOZ content/structure XML files into MySQL, but all the existing scripts for this are very old and don’t work well. How can I open a large (1 GB+) XML file in PHP for parsing?
There are only two PHP APIs that are really suited to processing large files: the older Expat-based XML Parser API and the newer XMLReader class. Both read the document as a continuous stream rather than loading the entire tree into memory (which is what SimpleXML and DOM do).
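To make the streaming idea concrete, here is a minimal XMLReader sketch. The function name `streamElements`, the element name `ExternalPage`, and the `about` attribute are assumptions made for illustration; check them against the actual DMOZ dump before relying on them.

```php
<?php
// Sketch only: walk a large XML file node by node and yield one small
// array per matching element, so memory use stays roughly constant.
// streamElements() is a hypothetical helper, not part of any library.
function streamElements(string $uri, string $elementName): Generator
{
    $reader = new XMLReader();
    if (!$reader->open($uri)) {
        throw new RuntimeException("Cannot open $uri");
    }
    try {
        // read() advances the cursor one node at a time through the stream.
        while ($reader->read()) {
            if ($reader->nodeType === XMLReader::ELEMENT
                && $reader->name === $elementName) {
                yield [
                    // 'about' is an assumed attribute name.
                    'about' => $reader->getAttribute('about'),
                    // Only this one element is serialized into memory.
                    'xml'   => $reader->readOuterXml(),
                ];
            }
        }
    } finally {
        $reader->close();
    }
}
```

Each yielded row could then be passed to a prepared MySQL statement inside a `foreach (streamElements($path, 'ExternalPage') as $row)` loop, so that only one record is held in memory at a time.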
For an example, you might want to look at this partial parser of the DMOZ-catalog: