I need to parse a large XML file (>1 GB) located on an FTP server. I have an FTP stream acquired by ftp_connect(), which I also use for other FTP-related actions.
I know XMLReader is preferred for large XML files, but it only accepts a URI, so I assume a stream wrapper will be required. The only FTP function I know of that lets me retrieve just a small part of the file at a time is ftp_nb_fget() in combination with ftp_nb_continue().
However, I do not know how to put all of this together so that as little memory as possible is used.
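To show how those two functions fit together: ftp_nb_fget() starts the transfer and writes whatever it has received so far into a local stream, and each ftp_nb_continue() call fetches more, returning FTP_MOREDATA until the download ends with FTP_FINISHED or FTP_FAILED. The minimal sketch below assumes an already logged-in connection from ftp_connect(). ftp_download_in_chunks() and drain_buffer() are hypothetical helpers, not PHP functions. It keeps memory bounded by emptying a local php://temp buffer whenever that buffer grows past a threshold, and it hands the drained bytes to a callback:

```php
<?php
// Sketch (assumption, not a library API): download a remote file over an
// existing FTP connection piece by piece, passing each piece to $onChunk so
// the whole file is never held in memory.

// Hypothetical helper: pass everything written to $buf since the last drain
// to $onChunk, then empty the buffer and move back to its start.
function drain_buffer($buf, callable $onChunk): void
{
    rewind($buf);
    $data = stream_get_contents($buf);
    if ($data !== false && $data !== '') {
        $onChunk($data);
    }
    ftruncate($buf, 0);
    rewind($buf); // the next write by ftp_nb_continue() starts at offset 0
}

// Hypothetical helper: $ftp is the (logged-in) connection from ftp_connect().
// Returns true only if the transfer finished successfully.
function ftp_download_in_chunks($ftp, string $remoteFile, callable $onChunk, int $flushAt = 65536): bool
{
    // php://temp stays in memory up to 2 MB; draining keeps it well below that.
    $buf = fopen('php://temp', 'w+b');
    $ret = ftp_nb_fget($ftp, $buf, $remoteFile, FTP_BINARY);
    while ($ret === FTP_MOREDATA) {
        // Only drain once enough bytes have accumulated, to avoid tiny callbacks.
        if (ftell($buf) >= $flushAt) {
            drain_buffer($buf, $onChunk);
        }
        $ret = ftp_nb_continue($ftp);
    }
    // Hand over whatever arrived in the final step (partial data on failure).
    drain_buffer($buf, $onChunk);
    fclose($buf);
    return $ret === FTP_FINISHED;
}
```

Note that $onChunk receives raw bytes with arbitrary boundaries (possibly in the middle of a tag or a multi-byte character), so whatever consumes them has to accept partial input.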
It looks like you may need to build on top of the low-level XML parser bits.
In particular, you can use xml_parse() to process the XML string one chunk at a time, after calling the various xml_set_* functions with callbacks to handle elements, character data, namespaces, entities, and so on. Those callbacks will be triggered whenever the parser detects that it has enough data to do so, which should mean that you can process the file as you read it in arbitrarily sized chunks from the FTP site.

Proof of concept using the CLI and xml_set_default_handler(), which will get called for everything that doesn't have a specific handler:
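Here is a minimal proof-of-concept sketch along those lines. It assumes PHP's XML Parser extension is available. parse_xml_stream() and describe_xml_error() are hypothetical helper names, and the handlers only count elements and text bytes instead of doing real work. It reads any readable stream in fixed-size chunks, so memory use depends on the chunk size rather than on the file size:

```php
<?php
// Proof of concept (a sketch, not a drop-in solution): feed XML to PHP's
// event-based parser in fixed-size chunks read from a stream.

function parse_xml_stream($in, int $chunkSize = 8192): array
{
    $stats = ['elements' => 0, 'chars' => 0, 'other' => 0];

    $parser = xml_parser_create('UTF-8');
    // Keep tag names as written instead of upper-casing them (the default).
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

    xml_set_element_handler(
        $parser,
        function ($p, string $name, array $attrs) use (&$stats): void {
            $stats['elements']++;
        },
        function ($p, string $name): void {
        }
    );
    // Text can arrive split across several calls, even mid-word.
    xml_set_character_data_handler($parser, function ($p, string $data) use (&$stats): void {
        $stats['chars'] += strlen($data);
    });
    // Called for everything without a specific handler (e.g. comments).
    xml_set_default_handler($parser, function ($p, string $data) use (&$stats): void {
        $stats['other']++;
    });

    while (!feof($in)) {
        $chunk = fread($in, $chunkSize);
        if ($chunk === false) {
            break;
        }
        // is_final = false: more data will follow.
        if (!xml_parse($parser, $chunk, false)) {
            throw new RuntimeException(describe_xml_error($parser));
        }
    }
    // Signal end of input so an incomplete document is reported as an error.
    if (!xml_parse($parser, '', true)) {
        throw new RuntimeException(describe_xml_error($parser));
    }
    return $stats;
}

// Hypothetical helper: format the parser's current error.
function describe_xml_error($parser): string
{
    return sprintf(
        'XML error at line %d: %s',
        xml_get_current_line_number($parser),
        xml_error_string(xml_get_error_code($parser))
    );
}
```

From the CLI you could append print_r(parse_xml_stream(STDIN)); and run php poc.php < big.xml. The only per-chunk work is the xml_parse($parser, $chunk, false) call, so the same call can be made from whatever code receives the pieces downloaded with ftp_nb_fget()/ftp_nb_continue(). In that case, create the parser once before the download starts and call xml_parse($parser, '', true) after the transfer finishes.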