I need to do a quick match-and-replace on data that comes from an XML file. I don’t want to parse the file, since it’s about 100MB and I can’t change that. Here is the sample data:
<?xml version="1.0" encoding="UTF-8"?>
<products>
<product active="1" on_sale="0" discountable="0">
<sku>SKUTARGET</sku>
<name><![CDATA[sdfsdf (NET)]]></name>
<description><![CDATA[agag adgsgsdg asdgsdg]]></description>
<keywords></keywords>
<price>9.000000</price>
<stock_quantity>35</stock_quantity>
<reorder_quantity>0</reorder_quantity>
<height>0.000000</height>
<length>0.000000</length>
<diameter>0.000000</diameter>
<weight>0.000000</weight>
<color>Black</color>
<material>PVC</material>
<barcode>883045010070</barcode>
<release_date>2008-11-10</release_date>
<images>
<image>/sdssd/sdfsd.jpg</image>
<image>/AL10sdfsds07XO/sdfsd.jpg</image>
</images>
<categories>
<category code="166" video="0" parent="172">sd &amp; Sexy sdf</category>
<category code="172" video="0" parent="">sd &amp; dddsdsds</category>
<category code="641" video="0" parent="172">sdfsdf Costume sdfsdfsdf</category>
</categories>
<manufacturer code="AL" video="0">sdfsdf sdfs</manufacturer>
<type code="LI" video="0">sdfsd</type>
</product>
<product active="1" on_sale="0" discountable="0">
<sku>XXXXXXX</sku>
<name><![CDATA[LEATHER sdfsdf (NET)]]></name>
<description><![CDATA[asdgsdgsd sad sadg asdg asdg asdg asdg asdg asdg asdg asdg asdg asdg asdg]]></description>
<keywords></keywords>
<price>5.000000</price>
<stock_quantity>36</stock_quantity>
<reorder_quantity>0</reorder_quantity>
<height>0.000000</height>
<length>0.000000</length>
<diameter>0.000000</diameter>
<weight>0.000000</weight>
<color>Black</color>
<material>Leather</material>
<barcode>883045300164</barcode>
<release_date>2008-11-10</release_date>
<images>
<image>/AL10sds0XO/sdsdsd.jpg</image>
<image>/sdsds/AL1sd00XOB.jpg</image>
<image>/AL1sdsds00XO/sdsds.jpg</image>
</images>
<categories>
<category code="80" video="0" parent="44">sdgsdgsdg</category>
<category code="181" video="0" parent="172">Sleep &amp; Lounge</category>
</categories>
<manufacturer code="AL" video="0">Allure sdsds</manufacturer>
<type code="LI" video="0">sdsfsdfsd</type>
</product>
</products>
What I need is just the one <product> block whose sku matches a variable, in this case “SKUTARGET”:
<product active="1" on_sale="0" discountable="0">
<sku>SKUTARGET</sku>
<name><![CDATA[sdfsdf (NET)]]></name>
<description><![CDATA[agag adgsgsdg asdgsdg]]></description>
<keywords></keywords>
<price>9.000000</price>
<stock_quantity>35</stock_quantity>
<reorder_quantity>0</reorder_quantity>
<height>0.000000</height>
<length>0.000000</length>
<diameter>0.000000</diameter>
<weight>0.000000</weight>
<color>Black</color>
<material>PVC</material>
<barcode>883045010070</barcode>
<release_date>2008-11-10</release_date>
<images>
<image>/sdssd/sdfsd.jpg</image>
<image>/AL10sdfsds07XO/sdfsd.jpg</image>
</images>
<categories>
<category code="166" video="0" parent="172">sd &amp; Sexy sdf</category>
<category code="172" video="0" parent="">sd &amp; dddsdsds</category>
<category code="641" video="0" parent="172">sdfsdf Costume sdfsdfsdf</category>
</categories>
<manufacturer code="AL" video="0">sdfsdf sdfs</manufacturer>
<type code="LI" video="0">sdfsd</type>
</product>
Here is the code I’m working with at the moment:
<?php
ob_start();
?>
<?xml version="1.0" encoding="UTF-8"?>
<products>
<product active="1" on_sale="0" discountable="0">
<sku>SKUTARGET</sku>
<name><![CDATA[sdfsdf (NET)]]></name>
<description><![CDATA[agag adgsgsdg asdgsdg]]></description>
<keywords></keywords>
<price>9.000000</price>
<stock_quantity>35</stock_quantity>
<reorder_quantity>0</reorder_quantity>
<height>0.000000</height>
<length>0.000000</length>
<diameter>0.000000</diameter>
<weight>0.000000</weight>
<color>Black</color>
<material>PVC</material>
<barcode>883045010070</barcode>
<release_date>2008-11-10</release_date>
<images>
<image>/sdssd/sdfsd.jpg</image>
<image>/AL10sdfsds07XO/sdfsd.jpg</image>
</images>
<categories>
<category code="166" video="0" parent="172">sd &amp; Sexy sdf</category>
<category code="172" video="0" parent="">sd &amp; dddsdsds</category>
<category code="641" video="0" parent="172">sdfsdf Costume sdfsdfsdf</category>
</categories>
<manufacturer code="AL" video="0">sdfsdf sdfs</manufacturer>
<type code="LI" video="0">sdfsd</type>
</product>
<product active="1" on_sale="0" discountable="0">
<sku>XXXXXXX</sku>
<name><![CDATA[LEATHER sdfsdf (NET)]]></name>
<description><![CDATA[asdgsdgsd sad sadg asdg asdg asdg asdg asdg asdg asdg asdg asdg asdg asdg]]></description>
<keywords></keywords>
<price>5.000000</price>
<stock_quantity>36</stock_quantity>
<reorder_quantity>0</reorder_quantity>
<height>0.000000</height>
<length>0.000000</length>
<diameter>0.000000</diameter>
<weight>0.000000</weight>
<color>Black</color>
<material>Leather</material>
<barcode>883045300164</barcode>
<release_date>2008-11-10</release_date>
<images>
<image>/AL10sds0XO/sdsdsd.jpg</image>
<image>/sdsds/AL1sd00XOB.jpg</image>
<image>/AL1sdsds00XO/sdsds.jpg</image>
</images>
<categories>
<category code="80" video="0" parent="44">sdgsdgsdg</category>
<category code="181" video="0" parent="172">Sleep &amp; Lounge</category>
</categories>
<manufacturer code="AL" video="0">Allure sdsds</manufacturer>
<type code="LI" video="0">sdsfsdfsd</type>
</product>
</products>
<?php
$xml_str = ob_get_contents();
ob_end_clean();
$tar_sku="SKUTARGET"; // this is the sku of the product block I need to have
$pat= '/^.*(<product *<sku>'.$tar_sku.'</sku>*</product>).*$/is'; // this should match the block with the sku but no other block
$replacement='$1';//This should overwrite everything with that found block.
$returnValue = preg_replace($pat, $replacement, $xml_str);
Any help would be great. Thanks.
Jeremy
[edit]
Here is the test code based on the suggestion below. It doesn’t work yet: I was expecting it to echo back the XML block whose sku matches, but no luck so far.
<?php
error_reporting(E_ALL);
ini_set('display_errors', '1');
umask(0);
$xml_str = <<<EOD
<?xml version="1.0" encoding="UTF-8"?>
<products>
<product active="1" on_sale="0" discountable="0">
<sku>SKUTARGET</sku>
<name><![CDATA[sdfsdf (NET)]]></name>
<description><![CDATA[agag adgsgsdg asdgsdg]]></description>
<keywords></keywords>
<price>9.000000</price>
<stock_quantity>35</stock_quantity>
<reorder_quantity>0</reorder_quantity>
<height>0.000000</height>
<length>0.000000</length>
<diameter>0.000000</diameter>
<weight>0.000000</weight>
<color>Black</color>
<material>PVC</material>
<barcode>883045010070</barcode>
<release_date>2008-11-10</release_date>
<images>
<image>/sdssd/sdfsd.jpg</image>
<image>/AL10sdfsds07XO/sdfsd.jpg</image>
</images>
<categories>
<category code="166" video="0" parent="172">sd &amp; Sexy sdf</category>
<category code="172" video="0" parent="">sd &amp; dddsdsds</category>
<category code="641" video="0" parent="172">sdfsdf Costume sdfsdfsdf</category>
</categories>
<manufacturer code="AL" video="0">sdfsdf sdfs</manufacturer>
<type code="LI" video="0">sdfsd</type>
</product>
<product active="1" on_sale="0" discountable="0">
<sku>XXXXXXX</sku>
<name><![CDATA[LEATHER sdfsdf (NET)]]></name>
<description><![CDATA[asdgsdgsd sad sadg asdg asdg asdg asdg asdg asdg asdg asdg asdg asdg asdg]]></description>
<keywords></keywords>
<price>5.000000</price>
<stock_quantity>36</stock_quantity>
<reorder_quantity>0</reorder_quantity>
<height>0.000000</height>
<length>0.000000</length>
<diameter>0.000000</diameter>
<weight>0.000000</weight>
<color>Black</color>
<material>Leather</material>
<barcode>883045300164</barcode>
<release_date>2008-11-10</release_date>
<images>
<image>/AL10sds0XO/sdsdsd.jpg</image>
<image>/sdsds/AL1sd00XOB.jpg</image>
<image>/AL1sdsds00XO/sdsds.jpg</image>
</images>
<categories>
<category code="80" video="0" parent="44">sdgsdgsdg</category>
<category code="181" video="0" parent="172">Sleep &amp; Lounge</category>
</categories>
<manufacturer code="AL" video="0">Allure sdsds</manufacturer>
<type code="LI" video="0">sdsfsdfsd</type>
</product>
</products>
EOD;
$tar_sku="SKUTARGET"; // this is the sku of the product block I need to have
$pattern = "~<product .*?<sku>$tar_sku</sku>.*?</product>~is";
$returnValue = preg_match($pattern,$xml_str);
echo '--'.$returnValue[0];
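A note on why the test above echoes nothing useful: preg_match() returns 1, 0, or false, not the matched text, so $returnValue[0] cannot hold the block. The match is written into the optional third argument instead. Separately, with .*? a match can start at an earlier <product> and run on into the target block whenever the target is not the first product. The sketch below is only an illustration and still treats XML as plain text, so the caveats in the answer that follows still apply. It passes $matches, escapes the SKU with preg_quote(), and uses a tempered token (?:(?!</product>).)*? so the match cannot cross a closing </product>. A shortened copy of the sample data stands in for the full heredoc.

```php
<?php
// Sketch: the asker's approach with preg_match()'s third argument used correctly.
// A shortened copy of the sample XML stands in for the full $xml_str heredoc.
$xml_str = <<<EOD
<products>
<product active="1"><sku>SKUTARGET</sku><price>9.000000</price></product>
<product active="1"><sku>XXXXXXX</sku><price>5.000000</price></product>
</products>
EOD;

$tar_sku = "SKUTARGET";
// preg_quote() escapes regex metacharacters that might appear in a SKU.
$sku = preg_quote($tar_sku, '~');
// (?:(?!</product>).)*? cannot cross a closing </product>, so the match cannot
// start in an earlier product block and run on into the target one.
// \b stops "<product" from matching the start of "<products>".
$pattern = "~<product\\b(?:(?!</product>).)*?<sku>$sku</sku>.*?</product>~is";

// preg_match() returns 1, 0, or false; the matched text goes into $matches.
if (preg_match($pattern, $xml_str, $matches) === 1) {
    echo '--' . $matches[0];
} else {
    echo 'no match';
}
```

Swapping the SKU for XXXXXXX returns only the second block rather than both.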
Do not use a regex to parse XML. If your concern is memory usage, using a regex will consume much more memory than incremental parsing. Since a regex can only operate on a string, you will need at least 100MB of memory just to hold the file string before you can do anything with it. If you use an incremental XML parser, you can use less memory than the size of the file.
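To make "incremental parsing" concrete, here is a minimal sketch that walks the file with XMLReader and expands only one <product> at a time into DOM. The function name find_products_by_sku() is hypothetical; this is not the getmatchingproducts_xml_expand() function mentioned below, just an illustration of the same idea. It assumes the input is well-formed XML (for example, & written as &amp;).

```php
<?php
// Sketch: scan a (possibly huge) file with XMLReader and expand only one
// <product> at a time into DOM. find_products_by_sku() is a hypothetical name.
function find_products_by_sku(string $uri, string $sku): array
{
    $reader = new XMLReader();
    if (!$reader->open($uri)) {
        throw new RuntimeException("Cannot open $uri");
    }
    $doc = new DOMDocument();
    $found = [];

    // Advance to the first <product> start tag.
    while ($reader->read() && $reader->name !== 'product') {
    }

    while ($reader->name === 'product') {
        // expand() builds a DOM subtree for the current <product> only.
        $expanded = $reader->expand();
        if ($expanded !== false) {
            $product = $doc->importNode($expanded, true);
            $skus = $product->getElementsByTagName('sku');
            if ($skus->length > 0 && trim($skus->item(0)->textContent) === $sku) {
                $found[] = $doc->saveXML($product); // keep the block as a string
            }
        }
        // Jump to the next sibling <product>, skipping this one's subtree.
        $reader->next('product');
    }
    $reader->close();
    return $found;
}
```

For example, find_products_by_sku('products.xml', 'SKUTARGET') returns an array of matching <product> blocks as strings. Because every match is collected into an array, memory grows with the number of matches rather than with the file size.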
The right tool for this job is XMLReader.

tl;dr

There are two XMLReader parsing implementations in this answer:

- The getmatchingproducts_xml_expand() and getmatchingproducts_xml_noexpand() functions return a list of all matched products. Memory usage depends on how many products with a matching SKU are in the source XML.
- The ProductMatcher class is an Iterator (it can be used in foreach) that returns matched products incrementally as a string, DOMDocument, or SimpleXMLElement. It uses about 1MB of memory no matter how big your source XML is or how many products match.

Test File

I created a 120 MB sample file using the format you created. This is the creation code:
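The original creation code is not included here. As a stand-in, a hypothetical generator along these lines would produce a file in the same format. make_sample_file() and its parameters are assumptions, and each product is trimmed to a few of the question's elements. It uses XMLWriter, which streams output to disk and escapes & automatically.

```php
<?php
// Hypothetical stand-in for the missing generator: writes $count products in
// the question's format, with $targetSku at index $targetAt. XMLWriter escapes
// "&" and streams to disk, so memory stays small even for large files.
function make_sample_file(string $path, int $count, string $targetSku, int $targetAt): void
{
    $w = new XMLWriter();
    if (!$w->openUri($path)) {
        throw new RuntimeException("Cannot write $path");
    }
    $w->setIndent(true);
    $w->startDocument('1.0', 'UTF-8');
    $w->startElement('products');
    for ($i = 0; $i < $count; $i++) {
        $w->startElement('product');
        $w->writeAttribute('active', '1');
        $w->writeAttribute('on_sale', '0');
        $w->writeAttribute('discountable', '0');
        $w->writeElement('sku', $i === $targetAt ? $targetSku : sprintf('SKU%08d', $i));
        $w->startElement('name');
        $w->writeCdata("Product $i (NET)");
        $w->endElement(); // name
        $w->writeElement('price', '9.000000');
        $w->startElement('categories');
        $w->startElement('category');
        $w->writeAttribute('code', '166');
        $w->text('sd & Sexy sdf'); // written out as "sd &amp; Sexy sdf"
        $w->endElement(); // category
        $w->endElement(); // categories
        $w->endElement(); // product
        if ($i % 1000 === 0) {
            $w->flush(); // push buffered output to the file
        }
    }
    $w->endElement(); // products
    $w->endDocument();
    $w->flush();
}
```

For example, make_sample_file('products.xml', 500000, 'SKUTARGET', 250000). These products are shorter than the question's, so reaching a comparable file size needs more of them.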
Memory and Timing Function
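The original measurement function is not included either. A minimal hypothetical helper in the same spirit, reporting peak memory, final memory, and seconds per iteration, could look like this. memory_get_peak_usage() reports the peak for the whole script, so each contender is best measured in its own PHP process (PHP 8.2 also adds memory_reset_peak_usage()).

```php
<?php
// Hypothetical helper: runs $fn $iterations times and reports peak memory and
// final memory (bytes, via memory_get_peak_usage()/memory_get_usage()) plus the
// average wall-clock seconds per iteration. The last return value is kept so
// the caller can check how many products were found.
function measure(callable $fn, int $iterations = 1): array
{
    if ($iterations < 1) {
        throw new InvalidArgumentException('iterations must be >= 1');
    }
    $result = null;
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $result = $fn();
    }
    $elapsed = microtime(true) - $start;

    return [
        'peak_memory'           => memory_get_peak_usage(),
        'final_memory'          => memory_get_usage(),
        'seconds_per_iteration' => $elapsed / $iterations,
        'result'                => $result,
    ];
}
```

For example, measure(fn() => count(some_matcher('products.xml', 'SKUTARGET')), 3), where some_matcher is a placeholder for whichever implementation is under test.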
Regex vs XMLReader
I then tested these functions. The first two use the regexes suggested by other answers, and the third one uses XMLReader.

Finally, I saved all this in a file and ran it on my dual-core, 8GB system. (Numbers are peak memory, final memory, and seconds per iteration. “Found Products” is just to verify that the correct number of products matched.)
You’ll notice that the regex methods could not even run without exhausting available memory! Further, the XMLReader methods (in addition to actually parsing XML correctly) used less memory than the size of the file. I’m willing to bet money that most of the getmatchingproducts_xml_expand memory is the $matchedproducts array, too, and not from parsing. You can cut memory usage even further by wrapping the parser function in a class so you can retrieve one match at a time.

The advantage of using a regex, though, is that it’s much faster. Here’s another try, raising the memory limit to 1GB:
All of that speed comes from ignoring the rules of XML parsing and treating the file as a string. (Interestingly, having the whole file in memory doesn’t affect XMLReader’s speed, only its memory usage.)

If you need both fast access and low memory usage, you need some kind of index or database. You can create a flat-file database using sqlite, sqlite3, or dbm and load it with products keyed by SKU using XMLReader. Then, instead of reading the XML file, load the XML string for that product from the database.

Just for kicks, I tried an XMLReader parsing method that didn’t use expansion, to see if I could save time or memory. The difference was negligible, though, and the code was much less clear.

Returning Parse Results Incrementally
Here is yet another implementation, probably as efficient as this can get. It parses the 120MB test file using less than 1MB of memory.
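The incremental implementation itself is not reproduced here. As an illustration of the idea, the following hypothetical class (SkuProductStream, not the answer's ProductMatcher) exposes matches through a generator, so only one <product> is held in memory at a time. It checks the SKU with XMLReader::readOuterXml() and SimpleXML rather than expand(), yields each match as an XML string, and assumes PHP 8 for constructor property promotion.

```php
<?php
// Hypothetical incremental matcher (not the answer's ProductMatcher): each
// iteration yields the next matching <product> as an XML string, so only one
// product is held in memory at a time. Requires PHP 8 (promoted properties).
class SkuProductStream implements IteratorAggregate
{
    public function __construct(
        private string $uri,
        private string $sku
    ) {
    }

    public function getIterator(): Generator
    {
        $reader = new XMLReader();
        if (!$reader->open($this->uri)) {
            throw new RuntimeException("Cannot open {$this->uri}");
        }
        try {
            // Advance to the first <product> start tag.
            while ($reader->read() && $reader->name !== 'product') {
            }
            while ($reader->name === 'product') {
                // readOuterXml() returns the current <product> subtree as a string.
                $xml = $reader->readOuterXml();
                $product = simplexml_load_string($xml);
                if ($product !== false && trim((string) $product->sku) === $this->sku) {
                    yield $xml;
                }
                // Skip to the next sibling <product>.
                $reader->next('product');
            }
        } finally {
            $reader->close(); // also runs if the caller stops iterating early
        }
    }
}
```

Usage is a plain foreach, e.g. foreach (new SkuProductStream('products.xml', 'SKUTARGET') as $xml) { echo $xml; }. Breaking out of the loop early destroys the generator, which runs the finally block and closes the reader.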
Results:
Expanded example usage: