I have a SimpleXML Object made from merging multiple XMLs from PubMed (snippet below) but there is repetition from the merge. How can I compare all first child arrays – array[][0], array[][1] etc – and discard any duplicates?
I though perhaps serialising was the answer but you can’t serialise a SimpleXML Object afaik..
I’m not sure where to start?
Array
(
[0] => Array
(
[title] => SimpleXMLElement Object
(
[0] => Superstructure of the centromeric complex of TubZRC plasmid partitioning systems.
)
[link] => SimpleXMLElement Object
(
[@attributes] => Array
(
[Version] => 1
)
[0] => 23010931
)
[author] => Aylett, CH., Löwe, J.
[journal] => SimpleXMLElement Object
(
[0] => Proc. Natl. Acad. Sci. U.S.A.
)
[pubdate] => 2012-9-27
[day] => SimpleXMLElement Object
(
[0] => 25
)
[month] => SimpleXMLElement Object
(
[0] => Sep
)
[year] => SimpleXMLElement Object
(
[0] => 2012
)
)
[1] => Array
(
[title] => SimpleXMLElement Object
(
[0] => Superstructure of the centromeric complex of TubZRC plasmid partitioning systems.
)
[link] => SimpleXMLElement Object
(
[@attributes] => Array
(
[Version] => 1
)
[0] => 23010931
)
[author] => Aylett, CH., Löwe, J.
[journal] => SimpleXMLElement Object
(
[0] => Proc. Natl. Acad. Sci. U.S.A.
)
[pubdate] => 2012-9-27
[day] => SimpleXMLElement Object
(
[0] => 25
)
[month] => SimpleXMLElement Object
(
[0] => Sep
)
[year] => SimpleXMLElement Object
(
[0] => 2012
)
)
Alternatively it could be done at the initial XML merge stage – I use the code below at the moment if anyone can suggest how to modify it to remove duplicates?
function simplexml_merge (SimpleXMLElement &$xml1, SimpleXMLElement $xml2) {
$dom1 = new DomDocument();
$dom2 = new DomDocument();
$dom1->loadXML($xml1->asXML());
$dom2->loadXML($xml2->asXML());
$xpath = new domXPath($dom2);
$xpathQuery = $xpath->query('/*/*');
for ($i = 0; $i < $xpathQuery->length; $i++) {
$dom1->documentElement->appendChild(
$dom1->importNode($xpathQuery->item($i), true));
}
$xml1 = simplexml_import_dom($dom1);
}
$xml1 = new SimpleXMLElement($search1);
$xml2 = new SimpleXMLElement($search2);
simplexml_merge($xml1, $xml2);
Thanks.
…
…
For clarity – here’s the XML source layout that I am importing into SimpleXML – each PubmedArticle is one “element” I am interested in comparing and ensuring there are no duplicates –
<xml...>
<Document>
<PubmedArticle>
<MedlineCitation>
<PMID version="1">xxx</PMID>
...
</MedlineCitation>
...
</PubmedArticle>
<PubmedArticle>
<MedlineCitation>
<PMID version="1">xxx</PMID>
...
</MedlineCitation>
...
</PubmedArticle>
etc
</Document>
</xml>
The PMID node is unique so can be used to check for duplicates.
…
…
Using the link from @Gordon – I know use:
//Get my source XML
$xml1 = new SimpleXMLElement($search1);
$xml2 = new SimpleXMLElement($search2);
//Run through $xml1 and build a query based on it's PMIDs
$query = array();
foreach ($xml1->PubmedArticle as $paper) {
$query[] = sprintf('(PMID != %s)',$paper->MedlineCitation->PMID);
}
$query = implode('and', $query);
//Run through $xml2 and get node which don't have PMID matching $xml1
foreach ($xml2->xpath(sprintf('PubmedArticle/MedlineCitation[%s]', $query)) as $paper) {
echo $paper->asXml();
}
However I still have one problem – getting the output merged.
The output of $xml2 is missing the <PubmedArticle> node around each ‘match’ for a start. Then I presume I can use the same merge code (above) to do the merge.
Can you point me in the right direction?
Decided to follow @Gordon’s line as it kept it XML. Eventually got it all working:
This would be easy to adapt to other data sources – just modify the
dedupeXMLfunction to match the data structure of your XML.