I have an XML file which contains text with some very simple layout constructs:
<?xml version='1.0'?>
<page>
  <section>
    <header>Header</header>
    <par>Some paragraph</par>
    <par>Another paragraph with <emph>formatting</emph></par>
  </section>
</page>
In PHP, I then read this file using SimpleXML (note that I intentionally strip all other tags):
$page = file_get_contents("page.xml");
if ($page) {
    $stripped = strip_tags($page, "<?xml><page><section><header><par><emph>");
    $xml = new SimpleXMLElement($stripped);
}
Now I would like to iterate over the XML elements and print them in order as HTML for my website. The final result should be the following snippet:
<h1>Header</h1>
<p>Some paragraph
<p>Another paragraph with <i>formatting</i>
I’ve noodled through SimpleXML and XPath trying to figure out how to iterate over the XML tree in document order, so that I can turn the original XML file into HTML output. I can produce part of the desired result, but the <emph></emph> content is simply gone. How do I descend further into the tree? My code so far:
foreach ($xml->section as $s) {
    echo "<h1>" . $s->header . "</h1>";
    foreach ($s->par as $p) {
        echo "<p>" . $p;
        // Do some magic here to ensure <emph> tags are recognized and responded to properly.
    }
}
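One way to fill in the "magic here" step: casting a SimpleXMLElement to a string returns only the element's own text, and iterating over SimpleXML children skips text nodes, so mixed content like the second <par> loses its inline markup. A minimal sketch, assuming the DOM extension is available: get a DOM view of the same tree with dom_import_simplexml() and walk childNodes recursively, because that list does include text nodes. TAG_MAP and renderNode() are hypothetical names chosen for illustration.

```php
<?php
// Hypothetical mapping from the custom layout tags to HTML tags.
// An empty string means "render the children only, without a wrapper".
const TAG_MAP = [
    'page'    => '',
    'section' => '',
    'header'  => 'h1',
    'par'     => 'p',
    'emph'    => 'i',
];

// Recursively render a DOM node (element or text) as HTML.
function renderNode(DOMNode $node): string
{
    if ($node instanceof DOMText) {
        // Text nodes (including CDATA) hold the characters between tags; escape them for HTML.
        return htmlspecialchars($node->nodeValue, ENT_QUOTES | ENT_HTML5);
    }
    if (!($node instanceof DOMElement)) {
        return ''; // Ignore comments, processing instructions, etc.
    }
    $html = '';
    foreach ($node->childNodes as $child) {
        $html .= renderNode($child);
    }
    $tag = TAG_MAP[$node->nodeName] ?? '';
    return $tag === '' ? $html : "<$tag>$html</$tag>";
}

$source = "<page><section><header>Header</header>"
        . "<par>Some paragraph</par>"
        . "<par>Another paragraph with <emph>formatting</emph></par>"
        . "</section></page>";

$xml = new SimpleXMLElement($source);
// dom_import_simplexml() returns a DOMElement for the same underlying tree,
// whose childNodes include the text nodes that SimpleXML's foreach skips.
echo renderNode(dom_import_simplexml($xml));
```

This version closes every <p>, which browsers treat the same as the unclosed form in the desired output above.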
Any hints and pointers are appreciated! Thanks 🙂
Well, without an answer I just had to noodle through it myself 🙂 Here is what I did, and it worked out just fine.
It turned out that SimpleXML didn’t cut it, so I used XMLReader:
Then I manually parsed the XML string, jumped from element to element and acted upon each of them:
You get the drift. I basically had to walk the XML structure myself and, depending on the element type, handle each element's attributes and child nodes manually.
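A minimal sketch of that kind of XMLReader loop (not the original code; TAG_MAP and renderWithReader() are hypothetical names, and the mapping is the one from the question). XMLReader::read() advances one node at a time, and each ELEMENT, text, and END_ELEMENT node is translated to HTML on the fly.

```php
<?php
// Hypothetical mapping from layout tags to HTML tags ('' = no wrapper).
const TAG_MAP = [
    'page'    => '',
    'section' => '',
    'header'  => 'h1',
    'par'     => 'p',
    'emph'    => 'i',
];

// Stream through the document with XMLReader and emit HTML as we go.
function renderWithReader(string $xmlString): string
{
    $reader = new XMLReader();
    $reader->XML($xmlString);
    $html = '';
    while ($reader->read()) {
        switch ($reader->nodeType) {
            case XMLReader::ELEMENT:
                $tag = TAG_MAP[$reader->localName] ?? '';
                if ($tag !== '') {
                    $html .= "<$tag>";
                    // A self-closing element such as <par/> produces no END_ELEMENT node.
                    if ($reader->isEmptyElement) {
                        $html .= "</$tag>";
                    }
                }
                break;
            case XMLReader::TEXT:
            case XMLReader::CDATA:
            case XMLReader::WHITESPACE:
            case XMLReader::SIGNIFICANT_WHITESPACE:
                // Character data between tags; escape it for HTML.
                $html .= htmlspecialchars($reader->value, ENT_QUOTES | ENT_HTML5);
                break;
            case XMLReader::END_ELEMENT:
                $tag = TAG_MAP[$reader->localName] ?? '';
                if ($tag !== '') {
                    $html .= "</$tag>";
                }
                break;
        }
    }
    $reader->close();
    return $html;
}

echo renderWithReader("<page><section><header>Header</header>"
    . "<par>Another paragraph with <emph>formatting</emph></par></section></page>");
```

Attributes, if the markup had any, could be read at the ELEMENT step with $reader->getAttribute('name'); the example markup has none, so the sketch skips them.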
In fact, this is a two-step process. What you see here assumes a valid XML document. I also have a validator that runs before the above code and makes sure that elements are nested correctly and that the given XML is “well formed” according to my own rules for nesting, attributes, and so on. The validator works on exactly the same principle.
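A nesting check can follow the same pattern. This is a sketch under assumptions: ALLOWED_CHILDREN is a hypothetical rule table, and validateNesting() is a hypothetical function that keeps a stack of open elements while reading and reports any element its parent does not allow. libxml_use_internal_errors() collects well-formedness errors instead of emitting PHP warnings.

```php
<?php
// Hypothetical nesting rules: parent tag => tags allowed directly inside it.
// '#root' stands for the document root.
const ALLOWED_CHILDREN = [
    '#root'   => ['page'],
    'page'    => ['section'],
    'section' => ['header', 'par'],
    'header'  => [],
    'par'     => ['emph'],
    'emph'    => [],
];

// Return a list of problems; an empty array means the document passed.
function validateNesting(string $xmlString): array
{
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $reader = new XMLReader();
    $reader->XML($xmlString);
    $errors = [];
    $stack = ['#root']; // Names of the currently open elements.

    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT) {
            $parent = end($stack);
            $name = $reader->localName;
            if (!in_array($name, ALLOWED_CHILDREN[$parent] ?? [], true)) {
                $errors[] = "<$name> is not allowed inside <$parent>";
            }
            // Self-closing elements never produce an END_ELEMENT, so don't push them.
            if (!$reader->isEmptyElement) {
                $stack[] = $name;
            }
        } elseif ($reader->nodeType === XMLReader::END_ELEMENT) {
            array_pop($stack);
        }
    }
    $reader->close();

    // Report any well-formedness errors libxml collected while reading.
    foreach (libxml_get_errors() as $error) {
        $errors[] = 'malformed XML: ' . trim($error->message);
    }
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $errors;
}

print_r(validateNesting('<page><par>x</par></page>'));
```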
Hope this helps.