I want to split a string with a regex, create a DOM element wherever a match is found, and repeat that until the string ends.
Given a string:
$str="hi there! [1], how are you? [2]";
DESIRED RESULT:
<sentence>
hi there! <child1>1</child1>, how are you? <child2>2</child2>
</sentence>
I am using PHP DOM: $dom = new DOMDocument('1.0'); ...
To create the root (this might not be relevant, but some people complain when no effort is shown):
$root= $dom->createElement('sentence', null);
$root= $dom->appendChild($root);
$root->setAttribute('attr-1', 'value-1');
I tried several approaches like the following, and some with preg_split:
$counter=1;
$pos = preg_match('/\[([1-9][0-9]*)\]/', $str);
if ($pos == true) {
$substr=$dom->createElement('child', $counter);
$root->appendChild($substr);
$counter++;
}
I know that code is not worth much, but again, it is here to show that I tried something.
Any help is appreciated.
Your original code is not that far off. However, you need to make the regular expression also match the text you want to add (and you need a text node for that). After each match you also need to advance the offset from which matching continues:
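A minimal sketch of such a loop, not a definitive implementation. The function name buildSentenceWithOffset, the exact pattern, and the child1/child2 element names (taken from the desired result in the question) are assumptions made for illustration. It also appends whatever text is left after the last marker.

```php
<?php
// Sketch only: the function name, the pattern and the child<N> element
// names are assumptions for illustration.
function buildSentenceWithOffset(string $str): string
{
    $dom  = new DOMDocument('1.0');
    $root = $dom->appendChild($dom->createElement('sentence'));

    // \G anchors each match at $offset. Group 1 is the (possibly empty)
    // text before the marker, group 2 the number inside the brackets.
    $pattern = '/\G(.*?)\[([1-9][0-9]*)\]/s';
    $offset  = 0;

    while (preg_match($pattern, $str, $m, 0, $offset)) {
        if ($m[1] !== '') {
            // Only add a text node when there is text before the marker.
            $root->appendChild($dom->createTextNode($m[1]));
        }
        $root->appendChild($dom->createElement('child' . $m[2], $m[2]));
        $offset += strlen($m[0]); // continue right after this match
    }

    // Text after the last marker, if any.
    if ($offset < strlen($str)) {
        $root->appendChild($dom->createTextNode(substr($str, $offset)));
    }

    return $dom->saveXML($root);
}

echo buildSentenceWithOffset('hi there! [1], how are you? [2]'), "\n";
// <sentence>hi there! <child1>1</child1>, how are you? <child2>2</child2></sentence>
```

Anchoring with \G keeps the offset arithmetic honest: every match starts exactly at $offset, so adding the length of the whole match always lands on the first unconsumed byte.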
The while loop is comparable to the if statement you had, just turned into a loop. Also, the text nodes are only added if some text was matched (e.g. you could have [1][2] in your string, so the text would be empty). Output of this example:
Edit: After playing with this a bit, I came to the conclusion that you might want to divide the problem. One part is to parse the string, and the other part is to actually insert the nodes (e.g. a text node for text and an element node for a number). Starting from the back, this immediately looks practical, so the second part first:
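A minimal sketch of that second part, under stated assumptions: the function name appendTokens, the token shape (a [type, value] pair) and the 'text'/'number' labels are all hypothetical, and a hard-coded list stands in for whatever does the parsing.

```php
<?php
// Sketch only: appendTokens(), the [$type, $value] token shape and the
// 'text'/'number' labels are assumptions for illustration.
function appendTokens(DOMElement $root, iterable $tokens): void
{
    $dom     = $root->ownerDocument;
    $counter = 1;

    foreach ($tokens as [$type, $value]) {
        if ($type === 'text') {
            // Plain text becomes a text node.
            $root->appendChild($dom->createTextNode($value));
        } else {
            // A number becomes an element named child1, child2, ...
            $root->appendChild($dom->createElement('child' . $counter, $value));
            $counter++;
        }
    }
}

$dom  = new DOMDocument('1.0');
$root = $dom->appendChild($dom->createElement('sentence'));

// Stand-in for the parser: any iterable of tokens works here.
appendTokens($root, [
    ['text', 'hi there! '],
    ['number', '1'],
    ['text', ', how are you? '],
    ['number', '2'],
]);

echo $dom->saveXML($root), "\n";
// <sentence>hi there! <child1>1</child1>, how are you? <child2>2</child2></sentence>
```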
In this example, we don’t care about the parsing at all. We get either a text or a number ($type), and based on that we decide whether to insert a text node or an element. So however the string is parsed, this code will always work. If there is an issue with it (e.g. the $counter is no longer of interest), it has nothing to do with the parsing/tokenization of the string.
The parsing itself has been encapsulated into an Iterator called Tokenizer. It contains everything needed to break the string apart into text and number elements, and it deals with all the details, like what happens if there is some text after the last number.
With that done, the two problems are split apart from each other. Instead of an iterator class, it would also be possible to create an array of arrays or similar, but I found the iterator more useful, so I quickly wrote one.
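One way such a Tokenizer could look, as a hedged sketch rather than the class originally written for this answer. It assumes PHP 8.0 or later (for the mixed return type), tokenizes eagerly with preg_split, and yields hypothetical [type, value] pairs labelled 'text' or 'number'.

```php
<?php
// Sketch only: a hypothetical reconstruction of a Tokenizer iterator.
// The token shape and the 'text'/'number' labels are assumptions.
class Tokenizer implements Iterator
{
    private array $tokens = [];
    private int $index = 0;

    public function __construct(string $str)
    {
        // Split on the [N] markers and keep the captured numbers:
        // even slots hold text, odd slots hold the numbers.
        $parts = preg_split('/\[([1-9][0-9]*)\]/', $str, -1, PREG_SPLIT_DELIM_CAPTURE);

        foreach ($parts as $i => $part) {
            if ($i % 2 === 1) {
                $this->tokens[] = ['number', $part];
            } elseif ($part !== '') {
                // Skip empty text, e.g. between "[1][2]".
                $this->tokens[] = ['text', $part];
            }
        }
    }

    public function current(): mixed { return $this->tokens[$this->index]; }
    public function key(): mixed     { return $this->index; }
    public function next(): void     { $this->index++; }
    public function rewind(): void   { $this->index = 0; }
    public function valid(): bool    { return isset($this->tokens[$this->index]); }
}

// Usage: trailing text after the last number comes out as a final text token.
foreach (new Tokenizer('hi there! [1], how are you? [2] Bye.') as [$type, $value]) {
    echo $type, ': ', $value, "\n";
}
```

Because the class only produces tokens, it can be passed to any loop that inserts text nodes and elements, and either half can be changed without touching the other.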
Again, this example prints the XML at the end, so here it is for illustration. Note that I’ve added some text after the last element: