i want to split a string with regex, then create a dom element where

Asked: May 31, 2026 (2026-05-31T16:09:43+00:00)

I want to split a string with a regex, create a DOM element wherever a match is found, and repeat that until the string ends.
Given a string:

$str="hi there! [1], how are you? [2]";

DESIRED RESULT:

<sentence>
hi there! <child1>1</child1>, how are you? <child2>2</child2>
</sentence>

I am using PHP DOM: $dom = new DOMDocument('1.0'); ...

To create the root (this may not be relevant, but some people complain about questions that show no effort):

        $root= $dom->createElement('sentence', null);
        $root= $dom->appendChild($root);
        $root->setAttribute('attr-1', 'value-1');

I tried several approaches, such as the following, and some with preg_split:

$counter=1;
$pos = preg_match('/\[([1-9][0-9]*)\]/', $str);
    if ($pos == true) {
    $substr=$dom->createElement('child', $counter);
    $root->appendChild($substr);
    $counter++;
    }

I know this code is not much, but again, it is here to show that I tried.

Any help is appreciated.

1 Answer

Editorial Team · Answer 1 · 2026-05-31T16:09:44+00:00

Your original code is not that far off. However, the regular expression also needs to match the text you want to add (and you need a text node for that). After each match you also need to advance the offset so that the next match continues from there:

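$str = "hi there! [1], how are you? [2]";

$dom = new DOMDocument('1.0');
$root = $dom->createElement('sentence');
$root = $dom->appendChild($root);
$root->setAttribute('attr-1', 'value-1'); # ...

$counter = 0;
$offset = 0;
// (.*?) lazily captures the text before each [n]; the s modifier lets it span newlines.
// The flags argument is 0 (not NULL) and $offset resumes after the previous match.
while (preg_match('/(.*?)\[([1-9][0-9]*)\]/s', $str, $matches, 0, $offset)) {
    list(, $text, $number) = $matches;
    if (strlen($text)) {
        $root->appendChild($dom->createTextNode($text));
    }
    $counter++;
    $root->appendChild($dom->createElement("child$counter", $number));
    $offset += strlen($matches[0]);
}
// Text after the last [n] is not matched by the pattern, so append it separately.
if ($offset < strlen($str)) {
    $root->appendChild($dom->createTextNode(substr($str, $offset)));
}

echo $dom->saveXML();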

The while loop is comparable to the if you had, just turned into a loop. Text nodes are only added when some text was actually matched (for example, with [1][2] in your string the text between the two would be empty). Output of this example (line breaks and indentation added for readability):

<?xml version="1.0"?>
<sentence attr-1="value-1">
  hi there! <child1>1</child1>, how are you? <child2>2</child2>
</sentence>
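
Since the question mentions preg_split, here is a minimal sketch of that route as an alternative to the offset loop. It assumes the same [n] placeholder pattern; buildSentence() is a hypothetical helper name and not part of the original code. With PREG_SPLIT_DELIM_CAPTURE, the captured numbers are kept between the text pieces, so the pieces alternate between text and number:

```php
<?php
// Sketch: build the <sentence> element from the pieces returned by preg_split().
// buildSentence() is a hypothetical helper name used only for this illustration.
function buildSentence(DOMDocument $dom, string $str): DOMElement
{
    $root = $dom->appendChild($dom->createElement('sentence'));

    // Without PREG_SPLIT_NO_EMPTY, empty pieces are kept, so even indexes
    // always hold text and odd indexes always hold a captured number.
    $parts = preg_split('/\[([1-9][0-9]*)\]/', $str, -1, PREG_SPLIT_DELIM_CAPTURE);

    $counter = 0;
    foreach ($parts as $i => $part) {
        if ($i % 2 === 1) {
            // Captured digits of a placeholder: becomes <childN>.
            $counter++;
            $root->appendChild($dom->createElement("child$counter", $part));
        } elseif ($part !== '') {
            // Plain text between placeholders; skip empty pieces.
            $root->appendChild($dom->createTextNode($part));
        }
    }
    return $root;
}

$dom = new DOMDocument('1.0');
$root = buildSentence($dom, 'hi there! [1], how are you? [2] test');
echo $dom->saveXML($root), "\n";
```

Text after the last placeholder needs no special handling here, because preg_split returns it as the final piece.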

Edit: After playing with this a bit, I came to the conclusion that you might want to divide the problem. One part is parsing the string; the other part is actually inserting the nodes (a text node for text and an element node for a number). Working backwards turns out to be practical, so here is the second part first:

$dom = new DOMDocument('1.0');
$root = $dom->createElement('sentence');
$root = $dom->appendChild($root);
$root->setAttribute('attr-1', 'value-1'); # ...

$str = "hi there! [1], how are you? [2] test";

$it = new Tokenizer($str);
$counter = 0;
foreach ($it as $type => $string) {
    switch ($type) {
        case Tokenizer::TEXT:
            $root->appendChild($dom->createTextNode($string));
            break;

        case Tokenizer::NUMBER:
            $counter++;
            $root->appendChild($dom->createElement("child$counter", $string));
            break;

        default:
            throw new Exception(sprintf('Invalid type %s.', $type));
    }
}

echo $dom->saveXML();

In this example, we don’t care about the parsing at all. We either get a text or a number ($type) and we can decide upon it to either insert the textnode or the element. So however the parsing of the string is done, this code will always work. If there is an issue with it (e.g. the $counter is not interesting any longer), it would have nothing to do with the parsing/tokenization of the string.
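
To illustrate that independence, here is a small sketch in which the inserting code is fed a hand-written token list, with no parser involved at all. The function name appendTokens() and the TOKEN_* constants are assumptions made for this illustration. It also uses [type, value] pairs rather than type => string keys, so that a plain array can be passed:

```php
<?php
// Hypothetical token types for this sketch; they play the same role as
// Tokenizer::TEXT and Tokenizer::NUMBER in the surrounding code.
const TOKEN_TEXT = 1;
const TOKEN_NUMBER = 2;

// appendTokens() is an assumed helper name. It accepts any iterable of
// [type, value] pairs, so it does not depend on how the string was parsed.
function appendTokens(DOMElement $root, iterable $tokens): void
{
    $dom = $root->ownerDocument;
    $counter = 0;
    foreach ($tokens as [$type, $value]) {
        switch ($type) {
            case TOKEN_TEXT:
                $root->appendChild($dom->createTextNode($value));
                break;

            case TOKEN_NUMBER:
                $counter++;
                $root->appendChild($dom->createElement("child$counter", $value));
                break;

            default:
                throw new InvalidArgumentException(sprintf('Invalid type %s.', $type));
        }
    }
}

$dom = new DOMDocument('1.0');
$root = $dom->appendChild($dom->createElement('sentence'));

// Tokens written by hand; any tokenizer producing the same pairs would work.
appendTokens($root, [
    [TOKEN_TEXT, 'hi there! '],
    [TOKEN_NUMBER, '1'],
    [TOKEN_TEXT, ', how are you? '],
    [TOKEN_NUMBER, '2'],
]);
echo $dom->saveXML($root), "\n";
```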

The parsing itself has been encapsulated into an Iterator called Tokenizer. It contains everything to break the string apart into text and number elements. It deals with all the details like what happens if there is some text after the last number and so on:

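class Tokenizer implements Iterator
{
    const TEXT = 1;
    const NUMBER = 2;
    private $offset = 0;         // byte position where the next match attempt starts
    private $string;             // the input being tokenized
    private $fetched = array();  // queue of pending array(type, value) tokens

    public function __construct($string)
    {
        $this->string = $string;
    }

    public function rewind(): void
    {
        // Start over and drop tokens left from an earlier, unfinished iteration.
        $this->offset = 0;
        $this->fetched = array();
        $this->fetch();
    }

    // Queue the next token(s): optional text before a match, then the number.
    private function fetch()
    {
        if ($this->offset >= strlen($this->string)) {
            return;
        }
        $result = preg_match('/\[([1-9][0-9]*)\]/', $this->string, $matches, PREG_OFFSET_CAPTURE, $this->offset);
        if (!$result) {
            // No further [n]: the rest of the string is one text token.
            $this->fetched[] = array(self::TEXT, substr($this->string, $this->offset));
            $this->offset = strlen($this->string);
            return;
        }
        $pos = $matches[0][1];
        if ($pos != $this->offset) {
            // Text between the current offset and the start of the match.
            $this->fetched[] = array(self::TEXT, substr($this->string, $this->offset, $pos - $this->offset));
        }
        $this->fetched[] = array(self::NUMBER, $matches[1][0]);
        $this->offset = $pos + strlen($matches[0][0]);
    }

    // The attribute avoids the PHP 8.1+ return type deprecation notice
    // and is read as a plain comment by older PHP versions.
    #[\ReturnTypeWillChange]
    public function current()
    {
        list(, $current) = current($this->fetched);
        return $current;
    }

    #[\ReturnTypeWillChange]
    public function key()
    {
        list($key) = current($this->fetched);
        return $key;
    }

    public function next(): void
    {
        array_shift($this->fetched);
        if (!$this->fetched) $this->fetch();
    }

    public function valid(): bool
    {
        return (bool)$this->fetched;
    }
}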

With that done, the two problems are separated from each other. Instead of an iterator class, it is also possible to create an array of arrays or similar, but I found the iterator more useful, so I quickly wrote one.
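
For comparison, the array-of-arrays variant might look roughly like the sketch below. It is not the code used above: tokenize() and the TOKEN_* constants are hypothetical names, and it uses preg_match_all with offsets instead of repeated preg_match calls:

```php
<?php
// Hypothetical token types and function name for this sketch.
const TOKEN_TEXT = 1;
const TOKEN_NUMBER = 2;

// Returns a list of [type, value] pairs in the order they appear in $str.
function tokenize(string $str): array
{
    $tokens = [];
    $offset = 0;

    // PREG_SET_ORDER groups the results per match;
    // PREG_OFFSET_CAPTURE adds the byte offset of every captured string.
    preg_match_all('/\[([1-9][0-9]*)\]/', $str, $matches, PREG_SET_ORDER | PREG_OFFSET_CAPTURE);

    foreach ($matches as $match) {
        [$whole, $pos] = $match[0];
        if ($pos > $offset) {
            // Text between the previous placeholder and this one.
            $tokens[] = [TOKEN_TEXT, substr($str, $offset, $pos - $offset)];
        }
        $tokens[] = [TOKEN_NUMBER, $match[1][0]];
        $offset = $pos + strlen($whole);
    }

    if ($offset < strlen($str)) {
        // Text after the last placeholder.
        $tokens[] = [TOKEN_TEXT, substr($str, $offset)];
    }
    return $tokens;
}

print_r(tokenize('hi there! [1], how are you? [2] test'));
```

The whole string is tokenized up front here, whereas the iterator only fetches the next tokens when the loop asks for them; for short sentences like this one the difference hardly matters.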

Again, this example outputs the XML at the end, so here it is (line breaks and indentation added for readability). Note that I’ve added some text after the last element:

<?xml version="1.0"?>
<sentence attr-1="value-1">
  hi there! <child1>1</child1>, how are you? <child2>2</child2> test
</sentence>
