PHP's XML parser calls the default handler function twice when it encounters a special character in a string, and therefore splits the string. I've tried to solve this by using different encodings in the XML header as well as in the PHP code, but it still splits the string:
$parser = xml_parser_create();
xml_parser_set_option($parser, XML_OPTION_TARGET_ENCODING, "ISO-8859-1");
xml_parser_set_option($parser, XML_OPTION_SKIP_WHITE, 1);
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
xml_set_element_handler($parser, "startTag", "endTag");
xml_set_default_handler($parser, 'defaultHandler');
function startTag($p, $name, $attributes)
{
}
function endTag($p, $name)
{
}
function defaultHandler($parser, $data)
{
    if (strlen(trim($data)) > 0) {
        echo '[' . $data . ']' . '<br />';
    }
}
Example of the XML:
<variable name="GZH29" type="integer">
<label>This is a small test with a special ë character. Let's try an ë character too</label>
</variable>
One would expect:
[This is a small test with a special ë character. Let's try an ë character too]
But the result is
[This is a small test with a special ]
[ë character. Let's try an ë character too]
I would like the line not to be split, so any idea what the solution is?
The xml_parser does create multiple events here, for a reason I have not fully understood; I think it is because of the encoding auto-detection.
You can deal with that by creating your own parser class. This is generally useful anyway, not only in this case. Here it is especially useful because it lets you put together the text of the label, which gets distributed over multiple events.
The basic work is making the callback functions public functions of a class and then registering those functions.
Then, each time the label tag opens, a temporary store is reset. When text appears, it's added to that temporary store. When the label tag then closes, you can pass this text to a new “event”, this time the function you're looking for, with its text:
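The steps described above could be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the class name LabelParser, its method names, and the $onLabel callback are hypothetical names chosen for this example. It also assumes UTF-8 input and registers a character data handler (xml_set_character_data_handler) rather than the default handler used in the question, so that only text nodes reach the buffer.

```php
<?php
// Hypothetical helper class: buffers the text of each <label> element and
// hands the complete string to a callback once the element closes.
class LabelParser
{
    private $parser;          // parser handle returned by xml_parser_create()
    private $buffer = '';     // temporary store for the current label text
    private $inLabel = false; // true while we are inside a <label> element
    private $onLabel;         // callback that receives the complete label text

    public function __construct(callable $onLabel)
    {
        $this->onLabel = $onLabel;
        $this->parser = xml_parser_create();
        // Keep element names as written instead of upper-casing them.
        xml_parser_set_option($this->parser, XML_OPTION_CASE_FOLDING, 0);
        // Register public methods of this object as the callbacks.
        xml_set_element_handler($this->parser, [$this, 'startTag'], [$this, 'endTag']);
        xml_set_character_data_handler($this->parser, [$this, 'characterData']);
    }

    public function parse(string $xml): void
    {
        if (!xml_parse($this->parser, $xml, true)) {
            $code = xml_get_error_code($this->parser);
            throw new RuntimeException('XML error: ' . xml_error_string($code));
        }
    }

    public function startTag($parser, $name, $attributes)
    {
        if ($name === 'label') {
            // A label opens: reset the temporary store.
            $this->inLabel = true;
            $this->buffer = '';
        }
    }

    public function characterData($parser, $data)
    {
        // Text may arrive in several pieces; append each piece to the store.
        if ($this->inLabel) {
            $this->buffer .= $data;
        }
    }

    public function endTag($parser, $name)
    {
        if ($name === 'label') {
            // The label closes: emit one "event" carrying the whole text.
            $this->inLabel = false;
            call_user_func($this->onLabel, $this->buffer);
        }
    }
}

// Usage with the XML from the question (file assumed to be saved as UTF-8).
$labels = [];
$reader = new LabelParser(function ($text) use (&$labels) {
    $labels[] = $text;
});
$reader->parse(
    '<variable name="GZH29" type="integer">'
    . '<label>This is a small test with a special ë character. Let\'s try an ë character too</label>'
    . '</variable>'
);
foreach ($labels as $text) {
    echo '[' . $text . ']' . "\n";
}
```

Because the pieces are concatenated before the callback runs, it should not matter how many times the parser calls the text handler for a single label; the callback is intended to see the label once, as one string.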