I have a problem with accented letters. For example: I have a tag that

Question

Editorial Team

Asked: June 6, 2026 (2026-06-06T02:41:24+00:00)

I have a problem with accented letters.

For example:
I have a tag that contains: “il mio prodotto é molto bello”. However, the output is: “il mio prodotto “

Whenever the XML contains an accented letter, the data is cut off at that letter. My XML file starts with:

<?xml version="1.0" encoding="utf-8"?>

Here is my parser code:

<?php
class Content_Handler {
   function Content_Handler(){}
   function start_element($parser, $name, $attrs) {
       global $desc, $names, $link;
       if ($name == "PRODUCT"){
          $zupid = ($attrs["ZUPID"]);
          echo "$zupid<br>";
       }
       if ($name == "DESCRIPTION") { $desc = true;}
       if ($name == "NAME") { $names = true;}
       if ($name == "DEEPLINK") { $link = true;}
   }

   function end_element($parser, $name) {
       if ($name == "PRODUCT") {
          print "<br />";
       }
   }


   function characters($parser, $chars) {
       global $desc, $names, $link;
       if ($desc) { echo $chars."<br>"; $desc = false;} 
       if ($names) { echo $chars."<br>"; $names = false;} 
       if ($link) { echo $chars."<br>"; $link = false;} 
   }
}


$handler = new Content_Handler();
$cat_parser = xml_parser_create("UTF-8");

xml_parser_set_option($cat_parser, XML_OPTION_TARGET_ENCODING, "ISO-8859-1");
xml_set_object($cat_parser, $handler);
xml_set_element_handler($cat_parser, "start_element", "end_element");
xml_set_character_data_handler($cat_parser, "characters");


$file = "my.xml";


if ($file_stream = fopen($file, "r")) {

   while ($data = fread($file_stream, 4096)) {

       $this_chunk_parsed = xml_parse($cat_parser, $data, feof($file_stream));
       if (!$this_chunk_parsed) {
           $error_code = xml_get_error_code($cat_parser);
           $error_text = xml_error_string($error_code);
           $error_line = xml_get_current_line_number($cat_parser);

           $output_text = "Parsing problem at line $error_line: $error_text";
           die($output_text);
       }
   }
} else {

    die("Can't open XML file.");

}
xml_parser_free($cat_parser);

?>

1 Answer

Editorial Team · Answer 1 · 2026-06-06T02:41:25+00:00

This is a common mistake with SAX parsing in what appears to be any language (see previous answers on Java and C!).

When you handle SAX events, one call to the characters function does not necessarily deliver the entire content of the element between the start and end tags. It can be called many times for a single element, and when accented characters are involved it usually is.

The full character content can only be determined by concatenating the values received between a start tag and its end tag.

So for your text ‘il mio prodotto é molto bello’, characters will probably be called three times, with ‘il mio prodotto ‘, ‘é’ and ‘ molto bello’. Because your handler sets $desc = false after the first call, only the first fragment is printed. You need to concatenate the fragments, not treat each one as the complete value.

Your ‘characters’ function should be more like:

function characters($parser, $chars) {
   global $fullchars;
   // Append each fragment; do not echo or reset the flags here.
   $fullchars .= $chars;
}

with $fullchars being emptied in start_element, and printed and then emptied again in end_element.
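To make the buffering concrete, here is a minimal sketch of a handler that collects text in characters and only emits it in end_element. It is not your exact parser: the class name BufferingHandler, the $values array, and the inline sample XML are assumptions made for illustration (the element names are taken from your code), and the buffer is kept in an object property rather than a global. The sketch also leaves the target encoding at its default instead of switching to ISO-8859-1.

```php
<?php
// Sketch only: BufferingHandler and $values are hypothetical names.
class BufferingHandler {
    private $buffer = '';   // text collected for the element being read
    public $values = [];    // completed [element name, full text] pairs

    function start_element($parser, $name, $attrs) {
        // A new element begins: drop any text left over from before it.
        $this->buffer = '';
    }

    function characters($parser, $chars) {
        // May be called several times per element, so append, never overwrite.
        $this->buffer .= $chars;
    }

    function end_element($parser, $name) {
        // Only here is the element's text known to be complete.
        // Names are upper case because case folding is on by default.
        if (in_array($name, ['DESCRIPTION', 'NAME', 'DEEPLINK'], true)) {
            $this->values[] = [$name, trim($this->buffer)];
        }
        $this->buffer = '';
    }
}

// Assumed sample document; the real my.xml is not shown in the question.
$xml = '<?xml version="1.0" encoding="utf-8"?>'
     . '<product zupid="1">'
     . '<description>il mio prodotto é molto bello</description>'
     . '</product>';

$handler = new BufferingHandler();
$parser  = xml_parser_create('UTF-8');
xml_set_element_handler($parser, [$handler, 'start_element'], [$handler, 'end_element']);
xml_set_character_data_handler($parser, [$handler, 'characters']);

if (!xml_parse($parser, $xml, true)) {
    die('Parsing problem: ' . xml_error_string(xml_get_error_code($parser)));
}

foreach ($handler->values as [$name, $text]) {
    echo $name, ': ', $text, "\n";
}
```

However many fragments the parser delivers, the description is stored once and in full, because nothing is output until the closing tag arrives. The same handler should work with your chunked fread loop, since the buffer survives between xml_parse calls.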
