The following script behaves differently on my two Perl installations. One is Perl 5.8.5 and the other is Perl 5.8.8.
Here is the script:
#!/usr/bin/perl
use FindBin(qw($Bin));
use lib $Bin;
use lib "$Bin/../lib";
use XML::LibXML;
use strict; # quote strings, declare variables
use warnings; # enable warnings
use warnings qw(FATAL utf8); # fatalize encoding glitches
use open qw(:std :utf8); # undeclared streams in UTF-8
my $xml =<<EOS;
<?xml version="1.0" encoding="UTF8"?>
<foo>Привет, мир!</foo>
EOS
my $parser = new XML::LibXML;
my $doc = '';
eval { $doc = $parser->parse_string($xml); };
if ($@) {
die "Error: $@";
}
my $root = $doc->getDocumentElement();
print "XML after parsing: ", $root->toString(), "\n";
On my 5.8.8 Perl installation I get:
XML after parsing: <foo>Привет, мир!</foo>
On my 5.8.5 Perl installation I get:
XML after parsing: <foo>&#x41F;&#x440;&#x438;&#x432;&#x435;&#x442;, &#x43C;&#x438;&#x440;!</foo>
I want my 5.8.5 installation to behave like the 5.8.8 one in this regard. Is this a matter of just upgrading my Perl, or setting some special compilation flag?
First of all, both outputs are equivalent. XML::LibXML is free to generate either one, and it shouldn’t matter to the receiving parser. Of course, XML is supposed to be human readable, and this is probably what concerns you.
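To see that the two forms really are equivalent to a parser, here is a minimal sketch that parses a literal and an escaped version of the same element and compares the resulting text. The variable names and the shortened sample text are just for illustration, and the source is kept ASCII-only by building the Cyrillic text from code points:

```perl
use strict;
use warnings;
use Encode qw(encode);
use XML::LibXML;

# "Привет" built from code points so this file stays pure ASCII.
my $text = "\x{41F}\x{440}\x{438}\x{432}\x{435}\x{442}";

# Literal form: the characters themselves, encoded as UTF-8 bytes.
my $literal = encode('UTF-8',
    qq{<?xml version="1.0" encoding="UTF-8"?>\n<foo>$text</foo>\n});

# Escaped form: every character written as a hex character reference.
my $refs    = join '', map { sprintf('&#x%X;', ord($_)) } split //, $text;
my $escaped = qq{<?xml version="1.0" encoding="UTF-8"?>\n<foo>$refs</foo>\n};

my $parser       = XML::LibXML->new;
my $from_literal = $parser->parse_string($literal)->documentElement->textContent;
my $from_escaped = $parser->parse_string($escaped)->documentElement->textContent;

# Both documents decode to the same character string.
my $same = $from_literal eq $from_escaped ? 1 : 0;
print $same ? "equivalent\n" : "different\n";
```

Any conforming XML parser should treat the two inputs the same way, so the difference only matters to a human reading the serialized output.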
No, XML::LibXML does not have an option to control which characters it escapes. In fact, I’ve only ever known it to escape when needed, which is the first behaviour.
No need to upgrade Perl. Upgrading XML::LibXML or libxml2 (the underlying library used by XML::LibXML) will do the trick.
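Before upgrading, it may help to compare what each installation is actually running. The following sketch prints the Perl, XML::LibXML and libxml2 versions; it assumes an XML::LibXML release that provides `LIBXML_DOTTED_VERSION` and `LIBXML_RUNTIME_VERSION` (very old releases may lack the latter):

```perl
use strict;
use warnings;
use XML::LibXML;

my %versions = (
    'Perl'               => sprintf('%vd', $^V),
    'XML::LibXML'        => $XML::LibXML::VERSION,
    # libxml2 version XML::LibXML was compiled against
    'libxml2 (compiled)' => XML::LibXML::LIBXML_DOTTED_VERSION(),
    # libxml2 version actually loaded at run time
    'libxml2 (runtime)'  => XML::LibXML::LIBXML_RUNTIME_VERSION(),
);

printf "%-20s %s\n", $_, $versions{$_} for sort keys %versions;
```

Running it on both machines should show whether the module or the underlying library is the one that differs.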
Off-topic tips:
I presume your source code is encoded using UTF-8? If so, I would add
use utf8;
to let Perl know that. If you do, you’ll need to change to
Using
my $xml = <<'EOS';
instead of
my $xml =<<EOS;
will prevent Perl from messing with your XML (it prevents interpolation and interpretation of \ sequences).
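The difference between an interpolating and a single-quoted here-document can be seen in a small sketch. The `$name` variable and the sample markup are made up for illustration:

```perl
use strict;
use warnings;

my $name = 'world';

# Interpolating here-doc: $name is expanded, \t becomes a tab,
# and a literal dollar sign has to be written as \$.
my $interpolated = <<"EOS";
<foo price="\$5">$name\tend</foo>
EOS

# Single-quoted here-doc: the text is taken verbatim, so the
# $ and \ characters reach the XML parser untouched.
my $verbatim = <<'EOS';
<foo price="$5">$name\tend</foo>
EOS

print $interpolated, $verbatim;
```

A bare `<<EOS` behaves like `<<"EOS"`, which is why XML containing `$`, `@` or backslashes is safer in the single-quoted form.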