I am trying to use Perl’s YAML::XS module on Unicode letters, and it doesn’t seem to work the way it should.
I have this in my script (which is saved as UTF-8):
use utf8;
binmode STDOUT, ":utf8";
my $hash = {č => "ř"}; #czech letters with unicode codes U+010D and U+0159
use YAML::XS;
my $s = YAML::XS::Dump($hash);
print $s;
Instead of something sane, -: Å is printed. According to this link, though, it should be working fine.
Yes, when I YAML::XS::Load it back, I get the correct strings again, but I don’t like the fact that the dumped string seems to be in some wrong encoding.
Am I doing something wrong? I am always unsure about unicode in perl, to be frank…
Clarification: my console supports UTF-8. Also, when I print to a file opened with a :utf8 layer (open $file, ">:utf8") instead of STDOUT, it still doesn’t print the correct UTF-8 letters.
Yes, you’re doing something wrong. You’ve misunderstood what the link you mentioned means.
Dump & Load work with raw UTF-8 bytes; i.e. strings containing UTF-8 but with the UTF-8 flag off.
When you print those bytes to a filehandle with the :utf8 layer, they get interpreted as Latin-1 and converted to UTF-8, producing double-encoded output (which can be read back successfully as long as you double-decode it). You want to binmode STDOUT, ':raw' instead.
Another option is to call utf8::decode on the string returned by Dump. This will convert the raw UTF-8 bytes to a character string (with the UTF-8 flag on). You can then print the string to a :utf8 filehandle.
So, either print the bytes returned by Dump to a :raw filehandle, or utf8::decode them first and print the result to a :utf8 filehandle.
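The two output options described above could be sketched roughly like this. It is a minimal illustration, not the original answer's code; the variable names $bytes and $chars are made up for the example, and the hash is the one from the question.

```perl
use strict;
use warnings;
use utf8;                         # the source file itself is saved as UTF-8
use YAML::XS ();

my $hash = { 'č' => 'ř' };        # U+010D and U+0159, as in the question

# Option 1: Dump returns UTF-8 encoded bytes, so write them out unchanged.
my $bytes = YAML::XS::Dump($hash);
binmode STDOUT, ':raw';
print $bytes;

# Option 2: turn the bytes into a character string first, then let the
# :utf8 layer encode it exactly once on output.
my $chars = YAML::XS::Dump($hash);
utf8::decode($chars);             # in place: bytes -> characters
binmode STDOUT, ':utf8';
print $chars;
```

Both print statements should emit the same bytes. What you want to avoid is mixing the two: printing the undecoded $bytes through a :utf8 layer is what produces the double encoding.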
Likewise, when reading from a file, you want to read in :raw mode or use utf8::encode on the string before passing it to Load.
When possible, you should just use DumpFile & LoadFile, letting YAML::XS deal with opening the file correctly. But if you want to use STDIN/STDOUT, you’ll have to deal with Dump & Load.
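As a rough sketch of the reading side: the first half assumes you already hold a character string (for example, one read through a :utf8 handle) and encodes it before calling Load; the second half lets YAML::XS handle the file itself. The temporary file and the variable names are only there to keep the example self-contained.

```perl
use strict;
use warnings;
use utf8;
use File::Temp qw(tempfile);
use YAML::XS ();

# A character string, as you would get from reading through a :utf8 handle.
my $yaml_chars = "---\nč: ř\n";

# Load expects UTF-8 bytes, so encode a copy before handing it over.
my $yaml_bytes = $yaml_chars;
utf8::encode($yaml_bytes);        # in place: characters -> bytes
my $from_string = YAML::XS::Load($yaml_bytes);

# Simpler when a real file is involved: let YAML::XS open it.
my ($fh, $path) = tempfile(SUFFIX => '.yml', UNLINK => 1);
close $fh;
YAML::XS::DumpFile($path, { 'č' => 'ř' });
my $from_file = YAML::XS::LoadFile($path);
```

In both cases the loaded hash should hold ordinary Perl character strings, so the key 'č' can be looked up directly under use utf8.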