Okay, this should be really simple, but I have searched all over for the answer and also read the following thread:
How do I find the length of a Unicode string in Perl?
It does not help me. I know how to get Perl to treat a string constant as UTF-8 and return the right number of chars (instead of bytes) but somehow it doesn’t work when Perl receives the string via my AJAX call.
Below, I am posting the three Greek letters Alpha, Beta and Omega in Unicode. Perl tells me the length is 6 (bytes) when it should tell me only 3 (chars). How do I get the correct char count?
#!/usr/bin/perl
use strict;
if ($ENV{CONTENT_LENGTH}) {
    binmode (STDIN, ":utf8");
    read (STDIN, $_, $ENV{CONTENT_LENGTH});
    s{%([a-fA-F0-9]{2})}{ pack ('C', hex ($1)) }eg;
    print "Content-Type: text/html; charset=UTF-8\n\nReceived: $_ (".length ($_)." chars)";
    exit;
}
print "Content-Type: text/html; charset=UTF-8\n\n";
print qq[<html><head><script>
var oRequest;
function MakeRequest () {
oRequest = new XMLHttpRequest();
oRequest.onreadystatechange = zxResponse;
oRequest.open ('POST', '/test/unicode.cgi', true);
oRequest.send (encodeURIComponent (document.oForm.oInput.value));
}
function zxResponse () {
if (oRequest.readyState==4 && oRequest.status==200) {
alert (oRequest.responseText);
}
}
</script></head><body>
<form name="oForm" method="POST">
<input type="text" name="oInput" value="αβΩ">
<input type="button" value="Ajax Submit" onClick="MakeRequest();">
</form>
</body></html>
];
By the way, the code is intentionally simplified (I know how to make a cross-browser AJAX call, etc.) and using the CGI Perl module is not an option.
For a “native” way to accomplish this, you can convert as you copy with this method:
Open an in-memory file on the string with the desired encoding layer and read from that. The bytes are then decoded into characters as they are read.
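A minimal sketch of that in-memory approach, assuming the percent-decoded POST body is already sitting in a scalar as raw UTF-8 bytes. The variable names `$bytes` and `$chars` are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical input: the six UTF-8 bytes for alpha, beta, Omega,
# as they would look after percent-decoding the POST body.
my $bytes = "\xCE\xB1\xCE\xB2\xCE\xA9";

# Open a read-only in-memory filehandle on the byte string, with a
# decoding layer so that reads return characters rather than bytes.
open(my $fh, '<:encoding(UTF-8)', \$bytes)
    or die "Cannot open in-memory filehandle: $!";

# Slurp the whole "file"; the conversion happens as it is read.
my $chars = do { local $/; <$fh> };
close $fh;

printf "bytes: %d, chars: %d\n", length($bytes), length($chars);
# prints "bytes: 6, chars: 3"
```

The original `$bytes` is left untouched, so this suits cases where you want to keep both the raw and the decoded form around.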
If you want to convert the encoding in place, you can use this:
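One way to do an in-place conversion is the built-in `utf8::decode` function, which is available without loading any module. The sketch below applies it right after the same percent-decoding step used in the question. The `$body` and `$str` variables are stand-ins for the data read from STDIN:

```perl
use strict;
use warnings;

# Hypothetical POST body: what encodeURIComponent('αβΩ') produces.
my $body = '%CE%B1%CE%B2%CE%A9';

# Same percent-decoding as in the question: each %XX becomes one byte.
(my $str = $body) =~ s{%([a-fA-F0-9]{2})}{ pack('C', hex($1)) }eg;
my $before = length $str;    # counts bytes at this point

# Decode the UTF-8 bytes into characters, modifying $str in place.
# Returns false if $str does not hold valid UTF-8.
my $ok = utf8::decode($str);
my $after = length $str;     # now counts characters

print "before: $before, after: $after\n";
# prints "before: 6, after: 3"
```

Note that `binmode (STDIN, ":utf8")` cannot help in the original script: at the time of the `read`, the body is still plain ASCII percent-escapes, and the UTF-8 bytes only come into existence after the substitution runs.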
You should not shy away from Encode, however.
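For comparison, here is a small sketch of the same conversion with the core `Encode` module (it ships with Perl, so it is not an extra dependency in the way the CGI module might be). Again, the variable names are only illustrative:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical input: raw UTF-8 bytes for alpha, beta, Omega.
my $bytes    = "\xCE\xB1\xCE\xB2\xCE\xA9";
my $byte_len = length $bytes;

# Bytes in, characters out: decode returns a new character string.
my $chars    = decode('UTF-8', $bytes);
my $char_len = length $chars;

# Characters in, bytes out: encode again before writing to a raw handle.
my $out = encode('UTF-8', $chars);

print "bytes: $byte_len, chars: $char_len\n";
# prints "bytes: 6, chars: 3"
```

The rule of thumb is to decode once on the way in, work with characters internally, and encode once on the way out.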