Editorial Team
Asked: June 10, 2026

To my horror I’ve just found out that chr doesn’t work with Unicode, although it does something. The man page is anything but clear:

Returns the character represented by that NUMBER in the character set. For example, chr(65) is “A” in either ASCII or Unicode, and chr(0x263a) is a Unicode smiley face.

Indeed I can print a smiley using

perl -e 'print chr(0x263a)'

but things like chr(0x00C0) do not work. I see that my perl v5.10.1 is a bit ancient, but when I paste various strange letters in the source code, everything’s fine.

I’ve tried funny things like use utf8 and use encoding 'utf8'. I haven’t tried use v5.12 or use feature 'unicode_strings', as they don’t work with my version. I fooled around with Encode::decode, only to find out that I need no decoding, as I have no byte array to decode. I’ve read much more documentation than ever before and found quite a few interesting things, but nothing helpful. It looks like a variant of the Unicode Bug, but no usable solution is given. Moreover, I don’t care about whole-string semantics; all I need is a trivial function.

So how can I convert a number into a string consisting of the single character corresponding with it, so that for example real_chr(0xC0) eq 'À' holds?


The first answer I’ve got explains pretty much everything about IO, but I still don’t understand why

#!/usr/bin/perl -w
use strict;
use utf8;
use encoding 'utf8';

print chr(0x00C0) eq 'À' ? 'eq1' : 'ne1', " - ", chr(0x263a) eq '☺' ? 'eq1' : 'ne1', "\n";

print 'À' =~ /\w/ ? "match1" : "no_match1", " - ", chr(0x00C0) =~ /\w/ ? "match2" : "no_match2", "\n";

prints

ne1 - eq1
match1 - no_match2

It means that the manually entered 'À' differs from chr(0x00C0). Moreover, the former is a word constituent character (correct!) while the latter is not (but should be!).

1 Answer

  1. Editorial Team
    Added an answer on June 10, 2026 at 8:30 pm

    First,

    perl -le'print chr(0x263A);'
    

    is buggy. Perl even tells you as much:

    Wide character in print at -e line 1.
    

    That doesn’t qualify as “working”. So while they differ in how they fail, neither of the following gives you what you want:

    perl -le'print chr(0x263A);'
    
    perl -le'print chr(0x00C0);'
    

    To properly output the UTF-8 encoding of those Unicode code points, you need to tell Perl to encode the code points as UTF-8.

    $ perl -le'use open ":std", ":encoding(UTF-8)"; print chr(0x263A);'
    ☺
    
    $ perl -le'use open ":std", ":encoding(UTF-8)"; print chr(0x00C0);'
    À
    

    Now on to the “why”.

    File handles can only transmit bytes, so unless you tell it otherwise, Perl expects the strings you print to be bytes. That means the string you provide to print cannot contain anything but bytes, or in other words, it cannot contain characters over 255. The output is exactly what you provide:

    $ perl -e'print map chr, 0x00, 0x65, 0xC0, 0xF0' | od -t x1
    0000000 00 65 c0 f0
    0000004
    

    This is useful. It is different from what you want, but that doesn’t make it wrong. If you want something different, you just need to tell Perl what you want.

    By adding an :encoding layer, the handle now expects a string of Unicode characters, or as I call it, “text”. The layer tells Perl how to convert the text into bytes.

    $ perl -e'
       use open ":std", ":encoding(UTF-8)";
       print map chr, 0x00, 0x65, 0xC0, 0xF0, 0x263a;
    ' | od -t x1
    0000000 00 65 c3 80 c3 b0 e2 98 ba
    0000011
    

    You’re right that chr doesn’t know or care about Unicode. Like length, substr, ord and reverse, chr implements a basic string function, not a Unicode function. That doesn’t mean it can’t be used to work with text strings. As you’ve seen, the problem wasn’t with chr but with what you did with the string after you built it.

    A character is an element of a string, and a character is a number. That means a string is just a sequence of numbers. Whether you treat those numbers as Unicode code points (text), packed IP addresses or temperature measurements is entirely up to you and the functions to which you pass the strings.
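To make the “sequence of numbers” view concrete, here is a minimal sketch (the variable names are illustrative, not from the question). It builds a string from three arbitrary numbers with chr and reads them back with ord; nothing is encoded or decoded along the way.

```perl
use strict;
use warnings;

# Build a string from arbitrary numbers; chr attaches no meaning to them.
my $s = join '', map { chr($_) } (0x48, 0xC0, 0x263A);

# Recover the numbers with ord, one per string element.
my @numbers = map { ord($_) } split //, $s;

# length counts string elements (characters), not bytes.
printf "length: %d\n", length($s);
printf "elements: %s\n", join ' ', map { sprintf '0x%X', $_ } @numbers;
```

Whether 0xC0 here means “À”, a byte of a packed address, or a measurement only matters once the string is passed to something that interprets it.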

    Here are a few examples of operators that do assign meaning to the strings they receive as operands:

    • m// expects a string of Unicode code points.
    • connect expects a sequence of bytes that represent a sockaddr_in structure.
    • print with a handle without :encoding expects a sequence of bytes.
    • print with a handle with :encoding expects a sequence of Unicode code points.
    • etc
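The two print cases in the list can be compared side by side without a terminal. The sketch below is only an illustration: bytes_written is a hypothetical helper, and it prints to an in-memory handle so the resulting bytes can be inspected directly.

```perl
use strict;
use warnings;

# Hypothetical helper: print $string to an in-memory handle opened with
# the given mode, then return the bytes that were written, as hex.
sub bytes_written {
    my ($mode, $string) = @_;
    my $buffer = '';
    open my $fh, $mode, \$buffer or die "open failed: $!";
    print {$fh} $string;
    close $fh or die "close failed: $!";
    return join ' ', map { sprintf '%02x', ord($_) } split //, $buffer;
}

# Same string in both cases; only the handle's layer differs.
my $string = join '', map { chr($_) } (0x65, 0xC0, 0xF0);

print "no layer:   ", bytes_written('>', $string), "\n";
print "with UTF-8: ", bytes_written('>:encoding(UTF-8)', $string), "\n";
```

Without a layer the numbers go out unchanged as bytes (65 c0 f0); with the layer each one is treated as a code point and encoded (65 c3 80 c3 b0), which matches the od output shown earlier.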

    So how can I convert a number into a string consisting of the single character corresponding with it, so that for example real_chr(0xC0) eq ‘À’ holds?

    chr(0xC0) eq 'À' does hold. Did you remember to tell Perl you encoded your source code using UTF-8 by using use utf8;? If you didn’t tell Perl, Perl actually sees a two-character string on the RHS.
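The effect of use utf8; on that comparison can be sketched without depending on how the file is saved. This assumes the source is stored as UTF-8, so the literal 'À' sits on disk as the bytes C3 80; the byte escapes and the utf8::decode call below stand in for what Perl sees without and with the pragma.

```perl
use strict;
use warnings;

# Assumption: in a UTF-8 source file, the literal 'À' is the bytes C3 80.
# Without `use utf8;`, Perl takes each byte as one character.
my $without_utf8 = "\xC3\x80";

# With `use utf8;`, Perl decodes the literal; utf8::decode mimics that here.
my $with_utf8 = $without_utf8;
utf8::decode($with_utf8);

printf "without use utf8: length %d, equals chr(0xC0)? %s\n",
    length($without_utf8), ($without_utf8 eq chr(0xC0) ? 'yes' : 'no');
printf "with use utf8:    length %d, equals chr(0xC0)? %s\n",
    length($with_utf8), ($with_utf8 eq chr(0xC0) ? 'yes' : 'no');
```

The first line reports a two-character string that is not equal to chr(0xC0); the second reports a one-character string that is.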


    Regarding the question you’ve added:

    There are problems with the encoding pragma. I recommend against using it. Instead, use

    use open ':std', ':encoding(UTF-8)';
    

    That’ll fix one of the problems. The other problem you are encountering is with

    chr(0x00C0) =~ /\w/
    

    It’s a known bug that’s intentionally left broken for backwards compatibility reasons. That is, unless you request a more recent version of the language as follows:

    use 5.014;    # use 5.012; *might* suffice.
    

    A workaround that works as far back as 5.8:

    my $x = chr(0x00C0);
    utf8::upgrade($x);
    $x =~ /\w/
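The workaround can be wrapped into the trivial function the question asks for. This is a sketch, not an established API: real_chr is the hypothetical name taken from the question, and it assumes that changing the string's internal representation with utf8::upgrade is acceptable for your use.

```perl
use strict;
use warnings;

# Hypothetical helper named after the question's real_chr.
sub real_chr {
    my ($code_point) = @_;
    my $char = chr($code_point);
    # Changes only the internal storage format, not the character itself;
    # regex matching then applies Unicode rules to this string.
    utf8::upgrade($char);
    return $char;
}

my $result = real_chr(0xC0) =~ /\w/ ? 'match' : 'no_match';
print "$result\n";
```

The returned string still compares equal to chr(0xC0), so it can be used anywhere the plain chr result was used.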
    