
Question


Editorial Team

Asked: May 15, 2026 (2026-05-15T10:28:51+00:00)

Part 3 (Part 2 is here) (Part 1 is here)


Here is the Perl module I’m using: Unicode::String

How I’m calling it:

print "Euro: ";
print unicode_encode("€")."\n";
print "Pound: ";
print unicode_encode("£")."\n";

I would like it to return this format:

&#x20AC; # Euro
&#x00A3; # Pound

The function is below:

sub unicode_encode {

    shift() if ref( $_[0] );
    my $toencode = shift();
    return undef unless defined($toencode);

    print "Passed: ".$toencode."\n";

    Unicode::String->stringify_as("utf8");
    my $unicode_str = Unicode::String->new();
    my $text_str    = "";
    my $pack_str    = "";

    # encode Perl UTF-8 string into latin1 Unicode::String
    #  - currently only Basic Latin and Latin 1 Supplement
    #    are supported here due to issues with Unicode::String .
    $unicode_str->latin1($toencode);

    print "Latin 1: ".$unicode_str."\n";

    # Convert to hex format ("U+XXXX U+XXXX ")
    $text_str = $unicode_str->hex;

    # Now, the interesting part.
    # We must search for the (now hex-encoded)
    #       Unicode escape sequence.
    my $pattern =
'U\+005[C|c] U\+0058 U\+00([0-9A-Fa-f])([0-9A-Fa-f]) U\+00([0-9A-Fa-f])([0-9A-Fa-f]) U\+00([0-9A-Fa-f])([0-9A-Fa-f]) U\+00([0-9A-Fa-f])([0-9A-Fa-f])';

    # Replace escapes with entities (beginning of string)
    $_ = $text_str;
    if (/^$pattern/) {
        $pack_str = pack "H8", "$1$2$3$4$5$6$7$8";
        $text_str =~ s/^$pattern/\&#x$pack_str/;
    }

    # Replace escapes with entities (middle of string)
    $_ = $text_str;
    while (/ $pattern/) {
        $pack_str = pack "H8", "$1$2$3$4$5$6$7$8";
        $text_str =~ s/ $pattern/\;\&#x$pack_str/;
        $_ = $text_str;
    }

    # Replace "U+"  with "&#x"      (beginning of string)
    $text_str =~ s/^U\+/&#x/;

    # Replace " U+" with ";&#x"     (middle of string)
    $text_str =~ s/ U\+/;&#x/g;

    # Append ";" to end of string to close last entity.
    # This last ";" at the end of the string isn't necessary in most parsers.
    # However, it is included anyways to ensure full compatibility.
    if ( $text_str ne "" ) {
        $text_str .= ';';
    }

    return $text_str;
}

I need to get the same output but also need to support Latin-9 characters, and Unicode::String is limited to Latin-1. Any thoughts on how I can get around this?

I have a couple of other questions. I think I have some understanding of Unicode and encodings, but I am also short on time.

Thanks to anyone who helps me out!


1 Answer

Editorial Team · Answer 1 · 2026-05-15T10:28:51+00:00

As you have been told already, Unicode::String is not an appropriate choice of module. Perl ships with a module called ‘Encode’ which can do everything you need.

If you have a character string in Perl like this:

my $euro = "\x{20ac}";

You can convert it to a string of bytes in Latin-9 like this:

use Encode;
my $bytes = encode("iso8859-15", $euro);

The $bytes variable will now contain \xA4.
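To check that result yourself, here is a minimal, self-contained sketch. It assumes a stock Perl with the core Encode module, and it uses the canonical encoding name "iso-8859-15" instead of the alias above. The variable names are only illustrative.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Character strings: one Perl character per Unicode code point.
my $euro  = "\x{20ac}";
my $pound = "\x{a3}";

# Encode each character string into Latin-9 (ISO-8859-15) bytes.
my $euro_bytes  = encode("iso-8859-15", $euro);
my $pound_bytes = encode("iso-8859-15", $pound);

# Show the resulting bytes as uppercase hex.
printf "Euro:  %s\n", uc(unpack("H*", $euro_bytes));    # A4
printf "Pound: %s\n", uc(unpack("H*", $pound_bytes));   # A3
```

The euro sign sits at 0xA4 in Latin-9, which is the position Latin-1 gives to the generic currency sign, so the two encodings disagree on that byte.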

Or you can have Perl convert it automatically on output to a filehandle like this:

binmode(STDOUT, ":encoding(iso8859-15)");
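A small sketch of the same layer in action, written against an in-memory filehandle rather than STDOUT so the bytes can be inspected afterwards. The $buffer and $fh names are just for illustration, and the canonical name "iso-8859-15" is used for the layer.

```perl
use strict;
use warnings;

my $euro = "\x{20ac}";

# Open a filehandle that writes into a scalar instead of a real file.
my $buffer = '';
open(my $fh, '>', \$buffer) or die "open failed: $!";

# Apply the Latin-9 encoding layer, exactly as you would to STDOUT.
binmode($fh, ":encoding(iso-8859-15)") or die "binmode failed: $!";

# Print a character string; the layer converts it to bytes on the way out.
print {$fh} $euro;
close($fh) or die "close failed: $!";

printf "%d byte(s): %s\n", length($buffer), uc(unpack("H*", $buffer));   # 1 byte(s): A4
```

With the layer in place, the rest of the program can work purely with character strings and never call encode() by hand.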

See the documentation for the Encode module. The PerlIO documentation also describes the encoding layer.

I know you are determined to ignore this final piece of advice, but I’ll offer it one last time. Latin-9 is a legacy encoding. Perl can quite happily read Latin-9 data and convert it to UTF-8 on the fly (using binmode). You should not be writing more software that generates Latin-9 data; you should be migrating away from it.
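As a rough sketch of that direction, the snippet below decodes Latin-9 bytes into a character string, re-encodes them as UTF-8, and also prints the hex entity format asked for in the question. The to_hex_entities helper is hypothetical (it is not part of Encode or of the original code); it is one plain-Perl way to get that output once the data has been decoded.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper (not from the original post): renders every character
# of a decoded string as a hexadecimal numeric character reference.
sub to_hex_entities {
    my ($chars) = @_;
    return join '', map { sprintf('&#x%04X;', ord($_)) } split //, $chars;
}

# Latin-9 input bytes: 0xA4 is the euro sign, 0xA3 is the pound sign.
my $latin9 = "\xA4\xA3";

# Decode the legacy bytes into a Perl character string.
my $chars = decode("iso-8859-15", $latin9);

# Re-encode as UTF-8 for storage or output.
my $utf8 = encode("UTF-8", $chars);

print to_hex_entities($chars), "\n";   # &#x20AC;&#x00A3;
print length($utf8), "\n";             # 5 (3 bytes for the euro, 2 for the pound)
```

Note that the helper turns every character into an entity, including plain ASCII; if you only want entities for non-ASCII characters you would need to add a check on the code point.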
