Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 870467
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: May 15, 20262026-05-15T10:28:51+00:00 2026-05-15T10:28:51+00:00

Part 3 ( Part 2 is here ) ( Part 1 is here )

  • 0

Part 3 (Part 2 is here) (Part 1 is here)

Here is the perl Mod I’m using: Unicode::String

How I’m calling it:

print "Euro: ";
print unicode_encode("€")."\n";
print "Pound: ";
print unicode_encode("£")."\n";

would like it to return this format:

€ # Euro
£ # Pound

The function is below:

sub unicode_encode {

    shift() if ref( $_[0] );
    my $toencode = shift();
    return undef unless defined($toencode);

    print "Passed: ".$toencode."\n";

    Unicode::String->stringify_as("utf8");
    my $unicode_str = Unicode::String->new();
    my $text_str    = "";
    my $pack_str    = "";

    # encode Perl UTF-8 string into latin1 Unicode::String
    #  - currently only Basic Latin and Latin 1 Supplement
    #    are supported here due to issues with Unicode::String .
    $unicode_str->latin1($toencode);

    print "Latin 1: ".$unicode_str."\n";

    # Convert to hex format ("U+XXXX U+XXXX ")
    $text_str = $unicode_str->hex;

    # Now, the interesting part.
    # We must search for the (now hex-encoded)
    #       Unicode escape sequence.
    my $pattern =
'U\+005[C|c] U\+0058 U\+00([0-9A-Fa-f])([0-9A-Fa-f]) U\+00([0-9A-Fa-f])([0-9A-Fa-f]) U\+00([0-9A-Fa-f])([0-9A-Fa-f]) U\+00([0-9A-Fa-f])([0-9A-Fa-f])';

    # Replace escapes with entities (beginning of string)
    $_ = $text_str;
    if (/^$pattern/) {
        $pack_str = pack "H8", "$1$2$3$4$5$6$7$8";
        $text_str =~ s/^$pattern/\&#x$pack_str/;
    }

    # Replace escapes with entities (middle of string)
    $_ = $text_str;
    while (/ $pattern/) {
        $pack_str = pack "H8", "$1$2$3$4$5$6$7$8";
        $text_str =~ s/ $pattern/\;\&#x$pack_str/;
        $_ = $text_str;
    }

    # Replace "U+"  with "&#x"      (beginning of string)
    $text_str =~ s/^U\+/&#x/;

    # Replace " U+" with ";&#x"     (middle of string)
    $text_str =~ s/ U\+/;&#x/g;

    # Append ";" to end of string to close last entity.
    # This last ";" at the end of the string isn't necessary in most parsers.
    # However, it is included anyways to ensure full compatibility.
    if ( $text_str ne "" ) {
        $text_str .= ';';
    }

    return $text_str;
}

I need to get the same output but need to Support Latin-9 characters as well, but the Unicode::String is limited to latin1. any thoughts on how I can get around this?

I have a couple of other questions and think I have a somewhat understanding of Unicode and Encodings but having time issues as well.

Thanks to anyone who helps me out!

  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-05-15T10:28:51+00:00Added an answer on May 15, 2026 at 10:28 am

    As you have been told already, Unicode::String is not an appropriate choice of module. Perl ships with a module called ‘Encode’ which can do everything you need.

    If you have a character string in Perl like this:

    my $euro = "\x{20ac}";
    

    You can convert it to a string of bytes in Latin-9 like this:

    my $bytes = encode("iso8859-15", $euro);
    

    The $bytes variable will now contain \xA4.

    Or you can have Perl automatically convert it out output to a filehandle like this:

    binmode(STDOUT, ":encoding(iso8859-15)");
    

    You can refer to the documentation for the Encode module. And also, PerlIO describes the encoding layer.

    I know you are determined to ignore this final piece of advice but I’ll offer it one last time. Latin-9 is a legacy encoding. Perl can quite happily read Latin-9 data and convert it to UTF-8 on the fly (using binmode). You should not be writing more software that generates Latin-9 data you should be migrating away from it.

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

Here is part of my web.config for my WCF service: <bindings> <basicHttpBinding> <binding name=sslBinding>
In my JSF/Facelets app, here's a simplified version of part of my form: <h:form
I'm using Perl/Tk to build the GUI for an application. I plan on adding
So I wrote some perl that would parse results returned from the Amazon Web
Part of my everyday work is maintaining and extending legacy VB6 applications. A common
Part of a new product I have been assigned to work on involves server-side
Part of the install for an app I am responsible for, compiles some C
Part of our java application needs to run javascript that is written by non-developers.
Part of the series of controls I am working on obviously involves me lumping
Part of the setup routine for the product I'm working on installs a database

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.