The Archive Base
Editorial Team
Asked: May 14, 2026

I’m running Perl 5.10.0 and Postgres 8.4.3, and inserting strings into a database that sits behind DBIx::Class.

These strings should be in UTF-8, and therefore my database is running in UTF-8. Unfortunately, some of these strings are bad and contain malformed UTF-8, so when I run the insert I get an exception:

DBI Exception: DBD::Pg::st execute failed: ERROR: invalid byte sequence for encoding "UTF8": 0xb5

I thought that I could simply ignore the invalid ones and worry about the malformed UTF-8 later, so I expected this code to flag and ignore the bad titles:

if(not utf8::valid($title)){
   $title="Invalid UTF-8";
}
$data->title($title);
$data->update();

However, Perl seems to think that the strings are valid, yet the update still throws the exception.

How can I get Perl to detect the bad UTF-8?

1 Answer

  1. Editorial Team
    Added an answer on May 14, 2026 at 8:26 am

    First off, please follow the documentation: the utf8 pragma should only be used in the ‘use utf8;’ form, to indicate that your source code is UTF-8 instead of Latin-1. Don’t use any of the utf8:: functions.

    Perl distinguishes between byte strings and character (UTF-8) strings. In byte mode, Perl doesn’t know or care what encoding you are using, and will treat the bytes as Latin-1 if you print them. Take for example the Euro sign (€). In UTF-8 this is 3 bytes: 0xE2, 0x82, 0xAC. If you ask for the length of these bytes, Perl will return 3. Again, it doesn’t care about the encoding; the bytes can be in any encoding, legal or illegal.

    If you use the Encode module and call Encode::decode('UTF-8', $bytes), you will get a new string which has the so-called UTF8 flag set. Perl now knows your string is in UTF-8, and will return a length of 1 for the Euro sign.

    The problem is that utf8::valid only applies to the second type of string. Your strings are probably in the first form, byte mode, and utf8::valid just returns true for anything in byte form. This is documented in perldoc utf8.
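The difference between the two kinds of string, and why utf8::valid does not help here, can be seen in a small sketch. The sample strings are made up for illustration; only Encode and the built-in utf8:: functions are used.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The three bytes that encode the Euro sign in UTF-8.
my $bytes = "\xE2\x82\xAC";

# Decoding turns the byte string into a one-character string.
my $chars = decode('UTF-8', $bytes);

# A lone 0xB5 byte is not valid UTF-8, but as a plain byte string
# it is still an internally consistent Perl scalar.
my $bad = "caf\xB5";

printf "bytes: %d, chars: %d\n", length($bytes), length($chars);
printf "utf8::valid on the bad byte string: %s\n",
    utf8::valid($bad) ? 'true' : 'false';
```

On the byte string, utf8::valid reports true even though the content is not UTF-8, which matches the behaviour described in the question.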

    The solution is to get Perl to decode your byte strings as UTF-8, and detect any errors. This can be done with FB_CROAK as brian d foy explains:

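    use Encode qw(decode FB_CROAK LEAVE_SRC);

    # Croak on malformed input; LEAVE_SRC keeps $byte_string unmodified.
    my $ustring = eval { decode( 'UTF-8', $byte_string, FB_CROAK | LEAVE_SRC ) };
    die "Could not decode string: $@" unless defined $ustring;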
    

    You can then catch that error and skip those invalid strings.
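As a rough sketch of catching the error and skipping bad input, the loop below decodes a few made-up titles and keeps only the ones that decode cleanly. The helper name decode_or_undef and the sample data are assumptions for illustration, not part of the original code.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Hypothetical helper: returns the decoded string, or undef if the
# bytes are not well-formed UTF-8.
sub decode_or_undef {
    my ($bytes) = @_;
    my $decoded = eval { decode('UTF-8', $bytes, FB_CROAK | LEAVE_SRC) };
    return $decoded;    # undef when decode() croaked
}

# Sample input: two well-formed byte strings and one with a stray 0xB5.
my @titles = ("plain ascii", "euro \xE2\x82\xAC", "broken \xB5 byte");

my @kept;
for my $raw (@titles) {
    my $title = decode_or_undef($raw);
    if (defined $title) {
        push @kept, $title;
    }
    else {
        warn "Skipping title with malformed UTF-8\n";
    }
}

print scalar(@kept), " of ", scalar(@titles), " titles kept\n";
```

In the question's code, the same check would sit in front of the $data->title($title) call, so that only titles that decoded cleanly reach the database.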

    Or, if you know your data is mostly UTF-8 with a few invalid sequences here and there, you can use:

    my $ustring = decode( 'UTF-8', $byte_string );
    

    which uses the default mode of FB_DEFAULT, replacing each invalid sequence with U+FFFD, the Unicode REPLACEMENT CHARACTER (a diamond with a question mark in it).
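A minimal sketch of the replacement behaviour, using a made-up input with one stray 0xB5 byte. Counting the replacement characters afterwards is one possible way to tell whether anything was substituted.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Sample byte string with a single malformed byte in the middle.
my $byte_string = "na\xB5ve";

# Default check mode: malformed input is replaced, not rejected.
my $ustring = decode('UTF-8', $byte_string);

# Count how many U+FFFD characters the decoded string contains.
my $replaced = () = $ustring =~ /\x{FFFD}/g;

printf "length: %d, replacement characters: %d\n",
    length($ustring), $replaced;
```

Note that a U+FFFD that was already present in valid input would be counted too, so the count is only a hint.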

    You can then pass the string directly to your database driver in most cases. Some drivers may require you to re-encode the string back to byte form first:

    my $byte_string = encode('UTF-8', $ustring);
    

    There are also regexes online that you can use to check for valid UTF-8 sequences before calling decode (check other Stack Overflow answers). If you use those regexes, you don’t need to do any encoding or decoding.
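One commonly circulated pattern of this kind is sketched below. It is meant for raw byte strings only, and the helper name is_valid_utf8 is a hypothetical name chosen for this example; treat it as an illustration rather than a vetted validator.

```perl
use strict;
use warnings;

# Matches a byte string made up only of well-formed UTF-8 sequences.
# Apply it to undecoded bytes, not to decoded character strings.
my $valid_utf8 = qr/\A(?:
      [\x00-\x7F]                        # ASCII
    | [\xC2-\xDF][\x80-\xBF]             # 2-byte, no overlongs
    |  \xE0[\xA0-\xBF][\x80-\xBF]        # 3-byte, no overlongs
    | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}  # 3-byte, general
    |  \xED[\x80-\x9F][\x80-\xBF]        # 3-byte, no surrogates
    |  \xF0[\x90-\xBF][\x80-\xBF]{2}     # 4-byte, planes 1-3
    | [\xF1-\xF3][\x80-\xBF]{3}          # 4-byte, planes 4-15
    |  \xF4[\x80-\x8F][\x80-\xBF]{2}     # 4-byte, plane 16
)*\z/x;

# Hypothetical helper name, used only for this illustration.
sub is_valid_utf8 {
    my ($bytes) = @_;
    return $bytes =~ $valid_utf8 ? 1 : 0;
}

my @samples = ("plain", "euro \xE2\x82\xAC", "broken \xB5");
for my $i (0 .. $#samples) {
    printf "sample %d: %s\n", $i,
        is_valid_utf8($samples[$i]) ? 'valid' : 'invalid';
}
```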

    Finally, please use UTF-8 rather than utf8 in your calls to decode. The latter is more lax and lets some invalid UTF-8 sequences through (such as sequences outside the Unicode range).
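The two spellings resolve to different encoding objects inside Encode, which a short sketch can show. The sample bytes are an assumption chosen for illustration: they have the shape of a three-byte sequence but would encode U+D800, a surrogate code point that strict UTF-8 does not permit.

```perl
use strict;
use warnings;
use Encode qw(decode find_encoding FB_CROAK LEAVE_SRC);

# Encode maps the two names to different encoding objects.
my $strict_name = find_encoding('UTF-8')->name;
my $lax_name    = find_encoding('utf8')->name;
print "UTF-8 resolves to: $strict_name\n";
print "utf8  resolves to: $lax_name\n";

# Bytes shaped like an encoding of the surrogate U+D800.
my $surrogate_bytes = "\xED\xA0\x80";

# The strict encoding is expected to croak on these bytes.
my $decoded = eval {
    decode('UTF-8', $surrogate_bytes, FB_CROAK | LEAVE_SRC);
};
my $strict_ok = defined $decoded ? 1 : 0;
print "strict decode accepted the surrogate bytes: ",
    ($strict_ok ? "yes" : "no"), "\n";
```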
