The Archive Base


Editorial Team
Asked: May 24, 2026 (2026-05-24T16:57:13+00:00)

How to deal with invalid UTF-8 sequences in data from external file / external command, which data is used to generate HTML (in a Perl web app)?

Currently I am running to_utf8() on each piece of data; said subroutine detects if data is invalid UTF-8, and falls back to ‘latin1’ encoding:

use utf8;
use Encode;
binmode STDOUT, ':utf8';

# Encoding assumed for input that is not valid UTF-8
my $fallback_encoding = 'latin1';

sub to_utf8 {
    my $str = shift;
    return undef unless defined $str;
    if (utf8::valid($str)) {
        utf8::decode($str);
        return $str;
    } else {
        return decode($fallback_encoding, $str, Encode::FB_DEFAULT);
    }
}

Please correct me if this code is incorrect.

The (fragment of) recommended setup in Perl Unicode Essentials from Tom Christiansen’s Materials for OSCON 2011 is

use utf8;
use open qw( :encoding(UTF-8) :std );

How can I get behavior similar to my to_utf8() fallback with a setup like the one above? I’d prefer automatic handling of Unicode, rather than having to remember to pass every string from external commands and files through to_utf8().

The data comes from external files or from the output of external commands. It should be UTF-8, but because of user errors it sometimes isn’t.


1 Answer

  1. Editorial Team
    Added an answer on May 24, 2026 at 4:57 pm (2026-05-24T16:57:14+00:00)

    You can write a custom IO layer that does the “magical” decoding.

    Usually, IO layers (like :utf8) are written in XS, but the core module PerlIO::via (see http://search.cpan.org/perldoc?PerlIO::via) allows you to write one in Perl code.
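A minimal sketch of such a layer, assuming line-oriented input and ISO-8859-1 (latin1) as the fallback. The package name Utf8OrLatin1 and the helper _to_utf8_bytes are hypothetical names chosen for this illustration. Each line is decoded strictly as UTF-8; if that fails, the whole line is treated as latin1. The layer hands valid UTF-8 octets upward and, through the UTF8 method, asks Perl to flag the handle as UTF-8, so readers should receive character strings.

```perl
use strict;
use warnings;

package PerlIO::via::Utf8OrLatin1;    # hypothetical layer name

use PerlIO::via;
use Encode qw(decode encode);

# Return valid UTF-8 octets for one line of raw input.
# Valid UTF-8 passes through; anything else is read as latin1.
sub _to_utf8_bytes {
    my ($bytes) = @_;
    my $chars = eval {
        decode('UTF-8', $bytes, Encode::FB_CROAK | Encode::LEAVE_SRC);
    };
    $chars = decode('ISO-8859-1', $bytes) unless defined $chars;
    return encode('UTF-8', $chars);
}

# Called when the layer is pushed onto a handle; no state is needed.
sub PUSHED {
    my ($class) = @_;
    my $state = '';
    return bless \$state, $class;
}

# Tell PerlIO that this layer produces UTF-8 encoded data.
sub UTF8 { return 1 }

# Read one line from the layer below and normalize it.
# Splitting on "\n" is safe: that byte never occurs inside a
# multi-byte UTF-8 sequence.
sub FILL {
    my ($self, $fh) = @_;
    my $line = <$fh>;
    return defined $line ? _to_utf8_bytes($line) : undef;
}

package main;

binmode STDOUT, ':encoding(UTF-8)';

# Print every file named on the command line through the layer.
for my $path (@ARGV) {
    open my $in, '<:via(Utf8OrLatin1)', $path
        or die "Cannot open $path: $!";
    print while <$in>;
    close $in;
}
```

Because the fallback is decided per line, a line that mixes valid UTF-8 with stray latin1 bytes is read entirely as latin1. If that is too coarse, the helper would need to fall back per invalid byte sequence instead.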


© 2021 The Archive Base. All Rights Reserved