The Archive Base

Editorial Team
Asked: May 30, 2026

I am trying to use ICU libraries to test if a string has invalid

I am trying to use the ICU libraries to test whether a string contains invalid UTF-8 characters. I created a UTF-8 converter, but no invalid data gives me an error on conversion. I would appreciate your help.

Thanks,
Prashanth

#include <unicode/ucnv.h>
#include <cassert>
#include <iostream>
#include <string>

using namespace std;

int main()
{
    string str ("AP1120 CorNet-IP v5.0 v5.0.1.22 òÀ MIB 1.5.3.50 Profile EN-C5000");
    //  string str ("example string here");
    //  string str (" ����������"     );
    UErrorCode status = U_ZERO_ERROR;
    UConverter *cnv;
    const char * source = str.c_str();
    cnv = ucnv_open("utf-8", &status);
    assert(U_SUCCESS(status));

    UChar *target;
    int sourceLength = str.length();
    int targetLimit = 2 * sourceLength;
    target = new UChar[targetLimit];

    ucnv_toUChars(cnv, target, targetLimit, source, sourceLength, &status);
    cout << u_errorName(status) << endl;
    assert(U_SUCCESS(status));
}
1 Answer

  1. Editorial Team
    Added an answer on May 30, 2026 at 5:26 pm

    I modified your program to print out the actual strings, before and after:

    #include <unicode/ucnv.h>
    #include <string>
    #include <iostream>
    #include <cassert>
    #include <cstdio>

    int main()
    {
        std::string str("22 òÀ MIB 1");
        UErrorCode status = U_ZERO_ERROR;
        UConverter * const cnv = ucnv_open("utf-8", &status);
        assert(U_SUCCESS(status));

        // Generous output buffer: UTF-8 never decodes to more UTF-16 code units than it has bytes.
        const int targetLimit = 2 * static_cast<int>(str.size());
        UChar *target = new UChar[targetLimit];

        // A source length of -1 tells ICU that the input is NUL-terminated.
        ucnv_toUChars(cnv, target, targetLimit, str.c_str(), -1, &status);

        // Converted UTF-16 code units first, then the raw input bytes.
        for (int i = 0; i != targetLimit && target[i] != 0; ++i)
            std::printf("0x%04X ", target[i]);
        std::cout << std::endl;
        for (char c : str)
            std::printf("0x%02X ", static_cast<unsigned char>(c));
        std::cout << std::endl << "Status: " << u_errorName(status) << std::endl;

        delete[] target;
        ucnv_close(cnv);
    }
    

    Now, with default compiler settings, I get:

    0x0032 0x0032 0x0020 0x00F2 0x00C0 0x0020 0x004D 0x0049 0x0042 0x0020 0x0031
    0x32 0x32 0x20 0xC3 0xB2 0xC3 0x80 0x20 0x4D 0x49 0x42 0x20 0x31
    

    That is, the input is already UTF-8. This is a conspiracy of my editor, which saved the file in UTF-8 (verifiable in a hex editor), and of GCC, which sets its execution character set to UTF-8 by default.

    You can coerce GCC to change those parameters. For example, forcing the execution character set to ISO-8859-1 (via -fexec-charset=iso-8859-1) produces this:

    0x0032 0x0032 0x0020 0xFFFD 0xFFFD 0x0020 0x004D 0x0049 0x0042 0x0020 0x0031
    0x32 0x32 0x20 0xF2 0xC0 0x20 0x4D 0x49 0x42 0x20 0x31
    

    As you can see, the input is now ISO-8859-1-encoded, and the conversion promptly fails to decode the two non-ASCII bytes, producing the replacement character U+FFFD for each of them.

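    However, the conversion operation still returns a “success” status. It appears that the library doesn’t consider malformed input data an error of the function call: by default, an ICU converter substitutes U+FFFD for each malformed sequence and carries on. The error status seems to be reserved for things like running out of space, unless you install a stricter callback. Calling ucnv_setToUCallBack with UCNV_TO_U_CALLBACK_STOP makes the conversion stop at the first malformed sequence and report an error instead.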
