The Archive Base

Editorial Team
Asked: May 28, 2026

I have a routine that needs to be supplied with normalized strings. However, the data that’s coming in isn’t necessarily clean, and String.Normalize() raises ArgumentException if the string contains invalid code points.

What I’d like to do is just replace those code points with a throwaway character such as ‘?’. But to do that I need an efficient way to search through the string to find them in the first place. What is a good way to do that?

The following code works, but it’s basically using try/catch as a crude if-statement so performance is terrible. I’m just sharing it to illustrate the behavior I’m looking for:

private static string ReplaceInvalidCodePoints(string aString, string replacement)
{
    var builder = new StringBuilder(aString.Length);
    var enumerator = StringInfo.GetTextElementEnumerator(aString);

    while (enumerator.MoveNext())
    {
        string nextElement;
        try { nextElement = enumerator.GetTextElement().Normalize(); }
        catch (ArgumentException) { nextElement = replacement; }
        builder.Append(nextElement);
    }

    return builder.ToString();
}

(edit:) I’m thinking of converting the text to UTF-32 so that I could quickly iterate over it and check whether each dword corresponds to a valid code point. Is there a function that will do that? If not, is there a list of invalid ranges floating around out there?

1 Answer

  1. Editorial Team
    Added an answer on May 28, 2026 at 2:56 am

    It seems the only way to do this is ‘manually’, as you’ve done. Here’s a version that gives the same results as yours but is faster (about 4 times faster over a string containing every char up to char.MaxValue, with less improvement over code points up to U+10FFFF) and doesn’t require unsafe code. I’ve also simplified and commented my IsCharacter method to explain each condition:

    static string ReplaceNonCharacters(string aString, char replacement)
    {
        var sb = new StringBuilder(aString.Length);
        for (var i = 0; i < aString.Length; i++)
        {
            if (char.IsSurrogatePair(aString, i))
            {
                int c = char.ConvertToUtf32(aString, i);
                i++;
                if (IsCharacter(c))
                    sb.Append(char.ConvertFromUtf32(c));
                else
                    sb.Append(replacement);
            }
            else
            {
                char c = aString[i];
                // An unpaired surrogate is not a valid code point on its own.
                if (!char.IsSurrogate(c) && IsCharacter(c))
                    sb.Append(c);
                else
                    sb.Append(replacement);
            }
        }
        return sb.ToString();
    }
    
    static bool IsCharacter(int point)
    {
        return point < 0xFDD0 || // nothing below U+FDD0 is a non-character
            point > 0xFDEF &&    // exclude the 0xFDD0...0xFDEF non-characters
            (point & 0xFFFE) != 0xFFFE; // exclude U+xxFFFE and U+xxFFFF in every plane
    }
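
On the question's request for a list of invalid ranges: the Unicode noncharacters are U+FDD0 through U+FDEF plus the last two code points of each of the 17 planes (U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, and so on up to U+10FFFF), 66 code points in total. The following is a minimal sketch of a pre-check that finds the first problem position, so that strings which are already clean can skip the StringBuilder copy. The names CodePointCheck, IsNonCharacter and IndexOfInvalidCodePoint are hypothetical, and treating unpaired surrogates as invalid is an assumption about which inputs should be rejected.

```csharp
using System;

// Hypothetical helper, not part of the .NET class library.
public static class CodePointCheck
{
    // Unicode noncharacters: U+FDD0..U+FDEF, plus the last two code points
    // (U+xxFFFE and U+xxFFFF) of each of the 17 planes.
    public static bool IsNonCharacter(int codePoint)
    {
        return (codePoint >= 0xFDD0 && codePoint <= 0xFDEF)
            || (codePoint & 0xFFFE) == 0xFFFE;
    }

    // Returns the index of the first UTF-16 code unit that starts an
    // unpaired surrogate or a noncharacter, or -1 if there is none.
    public static int IndexOfInvalidCodePoint(string text)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));

        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (char.IsHighSurrogate(c) && i + 1 < text.Length
                && char.IsLowSurrogate(text[i + 1]))
            {
                // Well-formed pair: decode it to a supplementary code point.
                if (IsNonCharacter(char.ConvertToUtf32(c, text[i + 1])))
                    return i;
                i++; // skip the low surrogate
            }
            else if (char.IsSurrogate(c) || IsNonCharacter(c))
            {
                return i; // assumption: a lone surrogate counts as invalid
            }
        }
        return -1;
    }
}
```

A caller could return the input unchanged when IndexOfInvalidCodePoint gives -1 and only fall back to the replacing loop otherwise, which avoids an allocation for the common case of clean input.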
    

© 2021 The Archive Base. All Rights Reserved