The Archive Base

Editorial Team
Asked: June 2, 2026

EDIT

Apologies if the original unedited question is misleading.

This question is not asking how to remove invalid XML chars from a string; answers to that question would be better directed here.

I’m not asking you to review my code.

What I’m looking for in answers is a function with the signature

string <YourName>(string input, Func<char, bool> check);

that performs as well as or better than RemoveCharsBufferCopyBlackList. Ideally this function would be more generic and, if possible, simpler to read, but these requirements are secondary.


I recently wrote a function to strip invalid XML chars from a string. In my application the strings can be modestly long and the invalid chars occur rarely. This exercise got me thinking: in what ways can this be done in safe managed C#, and which would offer the best performance for my scenario?

Here is my test program. I’ve substituted the “valid XML predicate” with one that omits the char 'X'.

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

class Program
{
    static void Main()
    {
        var attempts = new List<Func<string, Func<char, bool>, string>>
            {
                RemoveCharsLinqWhiteList,
                RemoveCharsFindAllWhiteList,
                RemoveCharsBufferCopyBlackList
            };

        const string GoodString = "1234567890abcdefgabcedefg";
        const string BadString = "1234567890abcdefgXabcedefg";
        const int Iterations = 100000;
        var timer = new Stopwatch();

        var testSet = new List<string>(Iterations);
        for (var i = 0; i < Iterations; i++)
        {
            if (i % 1000 == 0)
            {
                testSet.Add(BadString);
            }
            else
            {
                testSet.Add(GoodString);
            }
        }

        foreach (var attempt in attempts)
        {
            //Check function works and JIT
            if (attempt.Invoke(BadString, IsNotUpperX) != GoodString)
            {
                throw new ApplicationException("Broken Function");       
            }

            if (attempt.Invoke(GoodString, IsNotUpperX) != GoodString)
            {
                throw new ApplicationException("Broken Function");       
            }

            timer.Reset();
            timer.Start();
            foreach (var t in testSet)
            {
                attempt.Invoke(t, IsNotUpperX);
            }

            timer.Stop();
            Console.WriteLine(
                "{0} iterations of function \"{1}\" performed in {2}ms",
                Iterations,
                attempt.Method,
                timer.ElapsedMilliseconds);
            Console.WriteLine();
        }

        Console.ReadKey();
    }

    private static bool IsNotUpperX(char value)
    {
        return value != 'X';
    }

    private static string RemoveCharsLinqWhiteList(string input,
                                                      Func<char, bool> check)
    {
        return new string(input.Where(check).ToArray());
    }

    private static string RemoveCharsFindAllWhiteList(string input,
                                                      Func<char, bool> check)
    {
        return new string(Array.FindAll(input.ToCharArray(), check.Invoke));
    }

    private static string RemoveCharsBufferCopyBlackList(string input,
                                                      Func<char, bool> check)
    {
        char[] inputArray = null;
        char[] outputBuffer = null;

        var blackCount = 0;
        var lastb = -1;
        var whitePos = 0;

        for (var b = 0; b < input.Length; b++)
        {
            if (!check.Invoke(input[b]))
            {
                var whites = b - lastb - 1;
                if (whites > 0)
                {
                    if (outputBuffer == null)
                    {
                        outputBuffer = new char[input.Length - blackCount];
                    }

                    if (inputArray == null)
                    {
                        inputArray = input.ToCharArray();
                    }

                    Buffer.BlockCopy(
                                      inputArray,
                                      (lastb + 1) * 2,
                                      outputBuffer,
                                      whitePos * 2,
                                      whites * 2);
                    whitePos += whites; 
                }

                lastb = b;
                blackCount++;
            }
        }

        if (blackCount == 0)
        {
            return input;
        }

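        if (outputBuffer == null)
        {
            // Every rejected char sat at the very start of the input (e.g. "Xabc"),
            // so no run was copied and the buffers were never allocated.
            // The result is simply the tail after the last rejected char.
            return input.Substring(lastb + 1);
        }

        var remaining = inputArray.Length - 1 - lastb;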
        if (remaining > 0)
        {
            Buffer.BlockCopy(
                              inputArray,
                              (lastb + 1) * 2,
                              outputBuffer,
                              whitePos * 2,
                              remaining * 2);

        }

        return new string(outputBuffer, 0, inputArray.Length - blackCount);
    }        
}

If you run the attempts you’ll note that the performance improves as the functions get more specialised. Is there a faster and more generic way to perform this operation? Or if there is no generic option is there a way that is just faster?

Please note that I am not actually interested in removing ‘X’ and in practice the predicate is more complicated.

1 Answer

    Editorial Team
    Added an answer on June 2, 2026 at 5:46 am

    You certainly don’t want to use LINQ to Objects (that is, enumerators) for this if you require high performance. Also, don’t invoke a delegate per char: a delegate invocation is costly compared to the operation you are actually performing.

    RemoveCharsBufferCopyBlackList looks good (except for the delegate call per character).
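To see what the per-character delegate call costs on your own machine, a rough comparison can be sketched as follows. This is only an illustration: the class name `DelegateCostSketch` and its methods are hypothetical, the hard-coded `'X'` test stands in for the real predicate as in the question, and a `Stopwatch` loop like this is a crude measurement rather than a proper benchmark.

```csharp
using System;
using System.Diagnostics;

static class DelegateCostSketch
{
    // Counts accepted chars, going through the delegate once per char.
    public static int CountViaDelegate(string input, Func<char, bool> check)
    {
        var count = 0;
        for (var i = 0; i < input.Length; i++)
        {
            if (check(input[i])) count++;
        }
        return count;
    }

    // Same loop with the condition written inline, so no delegate is invoked.
    public static int CountInlined(string input)
    {
        var count = 0;
        for (var i = 0; i < input.Length; i++)
        {
            if (input[i] != 'X') count++;
        }
        return count;
    }

    // Times both variants over the same input and prints the elapsed milliseconds.
    public static void RunComparison()
    {
        var input = new string('a', 1000) + "X";
        const int Iterations = 100000;
        Func<char, bool> check = c => c != 'X';

        var timer = Stopwatch.StartNew();
        for (var i = 0; i < Iterations; i++) CountViaDelegate(input, check);
        Console.WriteLine("delegate: {0}ms", timer.ElapsedMilliseconds);

        timer.Restart();
        for (var i = 0; i < Iterations; i++) CountInlined(input);
        Console.WriteLine("inlined:  {0}ms", timer.ElapsedMilliseconds);
    }
}
```

Call `DelegateCostSketch.RunComparison()` from a release build; the absolute numbers will vary with runtime and hardware, so treat only the relative difference as meaningful.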

    I recommend that you hard-code the contents of the delegate inline. Play around with different ways to write the condition. You may get better performance by first checking the current char against a range of known-good chars (e.g. 0x20-0xFF) and letting it through if it matches. This test will almost always pass, so you can skip the expensive checks against the individual characters that are invalid in XML.
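A minimal sketch of that ordering, assuming the XML 1.0 character rules: the common range is tested first, and the rare cases are examined only when it fails. The name `IsValidXmlChar` is hypothetical, and the sketch lets surrogate code units through without checking that they are correctly paired.

```csharp
static class XmlCharCheckSketch
{
    // Returns true when the UTF-16 code unit c may appear in XML 1.0 text.
    // Assumption: surrogates are passed through and pairing is not verified.
    public static bool IsValidXmlChar(char c)
    {
        // Fast path: in typical text almost every char falls in this range.
        if (c >= 0x20 && c <= 0xFF) return true;

        // Control chars: only tab, line feed and carriage return are allowed.
        if (c < 0x20) return c == '\t' || c == '\n' || c == '\r';

        // Everything above 0xFF is accepted except the two non-characters.
        return c != '\uFFFE' && c != '\uFFFF';
    }
}
```

Whether the fast path pays off depends on the input; for text that is mostly outside 0x20-0xFF, a different first test may suit better.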

    Edit: I just remembered I solved this problem a while ago:

        // Note: ConcatToString and RemoveCharset are extension methods that this answer does not show.
        static readonly string invalidXmlChars =
            Enumerable.Range(0, 0x20)
            .Where(i => !(i == '\u000A' || i == '\u000D' || i == '\u0009'))
            .Select(i => (char)i)
            .ConcatToString()
            + "\uFFFE\uFFFF";
        public static string RemoveInvalidXmlChars(string str)
        {
            return RemoveInvalidXmlChars(str, false);
        }
        internal static string RemoveInvalidXmlChars(string str, bool forceRemoveSurrogates)
        {
            if (str == null) throw new ArgumentNullException("str");
            if (!ContainsInvalidXmlChars(str, forceRemoveSurrogates))
                return str;
    
            str = str.RemoveCharset(invalidXmlChars);
            if (forceRemoveSurrogates)
            {
                for (int i = 0; i < str.Length; i++)
                {
                    if (IsSurrogate(str[i]))
                    {
                        str = str.Where(c => !IsSurrogate(c)).ConcatToString();
                        break;
                    }
                }
            }
    
            return str;
        }
        static bool IsSurrogate(char c)
        {
            return c >= 0xD800 && c < 0xE000;
        }
        internal static bool ContainsInvalidXmlChars(string str)
        {
            return ContainsInvalidXmlChars(str, false);
        }
        public static bool ContainsInvalidXmlChars(string str, bool forceRemoveSurrogates)
        {
            if (str == null) throw new ArgumentNullException("str");
            for (int i = 0; i < str.Length; i++)
            {
                if (str[i] < 0x20 && !(str[i] == '\u000A' || str[i] == '\u000D' || str[i] == '\u0009'))
                    return true;
                if (str[i] >= 0xD800)
                {
                    if (forceRemoveSurrogates && str[i] < 0xE000)
                        return true;
                    if ((str[i] == '\uFFFE' || str[i] == '\uFFFF'))
                        return true;
                }
            }
            return false;
        }
    

    Notice that RemoveInvalidXmlChars first invokes ContainsInvalidXmlChars to avoid allocating a new string. Most strings do not contain invalid XML chars, so we can be optimistic.
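The same optimistic idea can be applied to the generic signature from the question. The following is a sketch only, not a measured replacement for RemoveCharsBufferCopyBlackList: the name `RemoveCharsScanFirst` is hypothetical, and it still pays for one delegate call per char. It scans without allocating, returns the original string when nothing is rejected, and otherwise copies the accepted prefix in one call before filtering the rest.

```csharp
using System;

static class RemoveCharsSketch
{
    public static string RemoveCharsScanFirst(string input, Func<char, bool> check)
    {
        if (input == null) throw new ArgumentNullException("input");
        if (check == null) throw new ArgumentNullException("check");

        // Optimistic scan: find the first rejected char without allocating anything.
        var first = 0;
        while (first < input.Length && check(input[first])) first++;

        // Nothing rejected, so hand back the original instance.
        if (first == input.Length) return input;

        // At least one char is dropped, so Length - 1 is always large enough.
        var buffer = new char[input.Length - 1];

        // Copy the accepted prefix in a single call.
        input.CopyTo(0, buffer, 0, first);
        var length = first;

        // Filter the chars after the first rejected one.
        for (var i = first + 1; i < input.Length; i++)
        {
            var c = input[i];
            if (check(c)) buffer[length++] = c;
        }

        return new string(buffer, 0, length);
    }
}
```

It can be added to the `attempts` list in the question's test program to compare it against the other functions on the same data.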
