The Archive Base

Editorial Team
Asked: June 6, 2026

I’m using NUnit v2.5 to compare strings that contain composite Unicode characters.
Although the comparison itself works fine, the caret indicating the first difference is misplaced.

UPD: I’ve ended up with an overridden EqualConstraint that in turn invokes a custom TextMessageWriter, so I no longer need an answer. See the solution below.

Here’s the snippet:

string s1 = "ใช้งานง่าย";
string s2 = "ใช้งานงาย";
Assert.That(s1, Is.EqualTo(s2));

Here’s the output:

Expected: "ใช้งานงาย"
But was:  "ใช้งานง่าย"
------------------^

The arrow indicating the first differing character is off by 2 positions (as many as there are tone marks above the text). For longer strings, it becomes a real pain.
I tried String.Normalize(), but it did not help either.

How can I overcome this problem? Thanks for your help. See my answer below.

1 Answer

  1. Editorial Team
    Added an answer on June 6, 2026 at 1:32 pm

    I could not find a better answer, so I am answering my own question.

    Cause.
    Many languages use non-spacing modifiers for characters. For European languages, there are precomposed substitutes, e.g. "u" (U+0075) + combining diaeresis (U+0308) = "ü" (U+00FC). In this case, the solution by @tchrist is quite sufficient.

    However, for complex writing systems there are no such precomposed substitutes for non-spacing modifiers. NUnit’s TextMessageWriter.WriteCaretLine(int mismatch) treats the mismatch parameter as an offset in chars (UTF-16 code units), so the on-screen representation of a Thai string may be shorter than the caret line ("-----^").
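The difference between the two cases can be checked directly. The following is a minimal sketch; the class and method names are illustrative and not part of NUnit. It shows that normalization to Form C composes "u" + U+0308 into a single char, while a Thai consonant followed by a tone mark stays as two chars because there is no precomposed character to compose into.

```csharp
using System;
using System.Globalization;
using System.Text;

// Illustrative helper, not part of NUnit.
public static class CompositionDemo
{
    // Number of chars left after canonical composition (NFC).
    public static int ComposedLength(string text)
    {
        return text.Normalize(NormalizationForm.FormC).Length;
    }

    // Number of chars that are non-spacing marks (they take no column of their own).
    public static int CountNonSpacingMarks(string text)
    {
        int count = 0;
        foreach (char c in text)
        {
            if (CharUnicodeInfo.GetUnicodeCategory(c) == UnicodeCategory.NonSpacingMark)
                count++;
        }
        return count;
    }

    public static void Main()
    {
        string latin = "u\u0308";      // "u" + combining diaeresis
        string thai = "\u0E0A\u0E49";  // Thai consonant + tone mark (mai tho)

        Console.WriteLine(ComposedLength(latin));      // 1: composed into U+00FC
        Console.WriteLine(ComposedLength(thai));       // 2: nothing to compose into
        Console.WriteLine(CountNonSpacingMarks(thai)); // 1: the tone mark
    }
}
```

This is why String.Normalize() does not move the caret for Thai text: the string still contains chars that occupy no column on screen.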

    Solution.
    Force WriteCaretLine(int mismatch) to respect non-spacing modifiers by reducing the mismatch value by the number of non-spacing modifiers that occur before this offset.
    Implement the supplementary classes as well; they are needed only so that your new code gets invoked.
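The core of this adjustment can be sketched in isolation. This is a simplified illustration rather than the code from the listing in this answer, and the helper name ToDisplayColumn is hypothetical. It converts a char offset into a display column by counting only the chars that are not non-spacing marks.

```csharp
using System;
using System.Globalization;

// Illustrative helper, not part of NUnit.
public static class CaretOffsetDemo
{
    // Converts a char offset into a display column by skipping
    // every non-spacing mark that occurs before the offset.
    public static int ToDisplayColumn(string text, int charOffset)
    {
        int column = 0;
        int end = Math.Min(charOffset, text.Length);
        for (int i = 0; i < end; i++)
        {
            if (CharUnicodeInfo.GetUnicodeCategory(text[i]) != UnicodeCategory.NonSpacingMark)
                column++;
        }
        return column;
    }

    public static void Main()
    {
        // The expected string from the question, written with escapes.
        string expected = "\u0E43\u0E0A\u0E49\u0E07\u0E32\u0E19\u0E07\u0E32\u0E22";

        // The two strings first differ at char offset 7;
        // one tone mark (at offset 2) precedes that position.
        Console.WriteLine(ToDisplayColumn(expected, 7)); // 6
    }
}
```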

    Along with Thai, I have tested it with Devanagari and Tibetan. It works as expected.

    Yet another pitfall. If you’re using NUnit with Visual Studio through ReSharper like I do, you have to configure your Internet Explorer’s fonts (this cannot be managed from R#) so that it uses proper monospaced fonts for Thai, Devanagari, etc.

    Implementation.

    1. Inherit TextMessageWriter and override its DisplayStringDifferences;
    2. Implement your own ClipExpectedAndActual and FindMismatchPosition; this is where non-spacing modifiers are respected. Proper clipping is needed since it may also affect the calculation of non-spacing elements;
    3. Inherit EqualConstraint and override its WriteMessageTo(MessageWriter writer) so that your MessageWriter is used;
    4. Optionally, create a custom wrapper for simple invocation of the custom constraint.

    The source code follows. About 80% of it doesn’t do anything useful, but it has to be included because of the access levels in the original code.

    using System;
    using System.Globalization;
    using NUnit.Framework;
    using NUnit.Framework.Constraints;

    // Step 1.
    public class ThaiMessageWriter : TextMessageWriter
    {
        /// <summary>
        /// This method is merely a copy of the original method taken from the NUnit sources,
        /// except that it changes the meaning of <paramref name="mismatch"/> before the caret line is displayed.
        /// </summary>
        /// <remarks>
        /// The <paramref name="mismatch"/> passed in is an offset in chars, while proper display of the caret requires
        /// its position to be calculated in character placeholder units. The two differ for
        /// over- or under-string Unicode characters such as an acute mark or complex scripts (Thai).
        /// </remarks>
        /// <param name="clipping">Whether the strings are clipped to fit the maximum line length</param>
        public override void DisplayStringDifferences(string expected, string actual, int mismatch, bool ignoreCase, bool clipping)
        {
            // Maximum string we can display without truncating
            int maxDisplayLength = MaxLineLength
                                   - PrefixLength   // Allow for prefix
                                   - 2;             // 2 quotation marks
    
            int mismatchOffset = mismatch;
    
            if (clipping)
                MsgUtils2.ClipExpectedAndActual(ref expected, ref actual, maxDisplayLength, mismatchOffset);
    
            expected = MsgUtils.EscapeControlChars(expected);
            actual = MsgUtils.EscapeControlChars(actual);
    
            // The mismatch position may have changed due to clipping or white space conversion
            int mismatchInCharPlaceholders = MsgUtils2.FindMismatchPosition(expected, actual, 0, ignoreCase);
    
            Write(Pfx_Expected);
            WriteExpectedValue(expected);
            if (ignoreCase)
                WriteModifier("ignoring case");
            WriteLine();
            WriteActualLine(actual);
            //DisplayDifferences(expected, actual);
            if (mismatch >= 0)
                WriteCaretLine(mismatchInCharPlaceholders);
    
        }
    
        // Copied due to private
        /// <summary>
        /// Write the generic 'Actual' line for a constraint
        /// </summary>
        /// <param name="constraint">The constraint for which the actual value is to be written</param>
        private void WriteActualLine(Constraint constraint)
        {
            Write(Pfx_Actual);
            constraint.WriteActualValueTo(this);
            WriteLine();
        }
    
        // Copied due to private
        /// <summary>
        /// Write the generic 'Actual' line for a given value
        /// </summary>
        /// <param name="actual">The actual value causing a failure</param>
        private void WriteActualLine(object actual)
        {
            Write(Pfx_Actual);
            WriteActualValue(actual);
            WriteLine();
        }
    
        // Copied due to private
        private void WriteCaretLine(int mismatch)
        {
            // We subtract 2 for the initial 2 blanks and add back 1 for the initial quote
            WriteLine("  {0}^", new string('-', PrefixLength + mismatch - 2 + 1));
        }
    }
    
    // Step 2.
    public static class MsgUtils2
    {
        private static readonly string ELLIPSIS = "...";
    
        /// <summary>
        ///  Almost a copy of MsgUtil.ClipExpectedAndActual method
        /// </summary>
        /// <param name="expected"></param>
        /// <param name="actual"></param>
        /// <param name="maxDisplayLength"></param>
        /// <param name="mismatch"></param>
        public static void ClipExpectedAndActual(ref string expected, ref string actual, int maxDisplayLength, int mismatch)
        {
            // Case 1: Both strings fit on line
            int maxStringLength = Math.Max(expected.Length, actual.Length);
            if (maxStringLength <= maxDisplayLength)
                return;
    
            // Case 2: Assume that the tail of each string fits on line
            int clipLength = maxDisplayLength - ELLIPSIS.Length;
            int clipStart = maxStringLength - clipLength;
    
            // Case 3: If it doesn't, center the mismatch position
            if (clipStart > mismatch)
                clipStart = Math.Max(0, mismatch - clipLength / 2);
    
            // shift both clipStart and maxDisplayLength if they split non-placeholding symbol
            AdjustForNonPlaceholdingCharacter(expected, ref clipStart);
            AdjustForNonPlaceholdingCharacter(expected, ref maxDisplayLength);
    
            expected = MsgUtils.ClipString(expected, maxDisplayLength, clipStart);
            actual = MsgUtils.ClipString(actual, maxDisplayLength, clipStart);
        }
    
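        private static void AdjustForNonPlaceholdingCharacter(string expected, ref int index)
        {
            // Check the upper bound as well: the value passed in may be a length
            // that is larger than this string, which would otherwise throw.
            while (index > 0 && index < expected.Length
                   && CharUnicodeInfo.GetUnicodeCategory(expected[index]) == UnicodeCategory.NonSpacingMark)
            {
                index--;
            }
        }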
    
        static public int FindMismatchPosition(string expected, string actual, int istart, bool ignoreCase)
        {
            int length = Math.Min(expected.Length, actual.Length);
    
            string s1 = ignoreCase ? expected.ToLower() : expected;
            string s2 = ignoreCase ? actual.ToLower() : actual;
    
            int iSpacingCharacters = 0;
            for (int i = 0; i < istart; i++)
            {
                if (CharUnicodeInfo.GetUnicodeCategory(s1[i]) != UnicodeCategory.NonSpacingMark)
                    iSpacingCharacters++;
            }
            for (int i = istart; i < length; i++)
            {
                if (s1[i] != s2[i])
                    return iSpacingCharacters;
                if (CharUnicodeInfo.GetUnicodeCategory(s1[i]) != UnicodeCategory.NonSpacingMark)
                    iSpacingCharacters++;
            }
    
            //
            // Strings have same content up to the length of the shorter string.
            // Mismatch occurs because string lengths are different, so show
            // that they start differing where the shortest string ends,
            // measured in placeholder units like the value returned above
            //
            if (expected.Length != actual.Length)
                return iSpacingCharacters;
    
            //
            // Same strings : We shouldn't get here
            //
            return -1;
        }
    }
    
    // Step 3.
    public class ThaiEqualConstraint : EqualConstraint
    {
        private readonly string _expected;
    
        // The base class keeps its expected value private, so a copy is stored here
        public ThaiEqualConstraint(string expected) : base(expected)
        {
            _expected = expected;
        }
    
        public override void WriteMessageTo(MessageWriter writer)
        {
            // redirect output to customized MessageWriter
            var myMessageWriter = new ThaiMessageWriter();
            base.WriteMessageTo(myMessageWriter);
            writer.Write(myMessageWriter);
        }
    }
    
    // Step 4.
    public static class ThaiText
    {
        public static EqualConstraint IsEqual(string expected)
        {
            return new ThaiEqualConstraint(expected);
        }
    }
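To see the effect on the output, the caret line can be rendered without NUnit. This is a minimal sketch under one assumption: the prefix is taken to be "  Expected: " (12 characters), inferred from the output in the question and from the comment in WriteCaretLine. The class name CaretLineDemo is hypothetical.

```csharp
using System;

// Illustrative helper, not part of NUnit.
public static class CaretLineDemo
{
    // Assumed to match the "  Expected: " prefix seen in the failure message.
    private const string Prefix = "  Expected: ";

    // Same arithmetic as WriteCaretLine in the listing: two leading blanks,
    // dashes up to the mismatch column, then the caret.
    public static string CaretLine(int column)
    {
        return "  " + new string('-', Prefix.Length + column - 2 + 1) + "^";
    }

    public static void Main()
    {
        Console.WriteLine(CaretLine(7)); // char offset from the question: 18 dashes
        Console.WriteLine(CaretLine(6)); // adjusted display column: 17 dashes
    }
}
```

With the classes from the listing in place, the assertion from the question would be written as Assert.That(s1, ThaiText.IsEqual(s2)).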
    