I’m having a very strange problem where my execution jumps from semi-predictable locations to

Question

0

Asked: May 23, 20262026-05-23T14:49:00+00:00 2026-05-23T14:49:00+00:00

I’m having a very strange problem where my execution jumps from semi-predictable locations to

0

I’m having a very strange problem where my execution jumps from semi-predictable locations to another location while debugging a Visual Studio .NET unit test. The method in which this strange behavior occurs is “Parse(…)”, below. I’ve indicated in this method the one location where execution will jump to (“// EXCEPTION”). I’ve also indicated several of the places where, in my testing, execution was when it strangely jumped (“// JUMP”). The jump will frequently occur from the same place several times consecutively, and then begin jumping from a new location consecutively. These places from which execution jumps are either the beginning of switch statements or the ends of code blocks, suggesting to me that there is something funky going on with the instruction pointer, but I’m not .NET savvy enough to know what that something might be. If it makes any difference, the execution does not jump to immediately before the “throw” statement, but instead to a point in execution where the exception has just been thrown. Very strange.

In my experience the execution jump only occurs while parsing the contents of a nested named group.

Background on what the code below purports to do: the solution I’m trying to implement is a simple regular expression parser. This is not a full-out regex parser. My needs are only to be able to locate particular named groups inside a regular expression and replace some of the named groups’ contents with other contents. So basically I’m just running through a regular expression and keeping track of named groups I find. I also keep track of unnamed groups, since I need to be aware of parenthesis matching, and comments, so that commented parentheses don’t upset paren-matching. A separate (and as-of-yet unimplemented) piece of code will reconstruct a string containing the regex after taking into account the replacements.

I greatly appreciate any suggestions of what might be afoot; I’m baffled!

Example Solution

Here is a Visual Studio 2010 Solution (TAR format) containing all the code I discuss below. I have the error when running this solution (with the unit test project “TestRegexParserLibTest” as the Startup Project.) Since this seems to be such a sporadic error, I’d be interested if anyone else experiences the same problem.

Code

I use some simple classes to organize the results:

// The root of the regex we are parsing
public class RegexGroupStructureRoot : ISuperRegexGroupStructure
{
    public List<RegexGroupStructure> SubStructures { get; set; }

    public RegexGroupStructureRoot()
    {
        SubStructures = new List<RegexGroupStructure>();
    }

    public override bool Equals(object obj) { ... }
}

// Either a RegexGroupStructureGroup or a RegexGroupStructureRegex
// Contained within the SubStructures of both RegexGroupStructureRoot and RegexGroupStructureGroup
public abstract class RegexGroupStructure
{
}

// A run of text containing regular expression characters (but not groups)
public class RegexGroupStructureRegex : RegexGroupStructure
{
    public string Regex { get; set; }

    public override bool Equals(object obj) { ... }
}

// A regular expression group
public class RegexGroupStructureGroup : RegexGroupStructure, ISuperRegexGroupStructure
{
    // Name == null indicates an unnamed group
    public string Name { get; set; }
    public List<RegexGroupStructure> SubStructures { get; set; }

    public RegexGroupStructureGroup()
    {
        SubStructures = new List<RegexGroupStructure>();
    }

    public override bool Equals(object obj) { ... }
}

// Items that contain SubStructures
// Either a RegexGroupStructureGroup or a RegexGroupStructureRoot
interface ISuperRegexGroupStructure
{
    List<RegexGroupStructure> SubStructures { get; }
}

Here’s the method (and associated enum/static members) where I actually parse the regular expression, returning a RegexGroupStructureRoot that contains all the named groups, unnamed groups, and other regular expression characters that were found.

using Re = System.Text.RegularExpressions

enum Mode
{
    TopLevel, // Not in any group
    BeginGroup, // Just encountered a character beginning a group: "("
    BeginGroupTypeControl, // Just encountered a character controlling group type, immediately after beginning a group: "?"
    NamedGroupName, // Reading the named group name (must have encountered a character indicating a named group type immediately following a group type control character: "<" after "?")
    NamedGroup, // Reading the contents of a named group
    UnnamedGroup, // Reading the contents of an unnamed group
}

static string _NamedGroupNameValidCharRePattern = "[A-Za-z0-9_]";
static Re.Regex _NamedGroupNameValidCharRe;

static RegexGroupStructureParser()
{
    _NamedGroupNameValidCharRe = new Re.Regex(_NamedGroupNameValidCharRePattern);
}

public static RegexGroupStructureRoot Parse(string regex)
{
    string newLine = Environment.NewLine;
    int newLineLen = newLine.Length;

    // A record of the parent structures that the parser has created
    Stack<ISuperRegexGroupStructure> parentStructures = new Stack<ISuperRegexGroupStructure>();

    // The current text we've encountered
    StringBuilder textConsumer = new StringBuilder();

    // Whether the parser is in an escape sequence
    bool escaped = false;

    // Whether the parser is in an end-of-line comment (such comments run from a hash-sign ('#') to the end of the line
    //  The other type of .NET regular expression comment is the group-comment: (?#This is a comment)
    //   We do not need to specially handle this type of comment since it is treated like an unnamed
    //   group.
    bool commented = false;

    // The current mode of the parsing process
    Mode mode = Mode.TopLevel;

    // Push a root onto the parents to accept whatever regexes/groups we encounter
    parentStructures.Push(new RegexGroupStructureRoot());

    foreach (char chr in regex.ToArray())
    {
        if (escaped) // JUMP
        {
            textConsumer.Append(chr);
            escaped = false;
        }
        else if (chr.Equals('#'))
        {
            textConsumer.Append(chr);
            commented = true;
        }
        else if (commented)
        {
            textConsumer.Append(chr);

            string txt = textConsumer.ToString();
            int txtLen = txt.Length;
            if (txtLen >= newLineLen &&
                // Does the current text end with a NewLine?
                txt.Substring(txtLen - 1 - newLineLen, newLineLen) == newLine)
            {
                // If so we're no longer in the comment
                commented = false;
            }
        }
        else
        {
            switch (mode) // JUMP
            {
                case Mode.TopLevel:
                    switch (chr)
                    {
                        case '\\':
                            textConsumer.Append(chr); // Append the backslash
                            escaped = true;
                            break;
                        case '(':
                            beginNewGroup(parentStructures, ref textConsumer, ref mode);
                            break;
                        case ')':
                            // Can't close a group if we're already at the top-level
                            throw new InvalidRegexFormatException("Too many ')'s.");
                        default:
                            textConsumer.Append(chr);
                            break;
                    }
                    break;

                case Mode.BeginGroup:
                    switch (chr)
                    {
                        case '?':
                            // If it's an unnamed group, we'll re-add the question mark.
                            // If it's a named group, named groups reconstruct question marks so no need to add it.
                            mode = Mode.BeginGroupTypeControl;
                            break;
                        default:
                            // Only a '?' can begin a named group.  So anything else begins an unnamed group.

                            parentStructures.Peek().SubStructures.Add(new RegexGroupStructureRegex()
                            {
                                Regex = textConsumer.ToString()
                            });
                            textConsumer = new StringBuilder();

                            parentStructures.Push(new RegexGroupStructureGroup()
                            {
                                Name = null, // null indicates an unnamed group
                                SubStructures = new List<RegexGroupStructure>()
                            });

                            mode = Mode.UnnamedGroup;
                            break;
                    }
                    break;

                case Mode.BeginGroupTypeControl:
                    switch (chr)
                    {
                        case '<':
                            mode = Mode.NamedGroupName;
                            break;

                        default:
                            // We previously read a question mark to get here, but the group turned out not to be a named group
                            // So add back in the question mark, since unnamed groups don't reconstruct with question marks
                            textConsumer.Append('?' + chr);
                            mode = Mode.UnnamedGroup;
                            break;
                    }
                    break;

                case Mode.NamedGroupName:
                    if (chr.Equals( '>'))
                    {
                        // '>' closes the named group name.  So extract the name
                        string namedGroupName = textConsumer.ToString();

                        if (namedGroupName == String.Empty)
                            throw new InvalidRegexFormatException("Named group names cannot be empty.");

                        // Create the new named group
                        RegexGroupStructureGroup newNamedGroup = new RegexGroupStructureGroup() {
                            Name = namedGroupName,
                            SubStructures = new List<RegexGroupStructure>()
                        };

                        // Add this group to the current parent
                        parentStructures.Peek().SubStructures.Add(newNamedGroup);
                        // ...and make it the new parent.
                        parentStructures.Push(newNamedGroup);

                        textConsumer = new StringBuilder();

                        mode = Mode.NamedGroup;
                    }
                    else if (_NamedGroupNameValidCharRe.IsMatch(chr.ToString()))
                    {
                        // Append any valid named group name char to the growing named group name
                        textConsumer.Append(chr);
                    }
                    else
                    {
                        // chr is neither a valid named group name character, nor the character that closes the named group name (">").  Error.
                        throw new InvalidRegexFormatException(String.Format("Invalid named group name character: {0}", chr)); // EXCEPTION
                    }
                    break; // JUMP

                case Mode.NamedGroup:
                case Mode.UnnamedGroup:
                    switch (chr) // JUMP
                    {
                        case '\\':
                            textConsumer.Append(chr);
                            escaped = true;
                            break;
                        case ')':
                            closeGroup(parentStructures, ref textConsumer, ref mode);
                            break;
                        case '(':
                            beginNewGroup(parentStructures, ref textConsumer, ref mode);
                            break;
                        default:
                            textConsumer.Append(chr);
                            break;
                    }
                    break;

                default:
                    throw new Exception("Exhausted Modes");
            }
        } // JUMP
    }

    ISuperRegexGroupStructure finalParent = parentStructures.Pop();
    Debug.Assert(parentStructures.Count < 1, "Left parent structures on the stack.");
    Debug.Assert(finalParent.GetType().Equals(typeof(RegexGroupStructureRoot)), "The final parent must be a RegexGroupStructureRoot");

    string finalRegex = textConsumer.ToString();
    if (!String.IsNullOrEmpty(finalRegex))
        finalParent.SubStructures.Add(new RegexGroupStructureRegex() {
            Regex = finalRegex
        });

    return finalParent as RegexGroupStructureRoot;
}

And here is a unit test that will test if the method works (note, may not be 100% correct since I don’t even get past the call to RegexGroupStructureParser.Parse.)

[TestMethod]
public void ParseTest_Short()
{
    string regex = @"
        (?<Group1>
            ,?\s+
            (?<Group1_SubGroup>
                [\d–-]+             # One or more digits, hyphen, and/or n-dash
            )            
        )
    ";

    RegexGroupStructureRoot expected = new RegexGroupStructureRoot()
    {
        SubStructures = new List<RegexGroupStructure>()
        {
            new RegexGroupStructureGroup() {
                Name = "Group1", 
                SubStructures = new List<RegexGroupStructure> {
                    new RegexGroupStructureRegex() {
                        Regex = @"
            ,?\s+
            "
                    }, 
                    new RegexGroupStructureGroup() {
                        Name = "Group1_Subgroup", 
                        SubStructures = new List<RegexGroupStructure>() {
                            new RegexGroupStructureRegex() {
                                Regex = @"
                [\d–-]+             # One or more digits, hyphen, and/or n-dash
            "
                            }
                        }
                    }, 
                    new RegexGroupStructureRegex() {
                        Regex = @"            
        "
                    }
                }
            }, 
            new RegexGroupStructureRegex() {
                Regex = @"
        "
            }, 
        }
    };

    RegexGroupStructureRoot actual = RegexGroupStructureParser.Parse(regex);

    Assert.AreEqual(expected, actual);
}

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-05-23T14:49:01+00:00

Your solution’s test case does cause the thrown “Invalid named group name character” exception to halt at the break; rather than the throw line. I rigged up a test file using a nested if in a case to see if the exception triggers similarly in one of my projects and it did not: the halted line was the throw statement itself.

However, when I enable editing (to use edit and continue in your project), the current line rewinds back to the throw statement. I haven’t looked at the generated IL, but I suspect that the throw (which will terminate the case without needing the “break” to follow as so:)

case 1:
   do something
   break;
case 2:
   throw ... //No break required.
case 3:

is being optimized in a way that is confusing the display, but not the actual execution or even the edit and continue feature. If the edit and continue works and the thrown exceptions are properly caught or displayed I suspect you have a display anomaly that you can ignore (although I would report it to Microsoft along with that file as it is reproducible).

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

I’m having a very strange problem where my execution jumps from semi-predictable locations to

Example Solution

Code

Leave an answerCancel reply

1 Answer

Leave an answer
Cancel reply