Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 8538589
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: June 11, 20262026-06-11T11:09:11+00:00 2026-06-11T11:09:11+00:00

I have read How do I get the name of captured groups in a

  • 0

I have read How do I get the name of captured groups in a C# Regex? and How do I access named capturing groups in a .NET Regex? to try to understand how to find the result of a matched group in regular expressions.

I’ve also read everything in the MSDN at http://msdn.microsoft.com/en-us/library/30wbz966.aspx

It seems strange to me is how C# (or .NET) seems to be the only implementation of regular expressions that makes you iterate groups to find which group matched (especially if you need the name), and also the fact that the name isn’t stored with the group result. PHP and Python for example will give you the group name that matched as part of the RegEx match result.

I have to iterate the groups and check for a match, and I have to keep a list of my own group names cause the names aren’t in the result.

Here is my code to demonstrate:

public class Tokenizer
{
    private Dictionary<string, string> tokens;

    private Regex re;

    public Tokenizer()
    {
        tokens = new Dictionary<string, string>();
        tokens["NUMBER"] = @"\d+(\.\d*)?";  // Integer or decimal number
        tokens["STRING"] = @""".*""";       // String
        tokens["COMMENT"] = @";.*";         // Comment
        tokens["COMMAND"] = @"[A-Za-z]+";   // Identifiers
        tokens["NEWLINE"] = @"\n";          // Line endings
        tokens["SKIP"] = @"[ \t]";          // Skip over spaces and tabs

        List<string> token_regex = new List<string>();
        foreach (KeyValuePair<string, string> pair in tokens)
        {
            token_regex.Add(String.Format("(?<{0}>{1})", pair.Key, pair.Value));
        }
        string tok_regex = String.Join("|", token_regex);

        re = new Regex(tok_regex);
    }

    public List<Token> parse(string pSource)
    {
        List<Token> tokens = new List<Token>();

        Match get_token = re.Match(pSource);
        while (get_token.Success)
        {
            foreach (string gname in this.tokens.Keys)
            {
                Group group = get_token.Groups[gname];
                if (group.Success)
                {
                    tokens.Add(new Token(gname, get_token.Groups[gname].Value));
                    break;
                }
            }

            get_token = get_token.NextMatch();
        }
        return tokens;
    }
}

In the line

foreach (string gname in this.tokens.Keys)

That should not be necessary but it is.

Is there anyway to find the matching group and it’s name without having to iterate all the groups?

EDIT: To compare implementations. Here is the same code that I wrote for a Python implementation.

class xTokenizer(object):
    """
    xTokenizer converts a text source code file into a collection of xToken objects.
    """

    TOKENS = [
        ('NUMBER',  r'\d+(\.\d*)?'),    # Integer or decimal number
        ('STRING',  r'".*"'),           # String
        ('COMMENT', r';.*'),            # Comment
        ('VAR',     r':[A-Za-z]+'),     # Variables
        ('COMMAND', r'[A-Za-z]+'),      # Identifiers
        ('OP',      r'[+*\/\-]'),       # Arithmetic operators
        ('NEWLINE', r'\n'),             # Line endings
        ('SKIP',    r'[ \t]'),          # Skip over spaces and tabs
        ('SLIST',   r'\['),             # Start a list of commands
        ('ELIST',   r'\]'),             # End a list of commands
        ('SARRAY',  r'\{'),             # Start an array
        ('EARRAY',  r'\}'),             # End end an array
    ]

    def __init__(self,tokens=None):
        """
        Constructor
            Args:
                tokens - key/pair of regular expressions used to match tokens.
        """
        if tokens is None:
            tokens = self.TOKENS
        self.tokens = tokens
        self.tok_regex = '|'.join('(?P<%s>%s)' % pair for pair in tokens)
        pass

    def parse(self,source):
        """
        Converts the source code into a list of xToken objects.
            Args:
                sources - The source code as a string.
            Returns:
                list of xToken objects.
        """
        get_token = re.compile(self.tok_regex).match
        line = 1
        pos = line_start = 0
        mo = get_token(source)
        result = []
        while mo is not None:
            typ = mo.lastgroup
            if typ == 'NEWLINE':
                line_start = pos
                line += 1
            elif typ != 'SKIP':
                val = mo.group(typ)
                result.append(xToken(typ, val, line, mo.start()-line_start))
            pos = mo.end()
            mo = get_token(source, pos)
        if pos != len(source):
            raise xParserError('Unexpected character %r on line %d' %(source[pos], line))
        return result

As you can see Python doesn’t require you to iterate the groups, and a similar thing can be done in PHP and I assume Java.

  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-06-11T11:09:12+00:00Added an answer on June 11, 2026 at 11:09 am

    There’s no need to maintain a separate list of named groups. Use the Regex.GetGroupNames method instead.

    Your code would then look similar to this:

    foreach (string gname in re.GetGroupNames())
    {
        Group group = get_token.Groups[gname];
        if (group.Success)
        {
            // your code
        }
    }
    

    That said, be aware of this note on the MSDN page:

    Even if capturing groups are not explicitly named, they are
    automatically assigned numerical names (1, 2, 3, and so on).

    With that in mind, you should either name all your groups, or filter out numeric group names. You could do so with some LINQ, or with an additional check that !Char.IsNumber(gname[0]) to check the first character of the group name, making the assumption that any such group is invalid. Alternately, you could also use the int.TryParse method.

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

I have code to read contacts, get name and number and put them in
I have read all the docs I can get my hands on and google'd
I have read other related question but i cant really get them to relate
I have read that CURL is way too fast than File Get Contents and
I want to get started at game development. I have read a lot of
I have created Controller Attribute and would like to read SessionId but get error
I have successfully managed to get System.Speech.Synthesis to read English text in arbitrary voices
I just don't get it. I have a function to read stemmed strings from
I have read a lot of step by step tutorials and still couldn't get
I am trying to get FB user's full name using the access_token. I have

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.