The Archive Base

Editorial Team
Asked: May 18, 2026

Note: This is about .NET regular expressions.

I have a bunch of text, from which I need to extract specific lines. The lines I care about have the following form:

type Name(type arg1, type arg2, type arg3)

To match this, I came up with the following regular expression:

^(\w+)\s+(\w+)\s*\(\s*((\w+)\s+(\w+)(:?,\s+)?)*\s*\)$

This confusing mess produces a Match object that looks like this:

Group 0: type Name(type arg1, type arg2, type arg3)
    Capture 0: type Name(type arg1, type arg2, type arg3)
Group 1: type
    Capture 0: type
Group 2: Name
    Capture 0: Name
Group 3: type arg3
    Capture 0: type arg1,
    Capture 1: type arg2,
    Capture 2: type arg3
Group 4: type
    Capture 0: type
    Capture 1: type
    Capture 2: type
Group 5: arg3
    Capture 0: arg1
    Capture 1: arg2
    Capture 2: arg3
Group 6:
    Capture 0: ,
    Capture 1: ,

However, this is not the full input. Some of these lines might look like this:

type Name(type arg1, type[] arg2, type arg3)

Note the brackets before arg2.

So, I modified my regular expression:

^(\w+)\s+(\w+)\s*\(\s*((\w+)\s*(\[\])?\s+(\w+)(:?,\s+)?)*\s*\)$

This produces a Match like this:

Group 0: type Name(type arg1, type[] arg2, type arg3)
    Capture 0: type Name(type arg1, type[] arg2, type arg3)
Group 1: type
    Capture 0: type
Group 2: Name
    Capture 0: Name
Group 3: type arg3
    Capture 0: type arg1,
    Capture 1: type[] arg2,
    Capture 2: type arg3
Group 4: type
    Capture 0: type
    Capture 1: type
    Capture 2: type
Group 5: []
    Capture 0: []
Group 6: arg3
    Capture 0: arg1
    Capture 1: arg2
    Capture 2: arg3
Group 7:
    Capture 0: ,
    Capture 1: ,

Group 5 does, in fact, contain the brackets. However, its only capture has index 0, which does not line up with the argument the brackets actually belong to (the second one).

Is there some way to correlate this capture to the appropriate group, or am I barking up the wrong tree?

An alternative way to implement this, I guess, would be to parse the arguments in the input separately. But surely there’s a way to do it this way, isn’t there?

EDIT:
To clarify, I’m not building a language parser. I’m converting old textual API documentation for a scripting language, which looks like this:

--- foo object ---
void bar(int baz)
 * This does something.
 * Remember blah blah blah.

int getFrob()
 * Gets the frob

into a new format that I can export to HTML, etc.

Edit mkII:
For others’ benefit, here’s the revised code:

// m, line, curType and curMember are declared elsewhere in the converter.
// Phase 1: match "type Name(...)" and capture the raw argument list in group 3.
m = Regex.Match(line, @"^(\w+)\s+(\w+)\s*\((.*?)\)$");
if (m.Success) {

    // Store the previous member before starting a new one.
    if (curMember != null) {
        curType.Add(curMember);
    }
    curMember = new XElement("method");
    curMember.Add(new XAttribute("type", m.Groups[1].Value));
    curMember.Add(new XAttribute("name", m.Groups[2].Value));

    if (m.Groups[3].Success) {
        XElement args = new XElement("arguments");

        // Phase 2: each match is one argument: type, optional [], name.
        MatchCollection matches = Regex.Matches(m.Groups[3].Value, @"(\w+)(\[\])?\s+(\w+)");

        foreach (Match m2 in matches) {
            XElement arg = new XElement("arg");
            arg.Add(new XAttribute("type", m2.Groups[1].Value));
            // Group 2 only succeeds when this argument has brackets.
            if (m2.Groups[2].Success) {
                arg.Add(new XAttribute("array", "array"));
            }
            arg.Value = m2.Groups[3].Value;

            args.Add(arg);
        }

        curMember.Add(args);
    }
}

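First, it matches the overall type Name(...) shape and captures everything between the parentheses. Once it has that, it runs a second regular expression over the captured parameter text, matching one type name pair per argument.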


1 Answer

  1. Editorial Team
    Added an answer on May 18, 2026 at 12:54 am

    I handle this with a two-phase parser.

    In the first phase, I just make sure I know what I have. In that phase, I don’t care about the matching groups.

    The second phase actually tries to make sense of it all. After the first phase it is easy, for example, to get everything within the parentheses, but parsing the arguments is harder. So you take the text from within the parentheses, split it on the commas, and then parse the arguments one by one.
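
    As a rough illustration of the split approach, here is a minimal sketch. The class SplitArguments and the helper Describe are hypothetical names, and the per-argument pattern assumes every argument is a word-like type, an optional [], and a word-like name, as in the question’s examples.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Text.RegularExpressions;

    public static class SplitArguments
    {
        // Hypothetical helper: returns one "type name array=..." string per argument.
        public static List<string> Describe(string line)
        {
            var result = new List<string>();

            // Phase 1: confirm the overall shape and grab the text between the parentheses.
            Match header = Regex.Match(line, @"^(\w+)\s+(\w+)\s*\((.*?)\)$");
            if (!header.Success) return result;

            // Phase 2: split on commas and parse each argument on its own,
            // so an optional [] stays attached to the argument it came from.
            string[] parts = header.Groups[3].Value.Split(
                new[] { ',' }, StringSplitOptions.RemoveEmptyEntries);
            foreach (string part in parts)
            {
                Match arg = Regex.Match(part.Trim(), @"^(\w+)\s*(\[\])?\s+(\w+)$");
                if (!arg.Success) continue; // skip anything that is not "type name"
                result.Add(string.Format("{0} {1} array={2}",
                    arg.Groups[1].Value, arg.Groups[3].Value, arg.Groups[2].Success));
            }
            return result;
        }

        public static void Main()
        {
            foreach (string s in Describe("type Name(type arg1, type[] arg2, type arg3)"))
                Console.WriteLine(s);
            // prints:
            // type arg1 array=False
            // type arg2 array=True
            // type arg3 array=False
        }
    }
    ```

    Because each argument is matched separately, there is no need to correlate capture indexes across groups.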

    If that’s too hard, for example because multidimensional arrays are allowed ([,]) and a plain split on commas would cut them apart, you create a regular expression that eats the first argument from the text within the parentheses. You then know how long that argument is, so you remove that part from the argument list and repeat on what is left.

    1. Match the entire line and produce the part within the parentheses:

      "type Name(type arg1, type[] arg2, type arg3)" => "type arg1, type[] arg2, type arg3"
      
    2. Parse the arguments:

      a. Eat the first argument of the list of arguments:

      "type arg1, type[] arg2, type arg3" => "type", "arg1"
      

      b. Remove the length of the parsed argument from the list of arguments:

      "type arg1, type[] arg2, type arg3" => ", type[] arg2, type arg3"
      
      
      ", type[] arg2, type arg3".TrimStart(new char[]{ ',', ' ' }) => "type[] arg2, type arg3"
      

      c. If the string is not empty: lather, rinse, repeat.
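
    Steps a to c above could be sketched roughly like this. The class EatArguments and the helper ParseArguments are hypothetical names, and the pattern assumes a type is a word optionally followed by [] or a multidimensional suffix such as [,]; adjust it to whatever the real documentation contains.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Text.RegularExpressions;

    public static class EatArguments
    {
        // Hypothetical helper: returns { type, name } pairs from an argument list.
        public static List<string[]> ParseArguments(string arguments)
        {
            var result = new List<string[]>();

            // Anchored at the start so it only ever eats the first argument.
            // Assumption: a type is a word plus an optional [], [,], [,,] ... suffix.
            var first = new Regex(@"^(\w+(?:\[,*\])?)\s+(\w+)");

            string rest = arguments.Trim();
            while (rest.Length > 0)
            {
                // a. Eat the first argument.
                Match m = first.Match(rest);
                if (!m.Success)
                    throw new FormatException("Cannot parse argument list at: " + rest);
                result.Add(new[] { m.Groups[1].Value, m.Groups[2].Value });

                // b. Remove the parsed argument and the separator that follows it.
                rest = rest.Substring(m.Length).TrimStart(',', ' ');

                // c. The loop condition repeats this while anything is left.
            }
            return result;
        }

        public static void Main()
        {
            foreach (string[] arg in ParseArguments("type arg1, type[,] arg2, type arg3"))
                Console.WriteLine(arg[0] + " " + arg[1]);
            // prints:
            // type arg1
            // type[,] arg2
            // type arg3
        }
    }
    ```

    Since the comma inside [,] is consumed as part of the type, it never gets mistaken for an argument separator, which is the case a plain split on commas cannot handle.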

© 2021 The Archive Base. All Rights Reserved