I am using the regexec() function in C. I basically am trying to write

Asked: June 3, 2026

I am using the regexec() function in C. I am basically trying to write a regular expression to capture portions of a string for substitution.

For example, if I have the string “Hello $X”, then I want regexec to give me the range 6-7, as that is “$X”. But since there can be an arbitrary number of substitutions, I am using the regular expression:

"([^$]*(\\$[A-Za-z][A-Za-z0-9_]*))+"

This should match any number of repetitions of literal text followed by a substitution pattern.

For example, in the string “First=$X, Second=$Y” I need to know that $X occurred at offsets 6-7 and $Y occurred at offsets 17-18.

The actual offsets I get from regexec are:
0,19 8,19 17,19

First, I understand that the ending offset is actually one past the last character of the match. So the above offsets correspond to the following parts of the string:

First=$X, Second=$Y
, Second=$Y
$Y

Now I can see part of what is happening here: the first range is obviously the entire match, and the second is the first entire sub-match of the second sub-expression. But from this point on I am puzzled: why is it only returning the sub-match for $Y and not the one for $X?

I suspect it has something to do with the fact that I have a repeating expression, but I’m not sure what I need to do to fix the problem. How do I get it to return the desired offsets?

Note: I am passing a 128-element regmatch_t array to regexec() (nmatch=128), so I should be able to get all matches.

1 Answer

Editorial Team · Answer 1 · 2026-06-03T05:04:56+00:00

You’re confused about what first and second mean. In this expression:

"([^$]*(\\$[A-Za-z][A-Za-z0-9_]*))+"
 ^_______________________________^    this part

is the first parenthesized subexpression and

"([^$]*(\\$[A-Za-z][A-Za-z0-9_]*))+"
       ^________________________^    this part

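is the second. If a parenthesized subexpression is used more than once because of a *, +, or {} repetition operator, only its last match is reported. That is why element 1 of your regmatch_t array is 8,19 (the last iteration of the first group, “, Second=$Y”) and element 2 is 17,19 (“$Y”); the iteration that matched “First=$X” is not reported at all. Your pattern has only two groups, so elements 3 through 127 are unused and regexec sets them to -1; a larger nmatch cannot recover the earlier iterations.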

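If you want to match an arbitrary number of instances, then rather than using the + on the end of your regex, you simply need to call regexec multiple times and use the ending offset of the previous run as your new starting point. Keep in mind that the offsets regexec returns are relative to the string pointer you pass in, so add the starting offset back to get positions in the original string, and pass REG_NOTBOL on the later calls so that a ^ in the pattern cannot match in the middle of the string.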
