I am writing a simple parser for C. I was just running it with

Question


Asked: May 15, 2026


I am writing a simple parser for C. I was just running it on files in some other languages, partly for fun, to see how far their C-likeness goes, and partly out of laziness: I don't really want to write a separate parser for each language if I can avoid it.

However, the parser seems to break down on JavaScript when the code being parsed contains regular expressions.

Case 1:
For example, while parsing this JavaScript snippet:

var phone="(304)434-5454"
phone=phone.replace(/[\(\)-]/g, "") 
//Returns "3044345454" (removes "(", ")", and "-")

The ‘(’, ‘[’, etc. get matched as the start of new scopes, which may never be closed.
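A common heuristic for JavaScript (an assumption here, not something the question uses) is to decide whether a `/` starts a regex literal by looking at the previous significant token: after an identifier, a number, a string, or a closing bracket it is division; after an operator, an opening bracket, a comma, or keywords such as `return` it opens a regex. The minimal Python sketch below, with hypothetical function names, counts brackets only outside strings, `//` comments and regex literals. It is a heuristic and can be fooled, as the comment in the code notes.

```python
import re

# Heuristic (an assumption, not a full JavaScript grammar): a '/' after an
# identifier, a literal, or a closing bracket is division; anywhere else it
# starts a regex literal. Known failure: `if (x) /re/.test(s)` is misread.
# Template literals (`...`) are not handled.
REGEX_KEYWORDS = {"return", "typeof", "instanceof", "in", "of", "new",
                  "delete", "void", "throw", "case", "do", "else"}
IDENT = re.compile(r"[A-Za-z_$][\w$]*")
NUMBER = re.compile(r"\d+(?:\.\d*)?")
LITERAL = "<lit>"  # marker: previous token was a string, number or regex


def regex_allowed(prev):
    """Decide from the previous significant token whether '/' opens a regex."""
    if prev is None:
        return True
    if prev in (")", "]", "}", LITERAL):
        return False
    if IDENT.fullmatch(prev):
        return prev in REGEX_KEYWORDS
    return True  # after operators and punctuation such as ( , = :


def skip_regex(src, i):
    """Return the index just past the regex literal whose '/' is at src[i]."""
    i += 1
    in_class = False  # inside [...], an unescaped '/' does not end the regex
    while True:
        if i >= len(src) or src[i] == "\n":
            raise ValueError("unterminated regex literal")
        c = src[i]
        if c == "\\":
            i += 2  # skip the backslash and the character it escapes
            continue
        if c == "[":
            in_class = True
        elif c == "]":
            in_class = False
        elif c == "/" and not in_class:
            break
        i += 1
    i += 1
    while i < len(src) and src[i].isalpha():  # flags such as g, i, m
        i += 1
    return i


def unbalanced_brackets(src):
    """Net count of unclosed (, [ and { in src, ignoring any inside strings,
    // comments and (heuristically detected) regex literals."""
    depth, prev, i = 0, None, 0
    while i < len(src):
        c = src[i]
        if c.isspace():
            i += 1
        elif c in "\"'":
            j = i + 1
            while j < len(src) and src[j] != c:
                j += 2 if src[j] == "\\" else 1
            i, prev = j + 1, LITERAL
        elif src.startswith("//", i):
            end = src.find("\n", i)
            i = len(src) if end == -1 else end
        elif c == "/" and regex_allowed(prev):
            i, prev = skip_regex(src, i), LITERAL
        elif m := IDENT.match(src, i):
            i, prev = m.end(), m.group()
        elif m := NUMBER.match(src, i):
            i, prev = m.end(), LITERAL
        else:
            if c in "([{":
                depth += 1
            elif c in ")]}":
                depth -= 1
            i, prev = i + 1, c
    return depth


snippet = r'''var phone="(304)434-5454"
phone=phone.replace(/[\(\)-]/g, "")
//Returns "3044345454" (removes "(", ")", and "-")'''
print(unbalanced_brackets(snippet))  # 0
```

The previous-token rule is cheap because the scanner already knows the last token it emitted; a full solution would need the parser's context, since the ECMAScript grammar decides this syntactically.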

Case 2:
And for this Perl snippet:

 # Replace backslashes with two forward slashes
 # Any character can be used to delimit the regex
 $FILE_PATH =~ s@\\@//@g;

The ‘//’ gets matched as the start of a comment.
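For Perl, one possible approach (an assumption, not the question's method) is to recognize the quote-like operators by name: when a bare ‘s’ is followed by a punctuation character, that character delimits both the pattern and the replacement, so everything up to the third delimiter (honoring backslash escapes) belongs to the substitution. The helper below is hypothetical and deliberately handles only non-bracketing delimiters.

```python
def skip_perl_substitution(src, i):
    r"""Hypothetical helper. If src[i:] starts a Perl s/// substitution that
    uses a non-bracketing delimiter (e.g. s/a/b/g or s@\\@//@g), return the
    index just past it, including modifier flags; otherwise return None.
    Not handled in this sketch: bracketing delimiters such as s{a}{b},
    whitespace between 's' and the delimiter, and the other quote-like
    operators (m, tr, y, q, qq, qr, qw)."""
    if i + 1 >= len(src) or src[i] != "s":
        return None
    if i > 0 and (src[i - 1].isalnum() or src[i - 1] in "_$@%"):
        return None  # 's' ends a longer word, or is a variable such as $s
    delim = src[i + 1]
    if delim.isalnum() or delim.isspace() or delim in "_({[<)}]>":
        return None
    j = i + 2
    remaining = 2  # the pattern and the replacement each end with delim
    while remaining:
        if j >= len(src):
            raise ValueError("unterminated substitution")
        if src[j] == "\\":
            j += 2  # an escaped character never ends a part
            continue
        if src[j] == delim:
            remaining -= 1
        j += 1
    while j < len(src) and src[j].isalpha():  # modifier flags such as g, e
        j += 1
    return j


line = r"$FILE_PATH =~ s@\\@//@g;"
end = skip_perl_substitution(line, line.index("s@"))
print(line[end:])  # ;
```

Because the whole ‘s@\\@//@g’ is consumed as one unit, the ‘//’ inside it never reaches the comment rule.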

How can I detect a regular expression within the content text of a “C-like” program-file?


1 Answer

Editorial Team · Answer 1 · 2026-05-15T02:58:07+00:00

It is impossible.

Take this, for example:

m =~ s/a/b/g;

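This could be either C or Perl: C has no ‘=~’ operator, so a C compiler reads the line as m = ~s / a / b / g; (assign to m the bitwise complement of s, divided by a, b and g).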
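A greatly simplified, hypothetical C tokenizer makes the C reading concrete. It uses maximal munch (the longest operator wins), which is how C lexers split input; it is not a complete C lexer (no strings, floats, comments or preprocessor lines).

```python
import re

# Multi-character operators come before the single-character class so that
# the longest operator wins ("maximal munch").
C_TOKEN = re.compile(
    r"\s*([A-Za-z_]\w*|\d+|<<=|>>=|->|\+\+|--|<<|>>|&&|\|\|"
    r"|[-+*/%&|^=!<>]=|[-+*/%=~!<>&|^?:;.,(){}\[\]])"
)


def c_tokens(src):
    """Split src into C tokens, raising ValueError on input it cannot lex."""
    src = src.rstrip()
    tokens, pos = [], 0
    while pos < len(src):
        m = C_TOKEN.match(src, pos)
        if m is None:
            raise ValueError(f"cannot tokenize input at position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


# "=~" is not a C operator, so it splits into "=" and "~".
print(c_tokens("m =~ s/a/b/g;"))
# ['m', '=', '~', 's', '/', 'a', '/', 'b', '/', 'g', ';']
```

Every token is ordinary C, so nothing at the lexical level marks the line as a regular expression.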

A minute’s thought reveals that the number of Perl-style regular expressions that are also syntactically valid C expressions is infinite.

Another example:

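m+foo *bar[index]+i

In C this is the arithmetic expression m + foo * bar[index] + i; in Perl it is a match (m) that uses ‘+’ as the delimiter, with the pattern ‘foo *bar[index]’ and the ‘i’ (case-insensitive) flag.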
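The Perl reading of this second example can be extracted mechanically. The minimal Python sketch below uses a hypothetical helper, `perl_match_reading`, to split the text into pattern and flags, then compiles the pattern with Python's `re` as a stand-in for Perl's engine (the two agree on this simple pattern).

```python
import re


def perl_match_reading(src):
    """Hypothetical helper: read src as a Perl match m<d>pattern<d>flags with a
    non-bracketing delimiter <d>. Returns (pattern, flags), or None if src does
    not have that shape. Escaped delimiters and whitespace after 'm' are not
    handled in this sketch."""
    if len(src) < 3 or src[0] != "m":
        return None
    delim = src[1]
    if delim.isalnum() or delim.isspace() or delim in "_({[<)}]>":
        return None
    end = src.find(delim, 2)
    if end == -1:
        return None
    flags = src[end + 1:]
    if flags and not flags.isalpha():
        return None
    return src[2:end], flags


pattern, flags = perl_match_reading("m+foo *bar[index]+i")
print(pattern, flags)  # foo *bar[index] i
# Only the 'i' flag is mapped here, to re.IGNORECASE.
regex = re.compile(pattern, re.IGNORECASE if "i" in flags else 0)
print(bool(regex.search("FOO   BARx")))  # True
```

Both readings are well formed, so choosing between them needs knowledge of the source language, not just of its characters.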

The best you can get is extremely vague guesswork. The difficulty stems from the fact that a regular expression is a sequence of characters that can be virtually anything.

You had better clean up your error handling. A parser should not “break down” if some parentheses are missing or superfluous ones appear.
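One common way to follow this advice (an assumption here; the answer does not prescribe a method) is stack-based bracket matching with simple recovery: report a mismatch and keep going instead of aborting. The sketch works on raw characters for brevity; in a real parser it would run on the token stream after strings, comments and regex literals have been handled. The function name and the recovery policy are hypothetical.

```python
def check_brackets(src):
    """Match (), [] and {} with a stack, recording problems instead of
    aborting. Returns a list of (position, message) diagnostics."""
    pairs = {")": "(", "]": "[", "}": "{"}
    stack = []   # (char, position) of currently open brackets
    errors = []
    for pos, c in enumerate(src):
        if c in "([{":
            stack.append((c, pos))
        elif c in pairs:
            if stack and stack[-1][0] == pairs[c]:
                stack.pop()
            elif any(open_c == pairs[c] for open_c, _ in stack):
                # Recovery: brackets opened after the matching opener were
                # never closed; report them and pop back to the opener.
                while stack[-1][0] != pairs[c]:
                    open_c, open_pos = stack.pop()
                    errors.append((open_pos, f"unclosed {open_c!r}"))
                stack.pop()
            else:
                # Recovery: ignore a closer that has no opener at all.
                errors.append((pos, f"superfluous {c!r}"))
    for open_c, open_pos in stack:
        errors.append((open_pos, f"unclosed {open_c!r}"))
    return errors


print(check_brackets("f(a[1)"))  # [(3, "unclosed '['")]
print(check_brackets("a)b"))     # [(1, "superfluous ')'")]
```

Popping back to the nearest matching opener is only one recovery policy; its advantage is that a single stray bracket produces one diagnostic rather than a cascade.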
