I’m struggling with trying to write a rule that will catch various comments and


Editorial Team

Asked: June 18, 2026


I’m struggling to write a rule that will catch the various forms of comments, and even "unended" comment errors.

This is for a language based on Pascal. Comments can be of the following forms:

(* ...with any characters within... *)

(*
 * separated onto multiple lines
 *)

(* they can contain "any" symbol, so -, +, :, ; , etc. should be ignored *)

But I also need to catch comment errors, such as:

(* this comment has no closing r-parenthesis *

(* this comment is missing an asterisk )

I have this so far:

%{
int yylval;
vector<string> string_table;
int string_table_index = 0;
int yyline = 1, yycolumn = 1;
%}

delim   [ \t\n]
ws      {delim}+
letter  [a-zA-Z]
digit   [0-9]
id      {letter}({letter}|{digit})*
number  {digit}+
float   {digit}+(\.{digit}+)?(E[+\-]?{digit}+)?


%%
{ws}      {yycolumn += yyleng;}
"(*" {
    int c;
    yycolumn += yyleng;
    while ((c = yyinput()) != '*' && c != EOF) {
        c = yyinput(); /* read additional text */
        if (c == '*') {
            while ((c = yyinput()) == '*') {
                c = yyinput();
                if (c == ')') {
                    break; /* found the end */
                } else if (c == EOF) {
                    cout << "EOF in comment\n";
                    break;
                } else {
                    cout << "unended comment, line = "  
                    << yyline << ", column = "
                    << yycolumn-yyleng << "\n";
                }
            }
        }
    }
 }

It’s not consuming the closing parenthesis (it always prints RPARENtoken!).
It’s not ignoring all the characters inside the comment (i.e., it prints MINUStoken for "-").
It can’t handle comments that span multiple lines.
I’m not sure it’s catching unended comment errors correctly.

I think I’m close… can anyone see where I went wrong?


1 Answer

Editorial Team · Answer 1 · 2026-06-18T10:46:38+00:00

Consider using start conditions so you don’t have to write all that extra code in the action for the "(*" pattern. I’ve written a short example below.

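%x COMMENT
%%
"(*" { BEGIN(COMMENT); }
<COMMENT>{
    "*)" { BEGIN(INITIAL); }
    \n { ++yyline; /* . does not match newlines; without this rule they are echoed */ }
    <<EOF>> { printf("EOF in comment\n"); yyterminate(); /* an EOF action must stop or switch input */ }
    . { /* ignore everything else inside the comment */ }
}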

Basically, when the lexer finds the beginning of a comment, it enters the COMMENT state. Because COMMENT is declared with %x (an exclusive start condition), only the rules inside the <COMMENT> block are active there, so a "-" inside a comment never reaches your MINUS rule. When it finds *), it returns to the INITIAL state. Note that if you plan on using multiple states, it’d probably be better to use yy_push_state(COMMENT) and yy_pop_state() (both require %option stack) instead of BEGIN(STATENAME).

I’m not entirely sure what your criteria for comment errors are (e.g., how it’s different from encountering an EOF in a comment), but this can likely be expanded to handle those cases as well.
