I’d like to use the same flex/bison scanner/parser for an interpreter and for loading a file to be interpreted. I cannot get the newline handling to work correctly in both cases.
- Interpreter: There is a prompt and I can enter commands terminated by pressing ENTER.
- File: Here is an example input file:
-----cut-----
begin(
print("well done"), 1)
-----cut-----
So there is a newline in the first line, right after the '(', that should be eaten.
In my scanner.l I have
%%
[ \t]               { errorLineCol += strlen(yytext); }
\n                  { errorLineNumber++;
                      errorLineCol = 0; }
("-"?[0-9])[0-9]*   { errorLineCol += strlen(yytext);
                      yylval = stringToInteger(yytext);
                      return TINTEGER; }
…..
This works for the file scenario but not for the interpreter: there I have to press an additional Ctrl+D after ENTER. If I change the newline rule to
\n                  { errorLineNumber++;
                      errorLineCol = 0;
                      return 0; }
then the interpreter works but file reading does not: it stops at the first newline it encounters. What is a good way to tackle this issue?
Edit:
Here is the top level of the parser:
input: uexpr        { parseValue = $1; }
     | /* empty */  { parseValue = myNull; }
     | error        { parseValue = myNull; }
     ;
uexpr: list
     | atom
     ;
A possible solution seems to be to use
\n                  { errorLineNumber++;
                      errorLineCol = 0;
                      if (yyin == stdin) return 0; }
The main problem is that your parser function yyparse does not return until it reduces the entire input to the start symbol. If the top level of your grammar describes a whole script, then of course the machine will expect a complete script (terminated by you hitting Ctrl-D). If your interpreter's logic is a loop that prints a prompt and then calls yyparse, it won't work, since yyparse consumes the whole script before returning.
The return 0; solves the problem for interactive mode because the token value 0 indicates EOF to the parser, making it think the script has ended.
I do not agree with the solution of making \n a token. It will only complicate the grammar (a hitherto insignificant piece of whitespace is now significant) and ultimately not work, because the yyparse function will still want to process the complete grammar. That is to say, if you have newline as a token but the grammar's start symbol represents the entire script, yyparse will still not return to your interactive prompt loop.
A quick and dirty hack is to let the lexer know whether interactive mode is in effect. Then it can conditionally return 0; for every newline when it is in interactive mode. If the input isn't a complete statement, there will be a syntax error, since the script as a whole ends at the newline. In normal file-reading mode, your lexer can eat all whitespace without returning, as before, allowing the whole file to be processed with a single yyparse.
If you want interactive input and file reading without implementing two modes of behavior in the lexer, you can change the grammar so that it parses only one statement of the language: the yyparse function then returns for every top-level statement. (The lexer eats newlines as before, without returning 0.) That is, the start symbol of the grammar is just one statement (possibly empty). With bison this may additionally require calling YYACCEPT in the action of the start rule, because the generated parser otherwise waits for the end-of-input token before it accepts. Your file parser must then be implemented as a loop (written by you) which calls yyparse to get all the statements from the file until yyparse encounters an empty input. The downside of this approach is that if the user types incomplete syntax (e.g. a dangling open parenthesis), the parser will keep scanning the input until it is satisfied. This is unfriendly, like programs that use scanf for interactive user input (it's the same problem: scanf is a parser that doesn't return until it is satisfied).
Another possibility is to have an interactive mode which performs its own user input rather than calling yyparse to get the input and parse it. In this mode, you read the user's input into a line buffer and then have the parser process the line buffer. Processing a line buffer instead of a FILE * stream is perfectly possible; you just have to write custom input handling (your own definition of the YY_INPUT macro). This is the approach you will end up needing anyway if you implement a decent interactive mode with line editing and history recall, e.g. using libedit or GNU readline.
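To make the line-buffer idea more concrete, here is a minimal sketch in C of the kind of helper a custom YY_INPUT could delegate to. The names set_line_buffer and read_line_buffer and the three static variables are hypothetical, chosen for illustration only; YY_INPUT itself is flex's real hook, and the way to wire it up is shown in the trailing comment rather than compiled here.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical line buffer shared by the prompt loop and the scanner. */
static const char *line_buf = NULL; /* current input line (not owned here) */
static size_t line_len = 0;         /* number of bytes in line_buf */
static size_t line_pos = 0;         /* bytes already handed to the scanner */

/* Give the scanner a new line; call this before each yyparse(). */
void set_line_buffer(const char *line)
{
    line_buf = line;
    line_len = strlen(line);
    line_pos = 0;
}

/* Copy at most max_size pending bytes into buf.
 * Returns the number of bytes copied; 0 signals end of input. */
size_t read_line_buffer(char *buf, size_t max_size)
{
    size_t remaining = line_len - line_pos;
    size_t n = remaining < max_size ? remaining : max_size;

    if (n > 0) {
        memcpy(buf, line_buf + line_pos, n);
        line_pos += n;
    }
    return n;
}

/* Assumed wiring, placed in the %{ ... %} part of scanner.l so that it
 * overrides flex's default definition:
 *
 *   #define YY_INPUT(buf, result, max_size) \
 *       ((result) = read_line_buffer((buf), (max_size)))
 */
```

With this in place, the interactive loop would read a line itself (with fgets, libedit or readline), pass it to set_line_buffer, and then call yyparse; the \n rule keeps eating newlines in both modes, and the parser sees end of input when the line is used up. Since flex treats a scanner that has hit end of file as finished, a call to yyrestart(yyin) before each new line is probably needed as well. flex also provides yy_scan_string() for scanning an in-memory string, which may be simpler if overriding YY_INPUT is not otherwise required.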