I’m using Jison (Bison) to create a simple markup language. I’m clearly new to this, but slight variations are working very well. I just don’t understand the source of the S/R conflict.
It doesn’t seem matter that ‘Text’ is returned by two lexer actions (with different Start Conditions) and I like this because it seems to allow the grammar to have fewer rules and because the error messages to the user are consistent. I’ve tried making the ‘Text’ rule common regardless of context and I’ve also tried giving each token a different name, but it doesn’t seem to have any effect on the S/R Conflicts when it’s all together.
The parser is SUPPOSED to create a json-object with plain-text, sub-arrays, and various special nodes.
Specification:
/* lexical grammar */
%lex
%s bracketed
%%
<bracketed>(\\.|[^\\\,\[\]])+ { yytext = yytext.replace(/\\(.)/g, '$1'); return 'Text'; }
<INITIAL>(\\.|[^\\\[])+ { yytext = yytext.replace(/\\(.)/g, '$1'); return 'Text'; }
"[" { this.begin('bracketed'); return '['; }
"]" { this.popState(); return ']'; }
"," return ','
<<EOF>> return 'END'
/lex
%start template
%%
template
: sentence END
;
sentence
: /* empty */
| sentence Text
| sentence '[' ']'
| sentence '[' dynamic ']'
;
dynamic
: sentence
/*| dynamic ',' sentence*/
;
Warnings:
Conflict in grammar: multiple actions possible when lookahead token is ] in state 5
- reduce by rule: sentence ->
- shift token (then go to state 6)
States with conflicts:
State 5
sentence -> sentence [ .] #lookaheads= END Text [ ]
sentence -> sentence [ .dynamic ] #lookaheads= END Text [ ]
dynamic -> .sentence #lookaheads= ]
sentence -> . #lookaheads= ] Text [
sentence -> .sentence Text
sentence -> .sentence [ ]
sentence -> .sentence [ dynamic ]
Different generator algorithms have more or less trouble, but they all seem to have trouble.
Thanks!
The conflict comes fundamentally from these two rules:
The reason is that after seeing a
sentenceand a[and looking at the next token beingText, the parser doesn’t known whether to shift theText, matching the first rule, or to treat thatTextas the beginning of asentenceListgoing towards matching the second rule.Now if you had a parser generator that use 2-token lookahead, this wouldn’t be a problem, but bison is LALR(1) (the 1 being one token lookahead).
There are a couple of things you could try:
do extra lookahead in the lexer to differentiate Text-followed-by-] from Text-not-followed-by-] as two distinct tokens then rewrite the rules to use both of these tokens.
Use bison’s %glr-parser feature to use GLR parser. This will parse the sentence both ways and later throw away the one that doesn’t match
refactor the grammar to not need 2-token lookahead.
One refactoring that works in your case would be to rewrite the
sentencerules to make them all right-recursive instead of left-recursive:This avoids having
sentence(or any other rule that starts withsentencesuch assentenceList) start with a null reduction of thesentence: /*empty*/rule. So the parser can freely shift aTextin the problematic case deferring the reduction until it sees the next token. It does have memory use implications, however, as it results in a parser that will essentially shift the entire input on to the parser stack and then reduce it one sentence at a time.Another refactor you could do would be to subsume the
[Text]and[]constructs into the[sentenceList]:So now a
sentenceListis one or more sentences separated by commas (instead of two or more), and in the action for thesentence '[' sentenceList ']'rule, you’d check thesentenceListto see if it was two or more sentences and act appropriately.