I’ve been trying to create a parser using simpleparse. I’ve defined the grammar like this:
    <w> := [ \n]*
    statement_list := statement,(w,statement)?
    statement := 'MOVE',w,word,w,'TO',w,(word,w)+
    word := [A-Za-z],[A-Za-z0-9]*,([-]+,[A-Za-z0-9]+)*
Now if I try to parse a string
MOVE ABC-DEF TO ABC MOVE DDD TO XXX
The second statement gets interpreted as parameters of the first one… This sucks and is obviously not what I want. I have been able to get this working using pyparsing like this:
    from pyparsing import Word, OneOrMore, alphas, alphanums

    word = Word(alphas, alphanums + '-')
    statement = 'MOVE' + word + 'TO' + word
    statement_list = OneOrMore(statement.setResultsName('statement', True))
Is there any way to get this working in simpleparse as well?
EDIT: clarification below
I am not trying to achieve a line-based grammar. What I would like to see being parsed is:
Simple case:
    MOVE AA TO BB
More complex case:
    MOVE AA TO BB CC DD EE FF
Several of the above statements:
    MOVE AA TO BB CC MOVE CC TO EE MOVE EE TO FF GG HH IIJJK
The grammar is currently ambiguous. Even on paper you cannot tell whether ‘MOVE A TO B MOVE C TO D’ is two statements, or one statement with particularly badly named destinations.
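To make the ambiguity concrete, here is a small sketch that uses only Python's `re` module, not SimpleParse itself. The `WORD` regex is my approximate transcription of the question's `word` rule, and splitting on whitespace stands in for the `w` rule; both are assumptions made for illustration.

```python
import re

# Approximate regex transcription of the question's `word` rule
# (an assumption for illustration, not SimpleParse's own matcher).
WORD = re.compile(r"[A-Za-z][A-Za-z0-9]*(?:-+[A-Za-z0-9]+)*")

text = "MOVE ABC-DEF TO ABC MOVE DDD TO XXX"
tokens = text.split()

# Every token, including the keywords MOVE and TO, is a valid `word`.
all_words = all(WORD.fullmatch(t) is not None for t in tokens)

# So a greedy (word,w)+ after the first TO is free to take everything that follows.
first_to = tokens.index("TO")
greedy_destinations = tokens[first_to + 1:]

print(all_words)
print(greedy_destinations)
```

Because nothing in the `word` rule excludes MOVE or TO, the repeated destination group has no reason to stop at the second MOVE.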
You have two options. You may like neither.
1. Explicitly make your word rule not match any reserved word. That is, specifically disallow matching MOVE or TO. This is equivalent to saying ‘MOVE is not a valid parameter name’, and it makes ‘MOVE TL TO TM TN TO’ an error.
2. Modify your grammar so that you can tell where a statement ends. You could add commas: ‘MOVE AA TO BB, CC MOVE TM TO TN, TO, TP’. You could add semicolons or blank lines at the end of statements. You could require that MOVE be the least-indented token, as in Python.
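The reserved-word approach can be sketched in plain Python as below. This is a hand-written stand-in, not SimpleParse or pyparsing: `RESERVED`, `is_word` and `parse_statements` are hypothetical names of my own, the `WORD` regex approximates the question's `word` rule, and tokens are assumed to be separated by whitespace.

```python
import re

# Hypothetical names for illustration; none of this is SimpleParse API.
RESERVED = {"MOVE", "TO"}
WORD = re.compile(r"[A-Za-z][A-Za-z0-9]*(?:-+[A-Za-z0-9]+)*")


def is_word(token):
    """A token is a word only if it fits the pattern and is not reserved."""
    return token not in RESERVED and WORD.fullmatch(token) is not None


def parse_statements(text):
    """Return a list of (source, [destinations]) tuples, one per MOVE statement."""
    tokens = text.split()
    statements = []
    i = 0
    while i < len(tokens):
        if tokens[i] != "MOVE":
            raise ValueError(f"expected MOVE at token {i}, got {tokens[i]!r}")
        if i + 1 >= len(tokens) or not is_word(tokens[i + 1]):
            raise ValueError(f"expected a source word after MOVE at token {i}")
        source = tokens[i + 1]
        if i + 2 >= len(tokens) or tokens[i + 2] != "TO":
            raise ValueError(f"expected TO after {source!r}")
        i += 3
        # Collect destinations until the next reserved word or end of input.
        destinations = []
        while i < len(tokens) and is_word(tokens[i]):
            destinations.append(tokens[i])
            i += 1
        if not destinations:
            raise ValueError(f"expected a destination after TO for {source!r}")
        statements.append((source, destinations))
    return statements


print(parse_statements("MOVE ABC-DEF TO ABC MOVE DDD TO XXX"))
```

Since MOVE can no longer be a destination, the destination loop stops at the second MOVE and the input splits into two statements. The cost is the one described above: an input such as ‘MOVE TL TO TM TN TO’ is rejected with a `ValueError` instead of being read as a destination named TO.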