I need to implement syntax highlighting for COS (aka MUMPS).
The language allows constructs of the form
new (new,set,kill)
set kill=new
where ‘new’ and ‘set’ are commands and also variable names.
grammar cos;
Command_KILL :( ('k'|'K') | ( ('k'|'K')('i'|'I')('l'|'L')('l'|'L') ) );
Command_NEW :( ('n'|'N') | ( ('n'|'N')('e'|'E')('w'|'W') ) );
Command_SET :( ('s'|'S') | ( ('s'|'S')('e'|'E')('t'|'T') ) );
INT : [0-9]+;
ID : [a-zA-Z][a-zA-Z0-9]*;
Space: ' ';
Equal: '=';
newCommand
: Command_NEW Space ID
;
setCommand
: Command_SET Space ID Space* Equal Space* INT
;
The problem arises when an ID has the same name as a command (NEW, SET, etc.).
According to the Wikipedia page, MUMPS doesn’t have reserved words.
Lexer rules like Command_KILL function exactly like reserved words: they’re designed to make sure no other token is generated when input "kill" is encountered. So token type Command_KILL will always be produced on "kill", even if it’s intended to be an identifier. You can keep the command lexer rules if you want, but you’ll have to treat them like IDs as well because you just don’t know what "kill" refers to based on the token alone.
Making a MUMPS implementation in ANTLR means focusing on token usage and context rather than token types. Consider this grammar:
Parser rule expr knows when an ID token is a command based on the layout of the entire line.
If the line consists of ID ID, then the input is a CallExpr: the first ID is a command name and the second ID is a regular identifier.
If the line consists of ID ID Equal ID, then the input is a SetExpr: the first ID will be a command (either "set" or something like it), the second ID is the target identifier, and the third ID is the source identifier.
Here’s a Java test application followed by a test case similar to the one mentioned in your question.
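As a rough illustration of the layout-based classification described above, here is a plain-Java sketch that does not use the ANTLR runtime or any generated parser. The class name LineClassifier and the methods tokenize, isId and classify are hypothetical, and the sketch assumes only the two layouts discussed (ID ID and ID ID Equal ID); anything else is reported as "Unknown".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: classifies a line by token layout alone,
// treating command names and variable names as the same ID token type.
class LineClassifier {

    private static boolean isIdChar(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');
    }

    // Splits a line into alphanumeric words and '=' tokens; spaces only separate tokens.
    static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < line.length()) {
            char c = line.charAt(i);
            if (c == ' ') {
                i++;
            } else if (c == '=') {
                tokens.add("=");
                i++;
            } else if (isIdChar(c)) {
                int start = i;
                while (i < line.length() && isIdChar(line.charAt(i))) {
                    i++;
                }
                tokens.add(line.substring(start, i));
            } else {
                throw new IllegalArgumentException("Unexpected character: " + c);
            }
        }
        return tokens;
    }

    // Same shape as the ID lexer rule: a letter followed by letters or digits.
    static boolean isId(String token) {
        return token.matches("[a-zA-Z][a-zA-Z0-9]*");
    }

    // Decides the kind of expression from the layout of the whole line.
    static String classify(String line) {
        List<String> t = tokenize(line);
        if (t.size() == 2 && isId(t.get(0)) && isId(t.get(1))) {
            return "CallExpr"; // ID ID: command name, then identifier
        }
        if (t.size() == 4 && isId(t.get(0)) && isId(t.get(1))
                && t.get(2).equals("=") && isId(t.get(3))) {
            return "SetExpr"; // ID ID Equal ID: command, target, source
        }
        return "Unknown";
    }

    public static void main(String[] args) {
        System.out.println(classify("new set"));
        System.out.println(classify("set kill=new"));
    }
}
```

Note that "new", "set" and "kill" are all accepted as ordinary IDs here; only their position in the line decides whether they act as a command or as a variable.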
Input
Output
It’s up to the calling code to determine whether a command is valid in a given context. The parser can’t reasonably handle this because of MUMPS’s loose approach to commands and identifiers. But it’s not as bad as it may sound: you’ll know which commands function like a call and which function like a set, so you’ll be able to test the input from the Listener that ANTLR produces. In the code above, for example, it would be very easy to test whether “set” was the command passed to exitSetExpr.
Some MUMPS syntax may be more difficult to process than this, but the general approach will be the same: let the lexer treat commands and identifiers like IDs, and use the parser rules to determine whether an ID refers to a command or an identifier based on the context of the entire line.
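The command check that the calling code performs could be sketched roughly as follows. This is plain Java rather than an actual ANTLR listener; the class CommandValidator, the method isValid and the table of allowed names are assumptions made for illustration, based on the three commands and their one-letter abbreviations from the question. In a real listener, a method such as exitSetExpr would pass the text of the first ID token to a check like this.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: decides whether a command name is allowed
// for the kind of expression the parser recognized.
class CommandValidator {

    // Assumed mapping: NEW and KILL behave like calls, SET behaves like a set.
    private static final Map<String, Set<String>> ALLOWED = Map.of(
            "CallExpr", Set.of("n", "new", "k", "kill"),
            "SetExpr", Set.of("s", "set"));

    // Command names are compared case-insensitively, as in the question's lexer rules.
    static boolean isValid(String exprKind, String commandText) {
        Set<String> allowed = ALLOWED.get(exprKind);
        return allowed != null && allowed.contains(commandText.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(isValid("SetExpr", "SET"));
        System.out.println(isValid("SetExpr", "new"));
    }
}
```

Keeping this table outside the grammar means the grammar stays free of reserved words, while invalid combinations such as "new kill=set" can still be flagged after parsing.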