I'm parsing a SCPI string, which looks something like:
HEADER:HEADER:HEADER:CMD NUMBER MULTIPLIER UNIT;
The spaces between the tokens NUMBER, MULTIPLIER and UNIT are not necessarily there, nor are the tokens of a fixed length. I have been able to parse (from left to right) as far as the end of NUMBER. However, the MULTIPLIER and UNIT tokens are each optional and can share the same characters.
e.g. suffix could be ‘P’ (where P could mean pico [mult] or poise [unit])
or ‘MA’ (could be mega [mult] or milli-Amp [mult-unit])
Does anyone have experience parsing this kind of syntax, or any ideas on how to split these suffixes into their correct tokens?
EDIT: For the pedant, I guess this is more lexical analysis than parsing.
Perhaps in your simple example, doing it with a couple of nested `if`s would be easier than trying a more powerful method, but if you don’t want to do that manually, or if the actual problem is a bit bigger, you can try matching your input with a regular expression (standard lexer stuff). On a POSIX system, you can use `regexec`.

Edit: How to do it with `if` (and `switch`):

I assume your input is in `text` and you have already read up to the end of NUMBER, so your index, `i`, points there.

Note, in this case, I assumed UNIT is mandatory. I’m not sure how you can distinguish between mega and milliamp in 10MA if both MULT and UNIT are optional. However, you can add more cases to the first `switch` that correspond to values of `MULT` and change `power_of_10` there too. For example, if in the first `switch` you see `k`, you can understand that `UNIT` doesn’t exist and `power_of_10` is 3.
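As a rough illustration of the approach above, here is a sketch that walks a multiplier table and a unit table instead of hand-writing the nested `if`/`switch`. It assumes, as above, that UNIT is mandatory, that `text` holds the input, and that `i` is the index just past NUMBER. The tables, `parse_suffix`, and the helper names are made up for this example; a real instrument defines its own suffix set.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical tables, for illustration only. */
struct mult { const char *name; int power_of_10; };

static const struct mult MULTS[] = {
    { "MA", 6 }, { "K", 3 }, { "M", -3 }, { "U", -6 }, { "N", -9 }, { "P", -12 },
    { "", 0 },  /* "no multiplier"; kept last so real prefixes are tried first */
};
static const char *const UNITS[] = { "V", "A", "P", "S" };

/* Case-insensitive prefix test: length of `word` if `s` starts with it, else -1. */
static int starts_with(const char *s, const char *word)
{
    size_t n = strlen(word);
    for (size_t k = 0; k < n; k++)
        if (toupper((unsigned char)s[k]) != word[k])
            return -1;  /* also stops at the end of `s` */
    return (int)n;
}

static size_t skip_spaces(const char *text, size_t i)
{
    while (text[i] == ' ' || text[i] == '\t')
        i++;
    return i;
}

/* Parse [MULT] UNIT starting at text[i]; the suffix must be followed by ';'
 * or the end of the string. Returns 1 on success, 0 otherwise. */
int parse_suffix(const char *text, size_t i, int *power_of_10, const char **unit)
{
    i = skip_spaces(text, i);
    for (size_t m = 0; m < sizeof MULTS / sizeof MULTS[0]; m++) {
        int mlen = starts_with(text + i, MULTS[m].name);
        if (mlen < 0)
            continue;
        /* This multiplier only counts if a known unit follows it. */
        size_t j = skip_spaces(text, i + (size_t)mlen);
        for (size_t u = 0; u < sizeof UNITS / sizeof UNITS[0]; u++) {
            int ulen = starts_with(text + j, UNITS[u]);
            if (ulen < 0)
                continue;
            size_t end = skip_spaces(text, j + (size_t)ulen);
            if (text[end] == ';' || text[end] == '\0') {
                *power_of_10 = MULTS[m].power_of_10;
                *unit = UNITS[u];
                return 1;
            }
        }
    }
    return 0;
}
```

With these tables, `10MA;` comes out as milli-amp (the mega reading is rejected because no unit follows it), `5 MA V;` as mega-volt, and a lone `P` as poise with no multiplier. If UNIT were optional too, `10MA` would stay ambiguous and the code would need an extra rule to pick one reading.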