I’m learning to write a simple parser with parser combinators. I’m writing the rules bottom-up and writing unit tests to verify them as I go. However, I’m stuck on using repsep() with whitespace as the separator.
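import scala.util.parsing.combinator.RegexParsers

object MyParser extends RegexParsers {
  // '{', then digit runs separated by whitespace, then '}'; only the list is kept
  lazy val listVal: Parser[List[String]] = elem('{') ~> repsep("""\d+""".r, """\s+""".r) <~ elem('}')
}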
The rule was simplified to illustrate the problem. When I feed the parser with “{1 2 3}”, it always complains that it doesn’t match:
[1.4] failure: `}' expected but 2 found
What is the correct way to write the rule I described?
Thanks
By default, RegexParsers-derived parsers skip whitespace before attempting to match any terminal symbol. That is why your rule fails: the whitespace after the 1 is skipped before the \s+ separator is tried, so the separator never matches and repsep stops after the first element. Unless your whitespace interpretation is unusual, you can just work with the default behaviour. If the character sequences you wish to treat as ignored whitespace are something other than the default (\s+), you can override the protected val whiteSpace: Regex = ... value in your RegexParsers parser. If you do not want any such whitespace skipping to occur, override def skipWhitespace = false.
Edit: So yes, changing this:
lazy val listVal: Parser[List[String]] = elem('{') ~> repsep("""\d+""".r, """\s+""".r) <~ elem('}')
to this:
lazy val listVal: Parser[List[String]] = elem('{') ~> rep("""\d+""".r) <~ elem('}')
and leaving everything else defined in RegexParsers unchanged should do what you want.
By the way, the common use of repsep is for things like comma-separated lists where you need to ensure the commas are there but don’t need to keep them in the resulting parse tree (or AST).
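To make the options above concrete, here is a minimal sketch of three variants side by side. The object names (SkippingParser, ExplicitParser, CommaParser, Demo) are made up for illustration and are not from the question. It assumes the scala-parser-combinators module is on the classpath, since it has shipped separately from the standard library since Scala 2.11.

```scala
import scala.util.parsing.combinator.RegexParsers

// Variant 1: keep the default whitespace skipping and use rep.
// Each regex terminal skips leading whitespace on its own.
object SkippingParser extends RegexParsers {
  lazy val listVal: Parser[List[String]] =
    elem('{') ~> rep("""\d+""".r) <~ elem('}')
}

// Variant 2: switch skipping off, so whitespace is ordinary input
// and can be used as an explicit repsep separator.
object ExplicitParser extends RegexParsers {
  override def skipWhitespace: Boolean = false

  lazy val listVal: Parser[List[String]] =
    elem('{') ~> repsep("""\d+""".r, """\s+""".r) <~ elem('}')
}

// Variant 3: the typical repsep use, a separator that must be present
// but is dropped from the result.
object CommaParser extends RegexParsers {
  lazy val listVal: Parser[List[String]] =
    elem('{') ~> repsep("""\d+""".r, ",") <~ elem('}')
}

object Demo {
  def main(args: Array[String]): Unit = {
    // All three yield the same list of digit strings.
    println(SkippingParser.parseAll(SkippingParser.listVal, "{1 2 3}").get)
    println(ExplicitParser.parseAll(ExplicitParser.listVal, "{1 2 3}").get)
    println(CommaParser.parseAll(CommaParser.listVal, "{1, 2, 3}").get)
  }
}
```

One detail to keep in mind with this sketch: elem matches a single character and does not skip whitespace itself, so an input such as "{1 2 3 }" would not match Variant 1 as written. Using the string literal "}" in place of elem('}') is one way to allow that, because literals go through the same whitespace skipping as regexes.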