I’m trying to parse a text file using parser combinators. I want to capture the index and text in a class called Example. Here’s a test showing the form on an input file:
object Test extends ParsComb with App {
  val input = """
0)
blah1
blah2
blah3
1)
blah4
blah5
END
"""
  println(parseAll(examples, input))
}
And here’s my attempt that doesn’t work:
import scala.util.parsing.combinator.RegexParsers
case class Example(index: Int, text: String)
class ParsComb extends RegexParsers {
  def examples: Parser[List[Example]] = rep(divider~example) ^^
    {_ map {case d ~ e => Example(d,e)}}
  def divider: Parser[Int] = "[0-9]+".r <~ ")" ^^ (_.toInt)
  def example: Parser[String] = ".*".r <~ (divider | "END")
}
It fails with:
[4.1] failure: `END' expected but `b' found
blah2
^
I’m just starting out with these so I don’t have much clue what I’m doing. I think the problem could be with the ".*".r regex not doing multi-line. How can I change this so that it parses correctly?
According to your grammar definition, ".*".r <~ (divider | "END"), you told the parser that an example must be followed by either a divider or END. After parsing blah1, the parser tried divider and failed, then tried END and failed again. With no other options left, END was the last alternative of that production, so from the parser's perspective it expected END but found blah2 on the 4th line.
The problem is your example parser: to stay close to your implementation, it should be a repeatable literal rather than a single ".*".r match, which stops at the end of the first line. I think parsing example into List[String] makes more sense, but that's up to you.
The regex (?=[\\r\\n]) is a positive lookahead: the pattern in front of it matches only where the next character is \r or \n, and the line break itself is not consumed.
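To see the lookahead in isolation, here is a minimal sketch using only the standard library's regex support. The object name LookaheadDemo, the \w+ prefix and the sample text are assumptions chosen for illustration, not the pattern from the grammar itself.

```scala
// Illustrative only: LookaheadDemo, the sample text and the \w+ prefix are assumptions.
object LookaheadDemo {
  // Three words; only the first two are followed by a line break.
  val sample = "blah1\nblah2\nEND"

  // \w+ must be followed by \r or \n, but the line break is not part of the match.
  val wordBeforeNewline = "\\w+(?=[\\r\\n])".r

  // Collect every match in order of appearance.
  val matches: List[String] = wordBeforeNewline.findAllIn(sample).toList
}
```

Here END is not matched because nothing follows it, and none of the matched strings contains the newline.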
If you want to parse it into a String (instead of List[String]), just add a transform function, for example: ^^ {_ mkString "\n"}
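Putting the pieces together, one possible version of the grammar is sketched below. It is not necessarily the exact grammar intended above: instead of a regex lookahead it uses the not combinator from the parser combinator library to stop a block at the next divider or at END, and it consumes END once after all examples. The names ParsCombSketch, SketchDemo and line are hypothetical.

```scala
import scala.util.parsing.combinator.RegexParsers

case class Example(index: Int, text: String)

// Sketch only: ParsCombSketch, SketchDemo and line are hypothetical names.
class ParsCombSketch extends RegexParsers {
  // Zero or more (divider, example) pairs, then a single END terminator.
  def examples: Parser[List[Example]] =
    rep(divider ~ example) <~ "END" ^^ { _ map { case d ~ e => Example(d, e) } }

  // An index followed by a closing parenthesis, e.g. "0)".
  def divider: Parser[Int] = "[0-9]+".r <~ ")" ^^ (_.toInt)

  // One line of text that does not start a new divider or the END marker.
  // not(...) consumes no input; it only checks what comes next.
  def line: Parser[String] = not(divider | literal("END")) ~> ".+".r

  // One or more lines, joined back into a single String.
  def example: Parser[String] = rep1(line) ^^ { _ mkString "\n" }
}

object SketchDemo extends ParsCombSketch {
  val input = "\n0)\nblah1\nblah2\nblah3\n1)\nblah4\nblah5\nEND\n"
  val result = parseAll(examples, input)
}
```

With the sample input this should yield two Example values, one per divider. Two caveats of this sketch: a text line that begins with END (such as ENDING) also ends the block, and leading whitespace on each line is skipped because RegexParsers skips whitespace by default.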