I have worked my way through the Haskell Koans provided here:
https://github.com/roman/HaskellKoans
I am stuck on the last two Koans, both involving parsing custom algebraic data types. Here is the first:
    data Atom = AInt Int | ASym Text deriving (Eq, Show)

    testAtomParser :: Test
    testAtomParser = testCase "atom parser" $ do
        -- Change parser with the correct parser to use
        --
        let parser = <PARSER HERE> :: P.Parser Atom
        assertParse (ASym "ab") $ P.parseOnly parser "ab"
        assertParse (ASym "a/b") $ P.parseOnly parser "a/b"
        assertParse (ASym "a/b") $ P.parseOnly parser "a/b c"
        assertParse (AInt 54321) $ P.parseOnly parser "54321"
How can I define the variable parser so that it parses the algebraic data type Atom and passes the assertions?
I.
Parsers for an ADT tend to reflect the shape of the ADT. Your ADT is formed of two disjoint parts, so your parser will probably have two disjoint parts as well.
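As a minimal sketch of that idea, assuming the koans' P is Data.Attoparsec.Text imported qualified, the two constructors can map onto two alternatives joined with <|>. The name atomFrom and its two parameters are hypothetical; they only stand in for the branch parsers developed in the sections below.

```haskell
{-# LANGUAGE OverloadedStrings #-}  -- the koan's Text literals rely on this
import Control.Applicative ((<|>))
import qualified Data.Attoparsec.Text as P  -- assumption: what the koans call P
import Data.Text (Text)

data Atom = AInt Int | ASym Text deriving (Eq, Show)

-- Hypothetical helper: one alternative per constructor of Atom.
-- The branch parsers are passed in, so only the overall shape is fixed here.
atomFrom :: P.Parser Int -> P.Parser Text -> P.Parser Atom
atomFrom intBranch symBranch =
      (AInt <$> intBranch)   -- first disjoint part: integers
  <|> (ASym <$> symBranch)   -- second disjoint part: symbols
```

The order of the alternatives matters: the left branch is tried first, and the right one only runs if the left one fails.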
II.
Assuming we know how to parse a single digit (let’s call that basic parser digit), we can parse a (non-negative) integer by just repeating it. Written naively with nothing but monadic sequencing (run digit, then recurse), this successfully parses an infinite stream of digits and throws them away.

Can we do better? Not with just a monad instance, unfortunately. We need another basic combinator, many, which modifies some other parser to consume input 0 or more times, accumulating the results into a list. We’ll actually adjust this slightly, since an empty parse isn’t a valid number, and use many1, which requires at least one match.

III.
What about symbols? To pass the test cases, it appears that a symbol must be one or more alphabetic characters or forward slashes (the tests use a/b). Again, this disjoint structure can be immediately expressed in our parser.
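One way to picture this, sketched with attoparsec's ready-made P.letter and P.char (assuming P is Data.Attoparsec.Text): a single symbol character is either a letter or a slash, and a symbol is one or more of those. The names symbolChar and symbolChars are hypothetical.

```haskell
{-# LANGUAGE OverloadedStrings #-}  -- lets Text literals be passed to P.parseOnly
import Control.Applicative ((<|>))
import qualified Data.Attoparsec.Text as P  -- assumption: what the koans call P

-- Hypothetical name: one character of a symbol is a letter OR a '/'.
symbolChar :: P.Parser Char
symbolChar = P.letter <|> P.char '/'

-- One or more symbol characters, collected into a String.
symbolChars :: P.Parser String
symbolChars = P.many1 symbolChar
```

Because P.parseOnly does not demand that all input be consumed, parsing stops quietly at the first character that is neither a letter nor a slash, which is what the "a/b c" test case needs.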
We’ll again use some built-in simple parser combinators to build up what we want, say

    satisfy :: (Char -> Bool) -> Parser Char

which matches any character which satisfies some predicate. We can immediately build another useful combinator,

    char :: Char -> Parser Char
    char c = satisfy (== c)

and then we’re done: a symbol is many1 of either satisfy isAlpha or char '/', where isAlpha is a predicate much like the regex [a-zA-Z].

IV.
So now we have the core of our parser: one many1-based branch for naturals and one for symbols.
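A reconstruction of what that core might look like in attoparsec, assuming P is Data.Attoparsec.Text. The names natural and atom come from the surrounding text; symbol is an assumed name for the second branch, and the definitions themselves are a sketch rather than the original code.

```haskell
{-# LANGUAGE OverloadedStrings #-}  -- lets Text literals be passed to P.parseOnly
import Control.Applicative ((<|>))
import qualified Data.Attoparsec.Text as P  -- assumption: what the koans call P
import Data.Char (isAlpha)

-- One or more digits, kept as a String for now.
natural :: P.Parser String
natural = P.many1 P.digit

-- One or more letters or slashes (symbol is an assumed name).
symbol :: P.Parser String
symbol = P.many1 (P.satisfy isAlpha <|> P.char '/')

-- Both branches still produce a plain String,
-- so the information about which branch matched is lost.
atom :: P.Parser String
atom = natural <|> symbol
```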
The many1 combinators lift our character parsers into parsers of lists of characters (Strings!). This lifting action is the basic idea for building ADT parsers, too. We want to lift our Parser String up into Parser Atom. One way to do it would be to use a function toAtom :: String -> Atom which we could then fmap into the Parser, but a function with type String -> Atom defeats the purpose of building a parser in the first place.

As stated in I., the important part is that the shape of the ADT is reflected in the shape of our atom parser. We’ll need to take advantage of that to build our final parser.

V.
We need to take advantage of information in the structure of our atom parser. Let’s instead build two functions, one per constructor (liftInt for AInt and a counterpart for ASym), each of which states both a method of turning Strings into Atoms and declares what kind of Atom we’re dealing with. It’s worth noting that liftInt will throw a runtime error if we pass it a string that cannot be parsed into an Int. Fortunately, a parseable string is exactly what we know we have.

Now our atom'' parser takes advantage of the guarantee that natural will only return strings which are valid parses for a natural (our call to read will not fail!) and we try to build both AInt and ASym in order, trying one after another in a disjoint structure just like the structure of our ADT.

VI.
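The whole shebang is thus just the pieces above put together: the natural and symbol parsers, the two lifting functions, and atom'' trying them in order. This shows the fun of parser combinators. The whole thing is built up from the ground using tiny, composable, simple pieces. Each one does a very tiny job, but all together they span a large space of parsers.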
You can also easily augment this grammar with more branches in your ADT, a more thoroughly specified symbol type parser, or failure decorations with <?> so that you have great error messages on failed parses.
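For reference, here is one way the assembled parser might look in attoparsec, with <?> labels added. This is a sketch under assumptions, not the original code: P is taken to be Data.Attoparsec.Text, liftSym and symbol are assumed names (only liftInt, natural and atom'' appear in the text above), and the label strings are made up for illustration.

```haskell
{-# LANGUAGE OverloadedStrings #-}  -- the koan's Text literals rely on this
import Control.Applicative ((<|>))
import Data.Attoparsec.Text ((<?>))
import qualified Data.Attoparsec.Text as P  -- assumption: what the koans call P
import Data.Char (isAlpha)
import Data.Text (Text)
import qualified Data.Text as T

data Atom = AInt Int | ASym Text deriving (Eq, Show)

-- One or more digits; the label is used in error messages.
natural :: P.Parser String
natural = P.many1 P.digit <?> "natural number"

-- One or more letters or slashes.
symbol :: P.Parser String
symbol = P.many1 (P.satisfy isAlpha <|> P.char '/') <?> "symbol"

-- Partial: read fails on non-numeric input, so only apply it to
-- strings produced by natural.
liftInt :: String -> Atom
liftInt = AInt . read

-- Assumed counterpart of liftInt for the ASym constructor.
liftSym :: String -> Atom
liftSym = ASym . T.pack

-- Two alternatives, mirroring the two constructors of Atom.
atom'' :: P.Parser Atom
atom'' = (liftInt <$> natural) <|> (liftSym <$> symbol)
```

In the koan, this would presumably be plugged in as let parser = atom'' :: P.Parser Atom. Trying natural first means an all-digit input such as 54321 becomes an AInt, while anything starting with a letter or slash falls through to the symbol branch.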