The code below outputs Right ["1<!>2<!>3"], but I need Right ["1", "2", "3"].

    import Text.ParserCombinators.Parsec

    response = contents :: CharParser () [String]
      where
        contents = sepBy content contentDelimiter
        contentDelimiter = string "<!>"
        content = many anyChar

    main = do
      putStrLn $ show $ parse response "Response" "1<!>2<!>3"

I suppose the problem here is that the content parser consumes all the input before sepBy gets to test the delimiter. So, my questions are:
- Am I correct with my assumption? If not, what is the mistake I’ve made?
- What solution would you recommend for such a problem? (Using Parsec)
* `content` has to match any string that does not contain the delimiter. `1<!>2<!>3` is just an example; it could be `dslkf\n><!>dsf<!>3` or anything else.
For your first example, you would replace the `content = many anyChar` definition with one that only matches digits, so that the content parser doesn’t erroneously match the separator.
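A minimal sketch of that replacement, assuming the content is always a non-empty run of digits as in the `1<!>2<!>3` example (the explicit type signatures and the choice of `many1 digit` are my assumptions, not something the question fixes):

```haskell
import Text.ParserCombinators.Parsec

-- Assumption: content is a non-empty run of digits, so it can never
-- swallow the '<' that starts the delimiter.
content :: CharParser () String
content = many1 digit

contentDelimiter :: CharParser () String
contentDelimiter = string "<!>"

-- Same shape as the parser in the question; only `content` has changed.
response :: CharParser () [String]
response = sepBy content contentDelimiter
```

With these definitions, `parse response "Response" "1<!>2<!>3"` should give `Right ["1","2","3"]`, because `content` now stops at the first non-digit and leaves the delimiter for `sepBy` to see.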
Maybe you want to match more than just digits, but even so, I advise you to think carefully about what is valid between `<!>`s and write a parser that does that.

Why?
Once you’ve got a really good parser for `content`, your definition for `response` will be perfect. This way your content can include `mystring = "hello<!>mum"` without being chopped by the top level parser: the low level `stringLiteral` parser will eat the whole `"hello<!>mum"` and the top level parser will never see the `<!>` correctly and innocently included inside it.

Generally,…
In most parsing situations it’s best to be really clear what’s allowed in your content, and parse only that, for three reasons:
Reusability is important. At the moment, if you use a parser that just splits on `<!>` and eats everything else, it’s guaranteed to eat the whole input, and you won’t be able to do any more parsing.

Bottom-up
Your parsers should work from the ground up – you described this very well in your comment as “stacking the parsers from specific to general”.
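As a rough sketch of that stacking, here are four small layers, each built only from the ones above it. The grammar is an assumption made for illustration: string literals have no escape sequences, and content is either a quoted literal or a run of digits. `checkLayers` is a hypothetical helper name.

```haskell
import Text.ParserCombinators.Parsec

-- Layer 1: one character inside a string literal (assumption: no escapes).
stringChar :: CharParser () Char
stringChar = noneOf "\""

-- Layer 2: a double-quoted literal; returns the text between the quotes.
stringLiteral :: CharParser () String
stringLiteral = between (char '"') (char '"') (many stringChar)

-- Layer 3: content is either a literal or a run of digits (an assumption).
content :: CharParser () String
content = stringLiteral <|> many1 digit

-- Layer 4: the top-level parser, split on the delimiter.
response :: CharParser () [String]
response = sepBy content (string "<!>")

-- Exercise each layer in turn, smallest first.
checkLayers :: IO ()
checkLayers = do
  parseTest stringChar "a"
  parseTest stringLiteral "\"hello<!>mum\""
  parseTest content "42"
  parseTest response "1<!>\"a<!>b\"<!>3"
```

Because `stringLiteral` consumes everything up to the closing quote, the `<!>` inside `"a<!>b"` is never offered to `sepBy`, so the last call should print `["1","a<!>b","3"]`.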
It’s easiest to write them in that order for ease of testing, so first write one that matches a `stringChar`, then `stringLiteral`, before `member`, before `array`, before `object`, before `json`, before `content`, then `response`. You can have them calling each other recursively along the way. You can then use `parseTest` to test each little one as you go along; typing `parseTest response "1<!>2<!>3"` into ghci is quicker than rewriting `main` and compiling.

Top-down?
It’s not wrong to write your parser top-down, just harder. You can write the top-level `response` parser first, but nothing is testable until you’ve written the very smallest parser.
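To make the earlier reusability point concrete, here is a small comparison. It is only a sketch: `withTrailer` and its "response, newline, trailing word" format are hypothetical, invented purely to show that something can still be parsed after the response, and the bounded version assumes digit-only content.

```haskell
import Text.ParserCombinators.Parsec

contentDelimiter :: CharParser () String
contentDelimiter = string "<!>"

-- Greedy version from the question: content swallows everything up to
-- the end of the input, delimiters included.
greedyResponse :: CharParser () [String]
greedyResponse = sepBy (many anyChar) contentDelimiter

-- Bounded version (assumption: content is digits only).
boundedResponse :: CharParser () [String]
boundedResponse = sepBy (many1 digit) contentDelimiter

-- Hypothetical larger format: a response, a newline, then a trailing word.
withTrailer :: CharParser () [String] -> CharParser () ([String], String)
withTrailer resp = do
  items <- resp
  _ <- newline
  trailer <- many1 letter
  return (items, trailer)
```

On the input `"1<!>2\nend"`, `withTrailer boundedResponse` should succeed with `(["1","2"],"end")`, while `withTrailer greedyResponse` fails, because `many anyChar` has already consumed the newline and the trailing word.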