I took the example below partially from SO and adapted it to my needs. It almost fits, but I want the first string in the commaSep expr to always be parsed as an identifier, while all subsequent strings should be parsed as plain strings.
Currently they are all parsed as Identifiers.
*Parser> parse expr "" "rd (isFib, test2, 100.1, ?BOOL)"
Right (FuncCall "rd" [Identifier "isFib",Identifier "test2",Number 100.1,Query "?BOOL"])
I have tried a number of solutions, all of which in the end boil down to parsing the whole input without using commaSep. That means I would have to ignore the structure and do something like
expr_parse = do
    name <- resvd_cmd
    char '('
    skipMany space
    worker <- ident
    char ','
    skipMany1 space
    args <- commaSep expr -- not fully worked this out yet
    query <- theQuery
    skipMany space
    char ')'
    return (name, worker, args, query)
That looks suboptimal and very clunky to me. Is there any way to refactor expr in the code below so that it achieves what I need and stays simple?
module Parser where
import Control.Monad (liftM)
import Text.Parsec
import Text.Parsec.String (Parser)
import Lexer
import AST
expr = ident <|> astring <|> number <|> theQuery <|> callOrIdent
astring = liftM String stringLiteral <?> "String"
number = liftM Number float <?> "Number"
ident = liftM Identifier identifier <?> "WorkerName"
questionm :: Parser Char
questionm = oneOf "?"
theQuery :: Parser AST
theQuery = do first <- questionm
              rest  <- many1 letter
              let query = first:rest
              return ( Query query )
resvd_cmd = do { reserved "rd"; return ("rd") }
        <|> do { reserved "eval"; return ("eval") }
        <|> do { reserved "read"; return ("read") }
        <|> do { reserved "in"; return ("in") }
        <|> do { reserved "out"; return ("out") }
        <?> "LINDA-like Tuple"
callOrIdent = do
    name <- resvd_cmd
    liftM (FuncCall name) (parens $ commaSep expr) <|> return (Identifier name)
AST.hs
{-# LANGUAGE DeriveDataTypeable #-}
module AST where
import Data.Typeable
data AST
    = Number Double
    | Identifier String
    | String String
    | FuncCall String [AST]
    | Query String
    deriving (Show, Eq, Typeable)
Lexer.hs
module Lexer (
    identifier, reserved, operator, reservedOp, charLiteral, stringLiteral,
    natural, integer, float, naturalOrFloat, decimal, hexadecimal, octal,
    symbol, lexeme, whiteSpace, parens, braces, angles, brackets, semi,
    comma, colon, dot, semiSep, semiSep1, commaSep, commaSep1
    ) where
import Text.Parsec
import qualified Text.Parsec.Token as P
import Text.Parsec.Language (haskellStyle)
lexer = P.makeTokenParser ( haskellStyle
          {P.reservedNames = ["rd", "in", "out", "eval", "take"]}
        )
identifier = P.identifier lexer
reserved = P.reserved lexer
operator = P.operator lexer
reservedOp = P.reservedOp lexer
charLiteral = P.charLiteral lexer
stringLiteral = P.stringLiteral lexer
natural = P.natural lexer
integer = P.integer lexer
float = P.float lexer
naturalOrFloat = P.naturalOrFloat lexer
decimal = P.decimal lexer
hexadecimal = P.hexadecimal lexer
octal = P.octal lexer
symbol = P.symbol lexer
lexeme = P.lexeme lexer
whiteSpace = P.whiteSpace lexer
parens = P.parens lexer
braces = P.braces lexer
angles = P.angles lexer
brackets = P.brackets lexer
semi = P.semi lexer
comma = P.comma lexer
colon = P.colon lexer
dot = P.dot lexer
semiSep = P.semiSep lexer
semiSep1 = P.semiSep1 lexer
commaSep = P.commaSep lexer
commaSep1 = P.commaSep1 lexer
First, I’d like to introduce you to the function lexeme, which alters a parser to eat trailing whitespace. You’re encouraged to use it rather than explicitly eating the whitespace. The difficulty is with commaSep, because it eats the , and then fails. It would be nice to write a less optimistic commaSep, but let’s solve your problem directly.
Let’s apply lexeme to comma.
One of the problems with your code was that you were expecting it to see test2 as String "test2", but the astring parser expects its strings to begin and end with ". Let’s make a parser for bald strings, but make sure they don’t start with ? and don’t contain spaces or commas:
The breakthrough came when I realised that because there has to be a query at the end, there was always a comma after a baldString:
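The three parsers described above could be sketched as follows. This assumes the definitions are added to the question's Parser.hs, so that Text.Parsec, Lexer and AST are already in scope. The names acomma and baldString are taken from the text; baldStringComma is a name assumed for this sketch.

```haskell
-- Sketch: additions to the question's Parser.hs
-- (Text.Parsec, Lexer and AST are assumed to be imported already).

-- A comma that also eats the whitespace after it. P.comma is already a
-- lexeme parser, so the extra lexeme is harmless; it is kept to mirror the text.
acomma :: Parser String
acomma = lexeme comma

-- An unquoted ("bald") string: must not start with '?', ' ' or ',',
-- and runs until the next space or comma.
baldString :: Parser AST
baldString = lexeme $ do
    x  <- noneOf "? ,"
    xs <- many (noneOf " ,")
    return (String (x:xs))

-- A bald string together with the comma that must follow it.
-- (baldStringComma is a name assumed for this sketch.)
baldStringComma :: Parser AST
baldStringComma = do
    s <- baldString
    _ <- acomma
    return s
```

Under this sketch 100.1 is also picked up as a bald string, so it would come out as String "100.1" rather than as a Number.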
Now let’s make a parser for one or more queries at the end of the tuple:
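A minimal sketch of such a parser, assuming it is added to the question's Parser.hs (where theQuery is defined and commaSep1 and lexeme come from Lexer). The name queries is an assumption made here.

```haskell
-- Sketch: addition to the question's Parser.hs.
-- One or more queries such as ?BOOL, separated by commas.
-- (queries is a name assumed for this sketch.)
queries :: Parser [AST]
queries = commaSep1 (lexeme theQuery)
```

Using commaSep1 rather than commaSep encodes the requirement that at least one query is present.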
And now we can take the identifier, the baldStrings and the queries
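One way this step might look, as a sketch added to the question's Parser.hs. It relies on the parsers described in the preceding paragraphs, here assumed to be named acomma, baldStringComma and queries; therest is the name the text uses later on.

```haskell
-- Sketch: addition to the question's Parser.hs.
-- acomma, baldStringComma and queries are the helper parsers described
-- in the preceding paragraphs (their names are assumptions).
therest :: Parser (AST, [AST], [AST])
therest = do
    name <- ident                  -- first element: always an Identifier
    _    <- acomma
    args <- many baldStringComma   -- zero or more bald strings, each with its comma
    qs   <- queries                -- one or more queries at the end
    return (name, args, qs)
```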
finally giving
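A sketch of the top-level parser, again as an addition to the question's Parser.hs. The name tuple is an assumption; resvd_cmd and parens come from the question's code, and therest is the parser for the contents of the parentheses described above.

```haskell
-- Sketch: addition to the question's Parser.hs.
-- (tuple is a name assumed for this sketch.)
tuple :: Parser (String, (AST, [AST], [AST]))
tuple = do
    name  <- resvd_cmd        -- rd, eval, read, in or out
    stuff <- parens therest   -- identifier, bald strings and queries
    return (name, stuff)
```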
So you get
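Assuming parsers along the lines described above, with the top-level one named tuple (an assumed name) and returning the command name paired with the triple from therest, the input from the question should come out like this:

```
*Parser> parse tuple "" "rd (isFib, test2, 100.1, ?BOOL)"
Right ("rd",(Identifier "isFib",[String "test2",String "100.1"],[Query "?BOOL"]))
```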
But if you want to lump the strings with the queries, you can return (name,args++qs) at the end of therest.
Applicative is Less Ugly
I found it frustrating to be tied to the Monad interface, when there are lovely things like <$>, <*> etc, so first
Then
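The two steps here are presumably an import of the applicative operators and an applicative rewrite of baldString. A sketch under that assumption, to be merged into the question's Parser.hs (the import belongs with the other imports at the top of the module):

```haskell
-- Sketch: additions to the question's Parser.hs.
-- Import only the operators, so nothing clashes with Parsec's own
-- many, (<|>) and optional.
import Control.Applicative ((<$>), (<*>), (<*), (*>))

-- Applicative version of the bald-string parser; it replaces a
-- do-notation definition of the same name.
baldString :: Parser AST
baldString = lexeme (String <$> ((:) <$> noneOf "? ," <*> many (noneOf " ,")))
```

On recent versions of GHC these operators are already exported by the Prelude, so the import may be redundant there.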
Here <$> is an infix version of fmap, so (:) will be applied to the output of noneOf "? ,", giving a parser that returns something like ('c':). This can then be applied to the output of many (noneOf " ,") using <*> to give the string we want.
This one’s nice because we got the <*> operator to ignore the output of acomma and just return the output of baldString, using <*. If we wanted it the other way round, we could do *>, but you may as well use >> for that, which already ignores the output of the first parser.
and
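In applicative style, the remaining parsers might look like the sketch below. It assumes the question's Parser.hs plus helper parsers acomma, baldString and queries as described earlier; the names baldStringComma and tuple are assumptions.

```haskell
-- Sketch: additions to the question's Parser.hs.

-- Keep the result of baldString, discard the result of acomma.
baldStringComma :: Parser AST
baldStringComma = baldString <* acomma

-- Identifier, then a comma (discarded), then bald strings, then queries.
therest :: Parser (AST, [AST], [AST])
therest = (,,) <$> ident <* acomma <*> many baldStringComma <*> queries

-- Command name paired with the contents of the parentheses.
tuple :: Parser (String, (AST, [AST], [AST]))
tuple = (,) <$> resvd_cmd <*> parens therest
```

All of <$>, <*> and <* are left-associative with the same precedence, so therest reads left to right: each <* runs its right-hand parser but drops its result.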
But wouldn’t it be nicer if we did
so we could do
which gives (with a little manual pretty-printing to get it into the width)
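A sketch of what this might look like: a record type for the parsed tuple and a parser that fills it. The type Tuple follows the suggestion in the text, but its field names and the parser name niceTuple are hypothetical. The sketch builds on the question's Parser.hs and a therest parser returning the (identifier, strings, queries) triple.

```haskell
-- Sketch: additions to the question's Parser.hs.
-- Field names and niceTuple are hypothetical, chosen for this sketch.
data Tuple = Tuple
    { tupleCmd     :: String   -- reserved command, e.g. "rd"
    , tupleWorker  :: AST      -- the leading Identifier
    , tupleArgs    :: [AST]    -- the bald strings
    , tupleQueries :: [AST]    -- the trailing queries
    } deriving (Show, Eq)

-- Same grammar as before, but building a Tuple instead of nested pairs.
niceTuple :: Parser Tuple
niceTuple = do
    c           <- resvd_cmd
    (w, as, qs) <- parens therest
    return (Tuple c w as qs)
```

With these definitions, the input from the question should give (line breaks added by hand):

```
*Parser> parse niceTuple "" "rd (isFib, test2, 100.1, ?BOOL)"
Right (Tuple {tupleCmd = "rd", tupleWorker = Identifier "isFib",
              tupleArgs = [String "test2",String "100.1"],
              tupleQueries = [Query "?BOOL"]})
```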
I also think your current AST is more of an abstract syntax store than an abstract syntax tree, and that you might get more mileage from designing your own Tuple type and using that. Use
and suchlike to ensure type safety, then roll them together into your Tuple type with a parser to generate them.
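The kind of definitions this has in mind might be newtype wrappers around String, one per role, so that a command cannot be passed where a worker name is expected. Every name below is hypothetical and chosen only for illustration.

```haskell
-- Hypothetical wrapper types; all names are illustrative only.
newtype Command    = Command String    deriving (Show, Eq)
newtype WorkerName = WorkerName String deriving (Show, Eq)
newtype Arg        = Arg String        deriving (Show, Eq)
newtype QueryVar   = QueryVar String   deriving (Show, Eq)

-- A tuple type rolled together from the wrappers above.
data TypedTuple = TypedTuple Command WorkerName [Arg] [QueryVar]
    deriving (Show, Eq)

-- Example value corresponding to: rd (isFib, test2, ?BOOL)
example :: TypedTuple
example = TypedTuple (Command "rd") (WorkerName "isFib") [Arg "test2"] [QueryVar "?BOOL"]
```

A parser for TypedTuple would then wrap each parsed piece in its own constructor instead of in the shared AST type.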