I took the example below partially from SO and changed it to my needs.

Question

0

Asked: June 14, 20262026-06-14T22:40:32+00:00 2026-06-14T22:40:32+00:00

I took the example below partially from SO and changed it to my needs.

0

I took the example below partially from SO and changed it to my needs. It almost fits, but what I want to do is that always the first string in the commaSep expr is parsed as identifier whilst all subsequent strings should be strings only.

Currently they are all parsed as Identifiers.

*Parser> parse expr "" "rd (isFib, test2, 100.1, ?BOOL)"
Right (FuncCall "rd" [Identifier "isFib",Identifier "test2",Number 100.1,Query "?BOOL"])

I have tried a number of solutions that in the end all would break down to parsing the whole input without using commaSep. Means I would have to ignore the structure and do something like

expr_parse = do
    name <- resvd_cmd
    char '('
    skipMany space
    worker <- ident
    char ','
    skipMany1 space
    args <- commaSep expr --not fully worked this out yet
    query <- theQuery
    skipMany space
    char ')'
    return (name, worker, args, query)

that looks less optimal and very clunky to me. Is there any way to refactor expr in the code below, achive what I need and keep it simple?

module Parser where

import Control.Monad (liftM)
import Text.Parsec
import Text.Parsec.String (Parser)
import Lexer
import AST

expr = ident <|>  astring <|> number <|> theQuery <|> callOrIdent

astring = liftM String stringLiteral <?> "String"

number = liftM Number float <?> "Number"

ident = liftM Identifier identifier <?> "WorkerName"

questionm :: Parser Char
questionm = oneOf "?"

theQuery :: Parser AST
theQuery = do first <- questionm
              rest <- many1 letter
              let query = first:rest
              return ( Query query )

resvd_cmd = do { reserved "rd"; return ("rd") }
            <|> do { reserved "eval"; return ("eval") }
            <|> do { reserved "read"; return ("read") }
            <|> do { reserved "in"; return ("in") }
            <|> do { reserved "out"; return ("out") }
            <?> "LINDA-like Tuple"

callOrIdent = do
    name <- resvd_cmd
    liftM (FuncCall name)(parens $ commaSep expr) <|> return (Identifier name)

AST.hs

{-# LANGUAGE DeriveDataTypeable #-}

module AST where

import Data.Typeable

data AST
    = Number Double
    | Identifier String
    | String String
    | FuncCall String [AST]
    | Query String
    deriving (Show, Eq, Typeable)

Lexer.hs

module Lexer (
            identifier, reserved, operator, reservedOp, charLiteral, stringLiteral,
            natural, integer, float, naturalOrFloat, decimal, hexadecimal, octal,
            symbol, lexeme, whiteSpace, parens, braces, angles, brackets, semi,
            comma, colon, dot, semiSep, semiSep1, commaSep, commaSep1
    )where

import Text.Parsec
import qualified Text.Parsec.Token as P
import Text.Parsec.Language (haskellStyle)

lexer = P.makeTokenParser ( haskellStyle
                            {P.reservedNames = ["rd", "in", "out", "eval", "take"]}
                         )


identifier = P.identifier lexer
reserved = P.reserved lexer
operator = P.operator lexer
reservedOp = P.reservedOp lexer
charLiteral = P.charLiteral lexer
stringLiteral = P.stringLiteral lexer
natural = P.natural lexer
integer = P.integer lexer
float = P.float lexer
naturalOrFloat = P.naturalOrFloat lexer
decimal = P.decimal lexer
hexadecimal = P.hexadecimal lexer
octal = P.octal lexer
symbol = P.symbol lexer
lexeme = P.lexeme lexer
whiteSpace = P.whiteSpace lexer
parens = P.parens lexer
braces = P.braces lexer
angles = P.angles lexer
brackets = P.brackets lexer
semi = P.semi lexer
comma = P.comma lexer
colon = P.colon lexer
dot = P.dot lexer
semiSep = P.semiSep lexer
semiSep1 = P.semiSep1 lexer
commaSep = P.commaSep lexer
commaSep1 = P.commaSep1 lexer

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-06-14T22:40:33+00:00

First, I’d like to introduce you to the function lexeme which alters a parser to eat trailing whitespace. You’re encouraged to use it rather than explicitly eating the whitespace. The difficulty is with commaSep because it eats the , and then fails. It would be nice to write a less optimistic commaSep, but let’s solve your problem directly.

Let’s apply lexeme to comma

acomma = lexeme comma

One of the problems with your code was you were expecting it to see test2 as String "test2" but the astring parser expects its strings to begin and end with ". Let’s make a parser for bald strings, but make sure they don’t start with ? and don’t contain spaces or commas:

baldString = lexeme $ do
   x <- noneOf "? ,)"
   xs <- many (noneOf " ,)")   -- problematic - see comment below
   return . String $ x:xs

The breakthrough came when I realised that because there has to be a query at the end, there was always a comma after a baldString:

baldStringComma = do 
        s <- baldString
        acomma
        return s

Now let’s make a parser for one or more queries at the end of the tuple:

queries = commaSep1 (lexeme theQuery)

And now we can take the identifier, the baldStrings and the queries

therest = do
   name <- lexeme ident 
   acomma
   args <- many baldStringComma
   qs <- queries
   return (name,args,qs)

finally giving

tuple = do
    name <- lexeme resvd_cmd
    stuff <- parens therest
    return (name,stuff)

So you get

*Parser> parseTest tuple "rd (isFib, test2, 100.1, ?BOOL)"
("rd",(Identifier "isFib",[String "test2",String "100.1"],[Query "?BOOL"]))

But if you want to lump the strings with the queries, you can return (name,args++qs) at the end of therest.

Applicative is Less Ugly

I found it frustrating to be tied to the Monad interface, when there are lovely things like <$>, <*> etc, so first

import Control.Applicative hiding (many, (<|>))

Then

baldString = lexeme . fmap String $
   (:) <$> noneOf "? ,)"   
       <*> many (noneOf " ,)")   -- problematic - see comment below

Here <$> is an infix version of fmap, so (:) will be applied to the output of noneOf "? ,", giving a parser that returns something like ('c':). This can then be applied to the output of many (noneOf " ,") using <*> to give the string we want.

baldStringComma = baldString <* acomma

This one’s nice because we got the <*> operator to ignore the output of acomma and just return the output of baldString, using <*. If we wanted it the other way round, we could do *>, but you may as well use >> for that, which already ignores the output of the first parser.

therest = (,,) <$> 
   lexeme ident <* acomma
   <*> many baldStringComma
   <*> queries

and

tuple = (,) <$> lexeme resvd_cmd 
            <*> parens therest

But wouldn’t it be nicer if we did

data Tuple = Tuple {cmd :: String, 
                    id :: AST,
                    argumentList :: [AST],
                    queryList :: [AST]} deriving Show

so we could do

niceTuple = Tuple <$> lexeme resvd_cmd <* lexeme (char '(')
                  <*> lexeme ident <* acomma
                  <*> many baldStringComma
                  <*> queries <* lexeme (char ')')

which gives (with a little manual pretty-printing to get it into the width)

*Parser> parseTest niceTuple "rd (isFib, test2, 100.1, ?BOOL)"
Tuple {cmd = "rd", 
       id = Identifier "isFib", 
       argumentList = [String "test2",String "100.1"], 
       queryList = [Query "?BOOL"]}

I also think your current AST is more of an abstract syntax store than an abstract syntax tree, and that you might get more milage from designing your own Tuple type and use that. Use

newtype Command = Cmd String  deriving Show

and suchlike to ensure type safety, then roll them together into your Tuple type with a parser to generate them.

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

I took the example below partially from SO and changed it to my needs.

Leave an answerCancel reply

1 Answer

Applicative is Less Ugly

Leave an answer
Cancel reply