I’m currently making a scanner for a basic compiler I’m writing in Haskell. One of the requirements is that any character enclosed in single quotes (') is translated into a character literal token (type T_Char), and this includes escape sequences such as '\n' and '\t'. I’ve defined this part of the scanner function, which works okay for most cases:
scanner ('\'':cs)
    | length cs == 0           = error "Illegal character!"
    | head cs == '\\'          = mkEscape (head (drop 1 cs)) : scanner (drop 3 cs)
    | head (drop 1 cs) == '\'' = T_Char (head cs) : scanner (drop 2 cs)
  where
    mkEscape :: Char -> Token
    mkEscape 'n'  = T_Char '\n'
    mkEscape 'r'  = T_Char '\r'
    mkEscape 't'  = T_Char '\t'
    mkEscape '\\' = T_Char '\\'
    mkEscape '\'' = T_Char '\''
However, this comes up when I run it in GHCi:
Main> scanner "abc '\\' def"
[T_Id "abc", T_Char '\'', T_Id "def"]
It can recognise everything else but gets escaped backslashes confused with escaped single quotes. Is this something to do with character encodings?
I don’t think there’s anything wrong with the scanner regarding your problem. To Haskell, the string will be read as
abc '\' def
because Haskell also has string escapes. So when it reaches the first quotation mark,
cs contains the char sequence \' def. Obviously head cs is a backslash, so it will run mkEscape. The argument given is head (drop 1 cs), which is ', thus mkEscape will return T_Char '\'', which is what you saw.
Perhaps you should call
scanner "abc '\\\\' def"
The 1st level of \ is for the Haskell interpreter, and the 2nd level is for scanner.
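To make the two levels of escaping concrete, here is a rough sketch of a character-literal scanner written with pattern matching instead of head/drop. It is not the original scanner: the cut-down Token type and the names scanCharLit and unescape are made up for illustration, and it assumes a literal is exactly one character or one backslash escape between single quotes. Unlike the guard version, it rejects the input '\' (quote, backslash, quote) rather than reading the closing quote as the escaped character.

```haskell
-- Hypothetical cut-down token type; the real Token has more
-- constructors (for example T_Id).
data Token = T_Char Char
  deriving (Eq, Show)

-- Scan one character literal from the front of the input.
-- Returns the token and the remaining input, or Nothing if the
-- input does not start with a well-formed literal.
scanCharLit :: String -> Maybe (Token, String)
scanCharLit ('\'' : '\\' : e : '\'' : rest) = do
  c <- unescape e                       -- escape sequence such as \n or \\
  Just (T_Char c, rest)
scanCharLit ('\'' : c : '\'' : rest)
  | c /= '\\' && c /= '\'' = Just (T_Char c, rest)  -- plain character
scanCharLit _ = Nothing                 -- anything else is malformed

-- Map the character after the backslash to the character it denotes.
unescape :: Char -> Maybe Char
unescape 'n'  = Just '\n'
unescape 'r'  = Just '\r'
unescape 't'  = Just '\t'
unescape '\\' = Just '\\'
unescape '\'' = Just '\''
unescape _    = Nothing

-- What the scanner actually receives once Haskell has processed
-- the string literal's own escapes:
--   "'\\' def"   is the 8 characters   '\' def    (one backslash)
--   "'\\\\' def" is the 9 characters   '\\' def   (two backslashes)
oneLevel, twoLevels :: String
oneLevel  = "'\\' def"
twoLevels = "'\\\\' def"
```

In GHCi, scanCharLit twoLevels should give Just (T_Char '\\'," def"), while scanCharLit oneLevel gives Nothing, because after Haskell's own unescaping the input holds only a single backslash followed directly by the closing quote.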