I am using the identifier parser from FParsec to parse the names of variables and functions, which are normally a mixture of Unicode and ASCII characters. But sometimes there are escaped Unicode characters at the beginning (like \u03C0) or within the identifier (like swipe_board\u003A_b). I can still make them parseable using the isAsciiIdStart and isAsciiIdContinue options, but I can’t define my own custom function for pre-processing before normalization. What could be a solution here?
The identifier parser internally first parses a string and then passes it to an IdentifierValidator instance for validation. Since the C# IdentifierValidator class is publicly accessible (though not documented), you could easily adapt the identifier parser to your needs (by making the initial string parsing step also recognize the escapes).
The identifier parsing is a bit complicated due to support for UTF-16 surrogate pairs, normalization and the Unicode XID character category, which is not natively supported on .NET.
Maybe you only need to support ASCII or UCS-2 identifiers specified in terms of the character categories supported by CharUnicodeInfo.GetUnicodeCategory, in which case you could probably implement the parsing and validation in just one step using many1Satisfy2 or many1Chars2.
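
A minimal sketch of that one-step approach, using many1Chars2, might look like the following. The names escapedChar, isIdStart, isIdContinue and escapedIdentifier are hypothetical, and the sketch makes several assumptions: escapes always have the form \u followed by exactly four hex digits, any decoded escape is accepted as an identifier character without further validation (so \u003A, a colon, is allowed), and identifiers are limited to UCS-2 with no normalization applied. The choice of Unicode categories is illustrative rather than a faithful copy of the XID rules.

```fsharp
open System
open System.Globalization
open FParsec

// Parses an escape of the form \uXXXX (exactly four hex digits) and
// returns the decoded UTF-16 code unit. Assumption: no other escape forms.
let escapedChar : Parser<char, unit> =
    pstring "\\u" >>. manyMinMaxSatisfy 4 4 isHex
    |>> fun digits -> char (Convert.ToInt32(digits, 16))

// Assumption: an identifier may start with '_' or any letter category.
let isIdStart (c: char) =
    c = '_' ||
    match CharUnicodeInfo.GetUnicodeCategory c with
    | UnicodeCategory.UppercaseLetter
    | UnicodeCategory.LowercaseLetter
    | UnicodeCategory.TitlecaseLetter
    | UnicodeCategory.ModifierLetter
    | UnicodeCategory.OtherLetter
    | UnicodeCategory.LetterNumber -> true
    | _ -> false

// Assumption: subsequent characters may also be digits, combining marks
// or connector punctuation.
let isIdContinue (c: char) =
    isIdStart c ||
    match CharUnicodeInfo.GetUnicodeCategory c with
    | UnicodeCategory.DecimalDigitNumber
    | UnicodeCategory.NonSpacingMark
    | UnicodeCategory.SpacingCombiningMark
    | UnicodeCategory.ConnectorPunctuation -> true
    | _ -> false

// Parsing and validation in one step: each position accepts either a
// literal identifier character or an escape, which is decoded in place.
let escapedIdentifier : Parser<string, unit> =
    many1Chars2 (satisfy isIdStart <|> escapedChar)
                (satisfy isIdContinue <|> escapedChar)
```

With these definitions, run escapedIdentifier "\\u03C0" should yield the string "π", and the input swipe_board\u003A_b should yield "swipe_board:_b". If you need normalization or characters outside the Basic Multilingual Plane, this shortcut is not enough and adapting the identifier parser around IdentifierValidator is the safer route.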