I just started using ParseKit to explore language creation and perhaps build a small toy DSL. However, the current SVN trunk from Google is throwing a -[PKToken intValue]: unrecognized selector sent to instance ... when parsing this grammar:
@start = identifier ;
identifier = (Letter | '_') | (letterOrDigit | '_') ;
letterOrDigit = Letter | Digit ;
Against this input:
foo
Clearly, I am missing something or have incorrectly configured my project. What can I do to fix this issue?
Developer of ParseKit here.
First, see the ParseKit Tokenization docs.
Basically, ParseKit can work in one of two modes. Let’s call them Tokens Mode and Chars Mode. (There are no formal names for these two modes, but perhaps there should be.)
Tokens Mode is more popular by far. Virtually every example you will find of using ParseKit shows how to use Tokens Mode. I believe all of the documentation on http://parsekit.com uses Tokens Mode. ParseKit’s grammar feature (which you are using in your example) only works in Tokens Mode.
Chars Mode is a very little-known feature of ParseKit. I’ve never had anyone ask about it before.
So the differences in the modes are:
In Tokens Mode, the ParseKit Tokenizer emits multi-char tokens (like Words, Symbols, Numbers, QuotedStrings, etc.) which are then parsed by the ParseKit parsers you create (programmatically or via grammars).
In Chars Mode, the ParseKit Tokenizer always emits single-char tokens which are then parsed by the ParseKit parsers you create programmatically. (Grammars don’t currently work with this mode, as this mode is not popular.)
You could use Chars Mode to implement Regular Expressions, which parse on a char-by-char basis.
For your example, you should ignore Chars Mode and just use Tokens Mode. The following Built-in Productions are for Chars Mode only. Do not use them in your grammars:
Notice how all of those Productions sound like they match individual chars. That’s because they do.
Your example above should probably look like:
Keep in mind that the Productions in your grammars (parsers like identifier) will be working on Tokens already emitted from ParseKit’s Tokenizer, not on individual chars.
IOW: by the time your grammar goes to work parsing input, the input has already been tokenized into Tokens of type Word, Number, Symbol, QuotedString, etc.
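To make the difference between the two modes concrete, here is a rough conceptual sketch in Python. It is not ParseKit code and does not reflect how ParseKit's Tokenizer is implemented: the function names tokens_mode and chars_mode, the "Char" type label, and the simplified token pattern are all assumptions made up for illustration. It only shows why a grammar written against single chars does not fit input that has already been split into multi-char tokens.

```python
import re

# Hypothetical, simplified token pattern (an assumption, not ParseKit's rules):
# quoted strings, then numbers, then words, then any single non-space symbol.
TOKEN_RE = re.compile(r'''
    (?P<QuotedString>"[^"]*"|'[^']*')
  | (?P<Number>\d+(?:\.\d+)?)
  | (?P<Word>[A-Za-z_][A-Za-z0-9_]*)
  | (?P<Symbol>\S)
''', re.VERBOSE)


def tokens_mode(text):
    """Emit (type, text) pairs for multi-char tokens, skipping whitespace."""
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(text)]


def chars_mode(text):
    """Emit one single-char token per input character."""
    return [("Char", ch) for ch in text]


if __name__ == "__main__":
    # In the multi-char style, the input "foo" arrives as one Word token.
    print(tokens_mode("foo"))   # [('Word', 'foo')]
    # In the single-char style, the same input arrives as three tokens.
    print(chars_mode("foo"))    # [('Char', 'f'), ('Char', 'o'), ('Char', 'o')]
```

In this sketch, a rule that expects to see the letters f, o, o one at a time has nothing to match in the first output, because the whole identifier is already a single Word.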
Here are all of the Built-in Productions available for use in your Grammar:
Also:
There are also operators for composite parsers: