I know ‘+’, ‘?’ and ‘*’. But what if I want something to repeat exactly, say, 5 times? For example, what if an identifier must be a string of hexadecimal digits of length 5?
To be more specific, I’m thinking about defining a general lexer rule of unlimited length and then, at parse time, counting how many times it repeated; if the count equals 5, I would retype it as another kind of token. How can I do this? Or is there an easier way?
Yes, you can do that with a disambiguating semantic predicate (explanation):
which will parse the input 12345 12345678 as follows:
But you can also change the type of the token in the lexer based on some property of the matched text, like this:
which will cause the same input to be parsed like this:
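As a rough, tool-independent sketch of the two ideas above, here is how they might look in plain Python rather than in grammar syntax. All names (`Token`, `HEX`, `SHORT_ID`, `lex_generic`, `is_short_id`, `lex_retyped`) are hypothetical and chosen only for illustration. The first approach lexes every hex run as one generic token and checks the length with a predicate at parse time; the second picks the token type inside the lexer from the length of the matched text.

```python
import re
from typing import List, NamedTuple


class Token(NamedTuple):
    type: str
    text: str


# Hypothetical generic rule: a run of hexadecimal digits of any length.
# finditer() silently skips whitespace (and anything else that does not match).
HEX_RUN = re.compile(r"[0-9A-Fa-f]+")


def lex_generic(source: str) -> List[Token]:
    """Approach 1, lexer side: emit every hex run as a generic HEX token."""
    return [Token("HEX", m.group()) for m in HEX_RUN.finditer(source)]


def is_short_id(token: Token) -> bool:
    """Approach 1, parser side: a predicate that only accepts length-5 tokens."""
    return token.type == "HEX" and len(token.text) == 5


def lex_retyped(source: str) -> List[Token]:
    """Approach 2: choose the token type in the lexer from the matched length."""
    return [
        Token("SHORT_ID" if len(m.group()) == 5 else "HEX", m.group())
        for m in HEX_RUN.finditer(source)
    ]


if __name__ == "__main__":
    for tok in lex_generic("12345 12345678"):
        print(tok, "short id" if is_short_id(tok) else "not a short id")
    print(lex_retyped("12345 12345678"))
```

With the predicate, the token stream stays generic and the length check lives in the parser; with retyping, the parser only ever sees already-classified tokens, which usually keeps the parser rules simpler.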