Example String
023abc7defghij
Header
Characters 0 and 1 = number of following chunks
Chunks
First character = length of the following string
Following characters = string with the specified length
Example result
So in the example above, this would mean:
02 -> 2 following chunks
3 -> a 3-character string will follow
abc -> the three-character string
7 -> a 7-character string will follow
defghij -> the seven-character string
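The layout above is simple enough to decode with a few plain string operations. The following is a minimal Java sketch, assuming a two-digit decimal header and single-digit chunk lengths exactly as described; ChunkDecoder and Chunk are hypothetical names, and the record syntax needs Java 16 or later. It builds one object per chunk holding the length and the string, and rejects input whose header count, lengths, or trailing characters don't match.

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkDecoder {

    // One decoded chunk: the declared length and the string that followed it.
    public record Chunk(int length, String value) {}

    // Reads the ASCII digit at index pos, or fails with a descriptive message.
    private static int digitAt(String input, int pos) {
        char c = input.charAt(pos);
        if (c < '0' || c > '9') {
            throw new IllegalArgumentException(
                "expected a digit at index " + pos + " but found '" + c + "'");
        }
        return c - '0';
    }

    // Decodes a 2-digit chunk count followed by that many chunks, each being
    // one length digit and then exactly that many characters.
    public static List<Chunk> decode(String input) {
        if (input.length() < 2) {
            throw new IllegalArgumentException("missing 2-character header");
        }
        int count = digitAt(input, 0) * 10 + digitAt(input, 1);
        List<Chunk> chunks = new ArrayList<>();
        int pos = 2;
        for (int i = 0; i < count; i++) {
            if (pos >= input.length()) {
                throw new IllegalArgumentException("input ended before chunk " + (i + 1));
            }
            int length = digitAt(input, pos);
            pos++; // skip the length digit
            if (pos + length > input.length()) {
                throw new IllegalArgumentException("chunk " + (i + 1) + " is truncated");
            }
            chunks.add(new Chunk(length, input.substring(pos, pos + length)));
            pos += length;
        }
        if (pos != input.length()) {
            throw new IllegalArgumentException("unexpected trailing input at index " + pos);
        }
        return chunks;
    }

    public static void main(String[] args) {
        for (Chunk chunk : decode("023abc7defghij")) {
            System.out.println(chunk.length() + " -> " + chunk.value());
        }
    }
}
```

Running main prints "3 -> abc" followed by "7 -> defghij".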
Question
Can I write a grammar that describes this form of string?
I would need to interpret the ‘length’ information and then build tokens with the specified length, to fill my objects with the length information and the strings.
I hope I have described this comprehensibly. I could not find any information describing or solving my problem.
I’m assuming your actual problem is a bit more complicated, because if "023abc7defghij" is your actual input, I wouldn’t use a parser generator like ANTLR but would just stick with some simple string operations. That said, here’s a possible solution:
Since your chunks are not known up front, you cannot create any tokens other than a single Digit token and an Other token that matches any char other than a digit. Note that you don’t really need the header information: you simply parse "3" and then get the next 3 chars, then parse the "7" and get the next 7 chars, … all the way up to the end of the file.
A grammar for such a language could look like this:
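As a rough plain-Java illustration of the two-token idea (this is not the ANTLR grammar itself; DigitOtherLexer, Type, and Token are hypothetical names), the lexer below knows only a single-digit token and a catch-all token for any other character. Note how the digits inside the chunk strings still come out as DIGIT tokens, which is why the content of a chunk has to accept both token types.

```java
import java.util.ArrayList;
import java.util.List;

public class DigitOtherLexer {

    // The only two token types: a single digit, or any other single character.
    public enum Type { DIGIT, OTHER }

    // One token covers exactly one input character.
    public record Token(Type type, char text) {}

    // Produces one token per character, classifying ASCII digits as DIGIT
    // and everything else as OTHER.
    public static List<Token> tokenize(String input) {
        List<Token> tokens = new ArrayList<>();
        for (char c : input.toCharArray()) {
            Type type = (c >= '0' && c <= '9') ? Type.DIGIT : Type.OTHER;
            tokens.add(new Token(type, c));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Header "02", chunk "2" + "a1", chunk "7" + "defgh1j".
        // The '1' characters inside the strings are lexed as DIGIT as well.
        for (Token token : tokenize("022a17defgh1j")) {
            System.out.println(token.type() + " " + token.text());
        }
    }
}
```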
But now the chunk rule is ambiguous: it does not know when to stop consuming characters. This can be fixed using a gated semantic predicate that causes the * from any* to stop consuming once a certain condition has been met (in this case, when a counter int n has been counted down).
The grammar above, including this predicate and some println statements, would look like this:
which can be tested with the class:
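The counting idea behind the predicate can also be sketched without ANTLR. In this hypothetical plain-Java version (CountdownChunks is a made-up name, and it does not reproduce any ANTLR-generated code), each length digit initializes a counter n, and characters, digits included, are consumed only while n > 0; that is the same stopping condition the predicate adds to the otherwise open-ended any* loop. Like the approach described above, it ignores the header count and simply reads chunks until the end of the input.

```java
import java.util.ArrayList;
import java.util.List;

public class CountdownChunks {

    // Skips the 2-character header and then reads chunks until the end of the
    // input. Each chunk's length digit initializes a counter n; characters
    // (digits included) are consumed only while n > 0.
    public static List<String> parse(String input) {
        if (input.length() < 2) {
            throw new IllegalArgumentException("missing 2-character header");
        }
        System.out.println("header = " + input.substring(0, 2));
        List<String> chunks = new ArrayList<>();
        int pos = 2;
        while (pos < input.length()) {
            char lengthDigit = input.charAt(pos);
            if (lengthDigit < '0' || lengthDigit > '9') {
                throw new IllegalArgumentException("expected a length digit at index " + pos);
            }
            pos++;
            int n = lengthDigit - '0';
            StringBuilder text = new StringBuilder();
            while (n > 0) { // stopping condition: stop once n reaches 0
                if (pos >= input.length()) {
                    throw new IllegalArgumentException("input ended inside a chunk");
                }
                text.append(input.charAt(pos)); // consume any character
                pos++;
                n--; // count down
            }
            System.out.println("chunk = " + text);
            chunks.add(text.toString());
        }
        return chunks;
    }

    public static void main(String[] args) {
        parse("022a17defgh1j");
    }
}
```

Running main prints "header = 02", then "chunk = a1" and "chunk = defgh1j": the '1' inside each string is counted like any other character instead of being mistaken for a new length.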
If you now generate a lexer and parser, compile all .java files and run the Main class, you would see the following printed to your console: