I’m interested in adding Go-style automatic semicolon insertion to my flex lexer.
From the Go documentation:
Semicolons

Like C, Go’s formal grammar uses semicolons to terminate statements; unlike C, those semicolons do not appear in the source. Instead the lexer uses a simple rule to insert semicolons automatically as it scans, so the input text is mostly free of them.

The rule is this. If the last token before a newline is an identifier (which includes words like int and float64), a basic literal such as a number or string constant, or one of the tokens

    break continue fallthrough return ++ -- ) }

the lexer always inserts a semicolon after the token. This could be summarized as, “if the newline comes after a token that could end a statement, insert a semicolon”.

A semicolon can also be omitted immediately before a closing brace, so a statement such as

    go func() { for { dst <- <-src } }()

needs no semicolons. Idiomatic Go programs have semicolons only in places such as for loop clauses, to separate the initializer, condition, and continuation elements. They are also necessary to separate multiple statements on a line, should you write code that way.

One caveat. You should never put the opening brace of a control structure (if, for, switch, or select) on the next line. If you do, a semicolon will be inserted before the brace, which could cause unwanted effects. Write them like this

    if i < f() {
        g()
    }

not like this

    if i < f()  // wrong!
    {           // wrong!
        g()     // wrong!
    }           // wrong!
How would I go about doing this? Specifically, how can I insert tokens into the stream, and how can I see the last token that was matched to decide whether a semicolon belongs after it?
I am using bison as well, but Go seems to handle semicolon insertion entirely in its lexer.
You could pass the tokens the lexer returns through a function that inserts semicolons where necessary. When that function detects that a semicolon is needed, it returns the semicolon and puts the next token back onto the input stream, so that token is simply lexed again on the next call.
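To make the wrapper idea concrete without any generated code, here is a minimal standalone C sketch. Everything in it is an assumption for illustration: the token codes, the hand-written raw_lex() standing in for the flex-generated yylex(), and the names next_token() and lex_all(). Instead of calling flex's unput(), it holds the newline back in a one-slot buffer, which has the same effect of delivering that token on the next call.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical token codes; a real build would take them from the
 * header that bison generates. */
enum token { END = 0, WORD, NEWLINE, SEMICOLON, OTHER };

static const char *src;     /* remaining input */
static enum token held;     /* token held back after an insertion */
static int have_held;       /* nonzero while 'held' is valid */
static enum token last;     /* last token handed to the caller */

/* Stand-in for the flex-generated lexer: words, newlines, and
 * single "other" characters; blanks and tabs are skipped. */
static enum token raw_lex(void)
{
    while (*src == ' ' || *src == '\t')
        src++;
    if (*src == '\0')
        return END;
    if (*src == '\n') {
        src++;
        return NEWLINE;
    }
    if (isalpha((unsigned char)*src)) {
        while (isalnum((unsigned char)*src))
            src++;
        return WORD;
    }
    src++;
    return OTHER;
}

/* The parser would call this instead of raw_lex(). If a NEWLINE
 * directly follows a WORD, return SEMICOLON now and keep the
 * NEWLINE for the next call. */
static enum token next_token(void)
{
    enum token t;
    if (have_held) {
        have_held = 0;
        t = held;
    } else {
        t = raw_lex();
    }
    if (t == NEWLINE && last == WORD) {
        held = t;
        have_held = 1;
        t = SEMICOLON;
    }
    last = t;
    return t;
}

static const char *name(enum token t)
{
    static const char *const names[] =
        { "END", "WORD", "NEWLINE", "SEMICOLON", "OTHER" };
    return names[t];
}

/* Demo helper: lex 'input' and write the token names, separated by
 * spaces, into 'out'. */
static void lex_all(const char *input, char *out, size_t outsz)
{
    enum token t;
    assert(outsz > 0);
    src = input;
    have_held = 0;
    last = END;
    out[0] = '\0';
    while ((t = next_token()) != END) {
        if (out[0] != '\0')
            strncat(out, " ", outsz - strlen(out) - 1);
        strncat(out, name(t), outsz - strlen(out) - 1);
    }
}
```

Because last is set to SEMICOLON after an insertion, the held NEWLINE does not trigger a second semicolon when it comes out of the slot. Extending the condition from WORD to the full Go list (literals, break, return, ), } and so on) would only change the test on last.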
Below is an example that inserts a SEMICOLON before a newline, when it follows a WORD. The bison file “insert.y” is this:
and the lexer is generated by flex from this:
For input
it prints
Unputting a non-constant token requires a little extra work – I have tried to keep the example simple, just to give the idea.
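As a rough idea of what that extra work probably looks like: flex's unput() pushes back one character at a time, so a multi-character lexeme has to be pushed last character first, and from a copy, because unput() may overwrite yytext. The standalone C sketch below imitates this with a small pushback stack. The names unput_char(), unput_text(), next_char() and read_all() are hypothetical and only mimic the behaviour; they are not flex functions.

```c
#include <assert.h>
#include <string.h>

#define PUSHBACK_MAX 256

static char pushback[PUSHBACK_MAX]; /* characters waiting to be re-read */
static size_t pushed;               /* number of characters on the stack */
static const char *input;           /* rest of the real input */

/* Imitates unput(c): the last character pushed is the first read back. */
static void unput_char(char c)
{
    assert(pushed < PUSHBACK_MAX);
    pushback[pushed++] = c;
}

/* Push back a whole lexeme, last character first, so that it is
 * re-read in its original order. */
static void unput_text(const char *text)
{
    size_t i = strlen(text);
    while (i > 0)
        unput_char(text[--i]);
}

/* Read the next character, preferring pushed-back characters over
 * fresh input. Returns -1 at end of input. */
static int next_char(void)
{
    if (pushed > 0)
        return (unsigned char)pushback[--pushed];
    if (*input == '\0')
        return -1;
    return (unsigned char)*input++;
}

/* Demo helper: drain everything that can still be read into 'out'. */
static void read_all(char *out, size_t outsz)
{
    size_t n = 0;
    int c;
    assert(outsz > 0);
    while (n + 1 < outsz && (c = next_char()) != -1)
        out[n++] = (char)c;
    out[n] = '\0';
}
```

In a real flex action the equivalent would be to copy yytext and then call unput() on the copy's characters from the end backwards. An alternative that avoids re-lexing altogether is to hold back the token code together with its semantic value in the wrapper function.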