I’m writing a lexical specification for JFlex (it’s like flex, but for Java). I have a problem with TraditionalComment (/* */) and DocumentationComment (/** */). So far I have this, taken from the JFlex User’s Manual:
```
LineTerminator = \r|\n|\r\n
InputCharacter = [^\r\n]
WhiteSpace     = {LineTerminator} | [ \t\f]

/* comments */
Comment = {TraditionalComment} | {EndOfLineComment} | {DocumentationComment}

TraditionalComment   = "/*" [^*] ~"*/" | "/*" "*"+ "/"
EndOfLineComment     = "//" {InputCharacter}* {LineTerminator}
DocumentationComment = "/**" {CommentContent} "*"+ "/"
CommentContent       = ( [^*] | \*+ [^/*] )*

%%

{Comment}        { /* Ignore comments */ }
{LineTerminator} { return LexerToken.PASS; }
```
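To see why the line break disappears, the comment macros can be hand-translated into `java.util.regex` syntax. This translation is only an approximation assumed for illustration (`java.util.regex` tries alternatives in order, while JFlex picks the longest match), and `CommentMatchDemo` and `matchComment` are hypothetical names. On an input such as `/* Some\n * quite long comment. */\n`, the whole comment, including its embedded `\n`, is a single match, so an action that ignores `{Comment}` discards that line break along with the comment text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CommentMatchDemo {
    // DocumentationComment = "/**" {CommentContent} "*"+ "/"
    // CommentContent       = ( [^*] | \*+ [^/*] )*
    static final String DOC = "/\\*\\*(?:[^*]|\\*+[^/*])*\\*+/";

    // TraditionalComment = "/*" [^*] ~"*/" | "/*" "*"+ "/"
    // The upto operator ~"*/" is approximated by a lazy match up to the first "*/".
    static final String TRADITIONAL = "/\\*[^*][\\s\\S]*?\\*/|/\\*\\*+/";

    // Alternatives are tried in order here (JFlex would use longest match instead).
    static final Pattern COMMENT = Pattern.compile(DOC + "|" + TRADITIONAL);

    /** Returns the comment text at the very start of input, or null if there is none. */
    static String matchComment(String input) {
        Matcher m = COMMENT.matcher(input);
        return m.lookingAt() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "/* Some\n * quite long comment. */\n";
        String comment = matchComment(input);
        // The match spans the embedded '\n', so ignoring it drops that line break.
        System.out.println(comment.contains("\n"));
        // Only the trailing '\n' is left over for the {LineTerminator} rule.
        System.out.println(input.substring(comment.length()).equals("\n"));
    }
}
```

Both lines print `true`. The same loss affects `//` comments: because `EndOfLineComment` ends with `{LineTerminator}`, the newline after a `//` comment is swallowed too. Dropping `{LineTerminator}` from that macro would leave the newline to the `{LineTerminator}` rule instead.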
Here, LexerToken.PASS means that line terminators are later passed through to the output. What I want to do is ignore everything inside a comment except its line terminators.
For example, consider this input:
```
/* Some
 * quite long comment. */
```
As a raw character sequence this is /* Some\n * quite long comment. */\n. With the current lexer the comment is ignored as a whole, so the output is a single ‘\n’. I would like to get two lines, ‘\n\n’. In general, I want the output to always have the same number of lines as the input. How can I do that?
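Stated concretely, the requirement is that every ignored comment still contributes as many line terminators to the output as it contains. The plain-Java sketch below shows only that counting step; `CommentLines`, `countLineTerminators` and `keepOnlyLineBreaks` are hypothetical names, it assumes Java 11+ for `String.repeat`, and it counts `\r\n` as a single terminator, which is how the `LineTerminator` macro (`\r|\n|\r\n`, longest match) splits the input.

```java
public class CommentLines {
    /**
     * Counts line terminators the way LineTerminator = \r|\n|\r\n splits them:
     * "\r\n" counts once, a lone '\r' or '\n' counts once each.
     */
    static int countLineTerminators(String text) {
        int count = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\r') {
                count++;
                // Skip the '\n' of a "\r\n" pair so the pair is counted only once.
                if (i + 1 < text.length() && text.charAt(i + 1) == '\n') {
                    i++;
                }
            } else if (c == '\n') {
                count++;
            }
        }
        return count;
    }

    /** Replaces a comment with only its line breaks (normalized to '\n'). */
    static String keepOnlyLineBreaks(String comment) {
        return "\n".repeat(countLineTerminators(comment));
    }

    public static void main(String[] args) {
        String comment = "/* Some\n * quite long comment. */";
        // The trailing '\n' stands for the one matched by the {LineTerminator} rule.
        String output = keepOnlyLineBreaks(comment) + "\n";
        System.out.println(output.equals("\n\n"));
    }
}
```

This prints `true`. In a JFlex action the text to count would come from `yytext()`; the sketch normalizes every terminator to ‘\n’, so preserving `\r\n` exactly would require collecting the terminators rather than just counting them.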
After a couple of days I found a solution. I’m posting it here in case somebody else runs into the same problem.
The trick is this: once you have recognized a comment, go through its body once more, and whenever you find a line terminator, pass it on instead of ignoring it: