I came across the following code:
public class LinePrinter {
    public static void main(String args[]) {
        //Note: \u000A is unicode for Line Feed
        char c=0x000A;
        System.out.println(c);
    }
}
This doesn’t compile because of the Unicode escape replacement the compiler performs.
The question is, why doesn’t the comment (//) take precedence over the Unicode escape replacement done by the compiler? I thought the compiler would discard comments first, before doing anything else with the code.
EDIT:
Not sure if the above is clear enough.
I know what happens with the above and why it errors out. My expectation is that the compiler should ignore all the commented lines before doing any translation with the code. Obviously that’s not the case here. I am expecting a rationale for this behaviour.
The specification states that a Java compiler must convert Unicode escapes to their corresponding characters before doing anything else, to allow for things like non-ASCII characters in identifiers to be protected (via native2ascii) when the code is stored or sent over a channel that is not 8-bit clean.
This rule applies globally; in particular, you can even escape comment markers using Unicode escapes. For example, the following two snippets are identical:
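A minimal sketch of such a pair, under stated assumptions: the class EscapedComments and the helper handle are hypothetical names invented for illustration, and only the // marker of the second comment is written as Unicode escapes. Both methods should compile to the same thing, because the escapes are translated before the compiler looks for comments.

```java
class EscapedComments {
    // Hypothetical helper, present only so the calls below have a target.
    static String handle(String open, String close) {
        return open + " ... " + close;
    }

    static String plain() {
        // Deal with opening and closing comment characters /*, etc.
        return handle("/*", "*/");
    }

    static String escaped() {
        \u002f\u002f Deal with opening and closing comment characters /*, etc.
        return handle("/*", "*/");
    }

    public static void main(String[] args) {
        // Both methods return the same string.
        System.out.println(plain().equals(escaped()));
    }
}
```

The line beginning with the two escapes looks like stray text in an editor, but after translation it is an ordinary single-line comment.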
If the compiler tried to remove comments before handling Unicode escapes, it would end up stripping everything from the /*, etc. to the handle("/*", "*/, leaving a remainder which would then be unescaped to a single one-line comment and removed at the next stage of parsing. That would produce no compiler error or warning, but would silently drop a whole line of code…
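The wrong order can be simulated with a rough sketch. Everything here is an assumption made for illustration: the class TranslationOrder, the sample SOURCE text with its parser.handle call, a deliberately naive block-comment remover, and an unescape helper that knows only the escape for the slash character. It is not how javac is implemented; it only mimics the hypothetical "comments first" ordering.

```java
class TranslationOrder {
    // Sample source text: the line-comment marker is written as two escapes.
    static final String SOURCE =
        "\\u002f\\u002f Deal with opening and closing comment characters /*, etc.\n"
      + "parser.handle(\"/*\", \"*/\");\n";

    // Simplified stand-in for escape translation: handles only the slash escape.
    static String unescape(String src) {
        return src.replace("\\u002f", "/");
    }

    // Naive remover: deletes every span from a block-comment opener to the
    // next closer, without recognising the escaped line-comment marker.
    static String stripBlockComments(String src) {
        StringBuilder out = new StringBuilder();
        int pos = 0;
        while (true) {
            int open = src.indexOf("/*", pos);
            if (open < 0) break;
            int close = src.indexOf("*/", open + 2);
            if (close < 0) break;
            out.append(src, pos, open);
            pos = close + 2;
        }
        out.append(src.substring(pos));
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical order: comments removed first, escapes translated after.
        System.out.print(unescape(stripBlockComments(SOURCE)));
        // Order required by the specification: escapes translated first.
        System.out.print(unescape(SOURCE));
    }
}
```

In the first ordering the call to handle disappears into what ends up as a one-line comment; in the second, the first line is already a complete comment when comments are recognised, so the call on the next line survives.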