The Archive Base

Editorial Team
Asked: June 2, 2026 (2026-06-02T20:45:50+00:00)


Some compilers failed on non-ASCII characters in JavaDoc and source code comments. What are the current (Java 7) and future (Java 8 and beyond) practices with respect to Unicode in Java source files? Are there differences between IcedTea, OpenJDK, and other Java environments, and what is dictated by the language specification? Should all non-ASCII characters be escaped in JavaDoc with HTML &escape;-like codes? But what would be the Java // comment equivalent?

Update: comments indicate that one can use any character set, and that upon compiling one needs to indicate what char set is used in the source file. I will look into this, and will be looking for details on how to configure this via Ant, Eclipse, and Maven.

1 Answer

    Editorial Team, added an answer on June 2, 2026 at 8:45 pm (2026-06-02T20:45:52+00:00)

    Some compilers failed on non-ASCII characters in JavaDoc and source code comments.

    This is likely because the compiler assumes one encoding for its input (UTF-8, say, or whatever the platform default is) and the source file contains byte sequences that are invalid in that encoding. That these bytes appear to be inside comments in your source code editor is irrelevant, because the lexer (which distinguishes comments from other tokens) never gets to run: the failure occurs while the tool is converting bytes into chars, before lexing starts.
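To see why the comment context cannot matter, here is a minimal sketch of the bytes-to-chars step using the standard java.nio.charset decoder in strict mode. The names StrictDecode, decode and decodes are made up for illustration, and this is not javac's own decoding code. It only shows that a byte which is fine in ISO-8859-1 can be malformed as UTF-8, and that the decoder has no notion of comments.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class StrictDecode {
    private StrictDecode() {}

    /** Decodes bytes strictly: malformed or unmappable input raises an exception. */
    static String decode(byte[] bytes, Charset charset) throws CharacterCodingException {
        return charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }

    /** Returns true if the bytes are valid in the given charset. */
    static boolean decodes(byte[] bytes, Charset charset) {
        try {
            decode(bytes, charset);
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // A one-line "comment" saved as ISO-8859-1; its last byte is 0xE9.
        byte[] latin1 = "// caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println("UTF-8: " + decodes(latin1, StandardCharsets.UTF_8));
        System.out.println("ISO-8859-1: " + decodes(latin1, StandardCharsets.ISO_8859_1));
    }
}
```

The same bytes decode cleanly as ISO-8859-1 and are rejected as UTF-8, even though the text is nothing but a comment.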


    The man pages for javac and javadoc say

    -encoding name
              Specifies the source file encoding name, such as EUCJIS/SJIS.
              If this option is not specified, the platform default
              converter is used.
    

    so running javadoc with the encoding flag

    javadoc -encoding <encoding-name> ...
    

    after replacing <encoding-name> with the encoding you’ve used for your source files should cause it to use the right encoding.

    If you’ve got more than one encoding used within a group of source files that you need to compile together, you need to fix that first and settle on a single uniform encoding for all source files. You should really just use UTF-8 or stick to ASCII.
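If some files turn out to be in a legacy encoding, one way to normalize them is to decode each file with the encoding it was actually written in and re-encode it as UTF-8. The following is a minimal sketch under the assumption that you already know the original encoding, since it cannot be detected reliably from the bytes alone. SourceTranscoder and its methods are hypothetical names, not part of any tool mentioned above.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class SourceTranscoder {
    private SourceTranscoder() {}

    /**
     * Re-encodes bytes that were written in the charset "from" as UTF-8.
     * Note: new String(byte[], Charset) silently substitutes a replacement
     * character for malformed input, so pass the correct source charset.
     */
    static byte[] toUtf8(byte[] original, Charset from) {
        String text = new String(original, from);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    /** Rewrites one source file in place as UTF-8. */
    static void rewriteAsUtf8(Path file, Charset from) throws IOException {
        Files.write(file, toUtf8(Files.readAllBytes(file), from));
    }

    public static void main(String[] args) throws IOException {
        // Assumed usage: java SourceTranscoder ISO-8859-1 Foo.java Bar.java
        Charset from = Charset.forName(args[0]);
        for (int i = 1; i < args.length; i++) {
            rewriteAsUtf8(Paths.get(args[i]), from);
        }
    }
}
```

Once every file is UTF-8, a single -encoding UTF-8 covers the whole group.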


    What are the current (Java 7) and future (Java 8 and beyond) practices with respect to Unicode in Java source files?

    The algorithm for dealing with a source file in Java is

    1. Collect bytes
    2. Convert bytes to chars (UTF-16 code units) using some encoding.
    3. Replace every sequence of '\\', one or more 'u', and four hex digits with the UTF-16 code unit those digits encode (the backslash only counts if it is preceded by an even number of other backslashes). Error out if such a "\u" is not followed by four hex digits.
    4. Lex the chars into tokens.
    5. Parse the tokens into classes.

    The current and former practice is that step 2, converting bytes to UTF-16 code units, is up to the tool that loads the compilation unit (source file), but the de facto standard for command-line tools is the -encoding flag.

    After that conversion happens, the language mandates that \uABCD style sequences are converted to UTF-16 code units (step 3) before lexing and parsing.

    For example:

    int a;
    \u0061 = 42;
    

    is a valid pair of Java statements.
    Any Java source code tool must, after converting bytes to chars but before lexing and parsing, find \uABCD sequences and replace them, so this code becomes

    int a;
    a = 42;
    

    before parsing. This happens regardless of where the \uABCD sequence occurs.

    This process looks something like

    1. Get bytes: [105, 110, 116, 32, 97, 59, 10, 92, 117, 48, 48, 54, 49, 32, 61, 32, 52, 50, 59]
    2. Convert bytes to chars: ['i', 'n', 't', ' ', 'a', ';', '\n', '\\', 'u', '0', '0', '6', '1', ' ', '=', ' ', '4', '2', ';']
    3. Replace unicode escapes: ['i', 'n', 't', ' ', 'a', ';', '\n', 'a', ' ', '=', ' ', '4', '2', ';']
    4. Lex: ["int", "a", ";", "a", "=", "42", ";"]
    5. Parse: (Block (Variable (Type int) (Identifier "a")) (Assign (Reference "a") (Int 42)))
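Step 3 can be sketched in a few lines of Java. This is an illustration, not javac's implementation: UnicodeEscapes, translate and hexValue are hypothetical names. The two details that go beyond the steps above (any number of 'u' characters is allowed, and the backslash must be preceded by an even number of other backslashes) follow the language specification's section on Unicode escapes. Rarer edge cases are not handled.

```java
final class UnicodeEscapes {
    private UnicodeEscapes() {}

    /** Returns src with every eligible Unicode escape replaced by its UTF-16 code unit. */
    static String translate(String src) {
        StringBuilder out = new StringBuilder(src.length());
        int backslashes = 0; // contiguous raw backslashes seen just before index i
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            boolean startsEscape = c == '\\'
                    && backslashes % 2 == 0
                    && i + 1 < src.length()
                    && src.charAt(i + 1) == 'u';
            if (!startsEscape) {
                backslashes = (c == '\\') ? backslashes + 1 : 0;
                out.append(c);
                i++;
                continue;
            }
            int digits = i + 1;
            while (digits < src.length() && src.charAt(digits) == 'u') {
                digits++; // one or more 'u' characters are allowed
            }
            if (digits + 4 > src.length()) {
                throw new IllegalArgumentException("incomplete Unicode escape at index " + i);
            }
            int codeUnit = 0;
            for (int k = digits; k < digits + 4; k++) {
                codeUnit = codeUnit * 16 + hexValue(src.charAt(k), i);
            }
            out.append((char) codeUnit);
            backslashes = 0; // the produced char takes no part in further escapes
            i = digits + 4;
        }
        return out.toString();
    }

    private static int hexValue(char c, int escapeStart) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        throw new IllegalArgumentException("illegal Unicode escape at index " + escapeStart);
    }

    public static void main(String[] args) {
        // The doubled backslash keeps the escape as raw text inside this string literal.
        String raw = "int a;\n\\u0061 = 42;";
        System.out.println(translate(raw));
    }
}
```

Running main prints the two statements with the escape already replaced by a, which is the input the lexer sees in step 4.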

    Should all non-ASCII characters be escaped in JavaDoc with HTML &escape;-like codes?

    No need, except for HTML special characters like '<' that you want to appear literally in the documentation. You can use \uABCD sequences inside javadoc comments.
    Java processes \uABCD sequences before lexing the source file, so they can appear inside strings, comments, anywhere really. That’s why

    System.out.println("Hello, world!\u0022);
    

    is a valid Java statement.

    /** @return \u03b8 in radians */
    

    is equivalent to

    /** @return θ in radians */
    

    as far as javadoc is concerned.


    But what would be the Java // comment equivalent?

    You can use // comments in Java, but Javadoc only looks inside /**...*/ comments for documentation. // comments carry no metadata.

    One ramification of Java’s handling of \uABCD sequences is that although

    // Comment text.\u000A System.out.println("Not really comment text");
    

    looks like a single line comment, and many IDEs will highlight it as such, it is not: the \u000A becomes a line feed before lexing, so the println call is compiled as live code.
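A small runnable variant makes this visible. CommentEscapeDemo and run are hypothetical names; the only assumption is a standard Java compiler. The escape on the comment line is turned into a line feed before lexing, so the increment after it is ordinary code.

```java
final class CommentEscapeDemo {
    private CommentEscapeDemo() {}

    static int run() {
        int counter = 0;
        // Comment text.\u000A counter++;
        return counter;
    }

    public static void main(String[] args) {
        // Prints 1, not 0: the increment above sits on its own line after translation.
        System.out.println(run());
    }
}
```

If you really need to write about such an escape inside a comment, spell it out in words or double the backslash so it is not translated.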

© 2021 The Archive Base. All Rights Reserved