The Archive Base


Editorial Team
Asked: May 25, 2026

I read this blog entry about how Perl handles Unicode and Unicode normalization.
The short version, as I understand it, is that there are several ways to write the identifier “é” in Unicode: either as a single Unicode character or as a combination of two characters. A Perl program may not be able to tell them apart, which causes strange errors.

So that got me thinking: how does the Java editor in Eclipse handle Unicode? Or Java in general, since I guess that's the same question.

On one hand the specification says:

Two identifiers are the same only if they are identical, that is, have the same Unicode character for each letter or digit.

But on the other hand, Unicode escapes are translated:

This translation step allows any program to be expressed using only ASCII characters.

Don't these two statements contradict each other?


1 Answer

    Editorial Team
    Added an answer on May 25, 2026 at 5:26 pm

    The translation step refers to the first step of the lexical translation process:

    A translation of Unicode escapes (§3.3) in the raw stream of Unicode characters to the corresponding Unicode character. A Unicode escape of the form \uxxxx, where xxxx is a hexadecimal value, represents the Unicode character whose encoding is xxxx. This translation step allows any program to be expressed using only ASCII characters.

    The lexical translation process allows Unicode characters to be written in your source code as escape sequences made up of ASCII characters alone. It is therefore possible to name an identifier with any valid Unicode characters while still writing it in ASCII, using Unicode escape sequences.
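As a small illustration of the point above, here is a sketch in which a field is declared with an escape and then read back under its plain ASCII name. The class name `EscapeDemo` and the method `readA` are made up for this example; only the escape handling itself comes from the specification quoted above.

```java
public class EscapeDemo {
    // Declared with a Unicode escape; after translation the field is simply named "a".
    static int \u0061 = 42;

    // Reads the same field through its plain ASCII spelling.
    static int readA() {
        return a;
    }

    public static void main(String[] args) {
        System.out.println(readA());               // prints 42
        // Escapes are translated inside string literals as well.
        System.out.println("\u0061".equals("a")); // prints true
    }
}
```

Both spellings name the same field because the escape is replaced before the source is split into tokens, so the rest of the compiler only ever sees the character `a`.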

    The translation of escape sequences is the first step of lexical processing and happens before the source is split into tokens. Identifiers are compared later, on the already-translated characters, irrespective of how they were written in the source. In other words, the rules for naming identifiers are applied to a sequence of input characters and line terminators in which every escape has already been replaced by the character it stands for. So the two statements do not contradict each other. The following code will not compile, because both fields have the same name even though one of them is written as an escape:

    package info.example.i18n;
    
    public class UnicodeEscape
    {
        int a;
        int \u0061; // Hex(61) = Dec(97) = 'a' in ASCII-7
    }
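The original question was about “é”, which can be written as one code point or as two. As far as I read §3.8 of the specification, Java compares identifiers code point by code point and applies no Unicode normalization, so the precomposed and decomposed spellings are different identifiers. A minimal sketch of that, with the hypothetical class name `NormalizationDemo` and escapes used throughout so the file stays pure ASCII:

```java
import java.text.Normalizer;

public class NormalizationDemo {
    // Two distinct fields: one named with precomposed U+00E9,
    // the other with 'e' followed by combining acute accent U+0301.
    static int \u00e9 = 1;
    static int e\u0301 = 2;

    static final String COMPOSED = "\u00e9";
    static final String DECOMPOSED = "e\u0301";

    // Adds the two fields; this only compiles because they are different identifiers.
    static int sum() {
        return \u00e9 + e\u0301;
    }

    // Plain comparison looks at the code units, so the two spellings differ.
    static boolean sameCodePoints() {
        return COMPOSED.equals(DECOMPOSED);
    }

    // After explicit NFC normalization both spellings become U+00E9.
    static boolean sameAfterNfc() {
        String a = Normalizer.normalize(COMPOSED, Normalizer.Form.NFC);
        String b = Normalizer.normalize(DECOMPOSED, Normalizer.Form.NFC);
        return a.equals(b);
    }

    public static void main(String[] args) {
        System.out.println(sum());            // prints 3
        System.out.println(sameCodePoints()); // prints false
        System.out.println(sameAfterNfc());   // prints true
    }
}
```

So the Perl-style confusion can occur in Java too: two identifiers that look identical in an editor may still be different names. If that is a concern, normalizing source files to one form (for example NFC) before compiling is one way to avoid it.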
    
© 2021 The Archive Base. All Rights Reserved