Editorial Team
Asked: May 24, 2026

Due to the fact that Java code could be run in any Java VM


Since Java code can run on any Java VM, how can I determine programmatically which Unicode version that VM supports?

    Editorial Team added an answer on May 24, 2026 at 9:50 am

    The easiest but worst way I can think of to do that would be to pick a code point that’s new in each Unicode release and check its Character properties. Or you could check its General Category with a regex. Here are some selected code points:

    • Unicode 6.0.0:

      Ꞡ  U+A7A0 GC=Lu SC=Latin    LATIN CAPITAL LETTER G WITH OBLIQUE STROKE
      ₹  U+20B9 GC=Sc SC=Common   INDIAN RUPEE SIGN
      ₜ  U+209C GC=Lm SC=Latin    LATIN SUBSCRIPT SMALL LETTER T
      
    • Unicode 5.2:

      Ɒ  U+2C70 GC=Lu SC=Latin    LATIN CAPITAL LETTER TURNED ALPHA
      ⅐  U+2150 GC=No SC=Common   VULGAR FRACTION ONE SEVENTH
      ⸱  U+2E31 GC=Po SC=Common   WORD SEPARATOR MIDDLE DOT
      
    • Unicode 5.1:

      ꝺ  U+A77A GC=Ll SC=Latin    LATIN SMALL LETTER INSULAR D
      Ᵹ  U+A77D GC=Lu SC=Latin    LATIN CAPITAL LETTER INSULAR G
      ⚼  U+26BC GC=So SC=Common   SESQUIQUADRATE
      
    • Unicode 5.0:

      Ⱶ  U+2C75 GC=Lu SC=Latin    LATIN CAPITAL LETTER HALF H
      ɂ  U+0242 GC=Ll SC=Latin    LATIN SMALL LETTER GLOTTAL STOP
      ⬔  U+2B14 GC=So SC=Common  SQUARE WITH UPPER RIGHT DIAGONAL HALF BLACK
      

    I’ve included the general category and the script property, although you can inspect the script only in JDK7 or later; JDK7 is the first Java release that supports it.
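
    As a rough Java sketch of those checks, the class below tests one of the listed code points three ways: by general category through Character.getType, by a \p{Lu} regex, and by script through Character.UnicodeScript (JDK7 or later). The class and method names are made up for illustration; only the java.lang and java.util.regex calls are real APIs.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; the name and methods are illustrative only.
final class CodePointProbe {

    /** True if the running JVM classifies cp as an uppercase letter (GC=Lu). */
    static boolean isUppercaseLetter(int cp) {
        return Character.getType(cp) == Character.UPPERCASE_LETTER;
    }

    /** The same General Category check, done with a regex instead. */
    static boolean matchesLu(int cp) {
        String s = new String(Character.toChars(cp));
        return Pattern.matches("\\p{Lu}", s);
    }

    /** Script lookup; Character.UnicodeScript exists only in JDK7 and later. */
    static Character.UnicodeScript scriptOf(int cp) {
        return Character.UnicodeScript.of(cp);
    }

    public static void main(String[] args) {
        int cp = 0xA7A0; // LATIN CAPITAL LETTER G WITH OBLIQUE STROKE, new in Unicode 6.0.0
        System.out.println("GC=Lu via Character.getType: " + isUppercaseLetter(cp));
        System.out.println("GC=Lu via regex:             " + matchesLu(cp));
        System.out.println("Script:                      " + scriptOf(cp));
    }
}
```

    On a JVM whose Unicode data predates 6.0.0, the first two checks should come back false for U+A7A0, because an unassigned code point has no uppercase-letter category.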

    I found those code points by running commands like this from the command line:

    % unichars -gs '\p{Age=5.1}'
    % unichars -gs '\p{Lu}' '\p{Age=5.0}'
    

    That’s the unichars program. It finds only the properties in whichever version of the Unicode Character Database your version of Perl supports.

    I also like my output sorted, so I tend to run

     % unichars -gs '\p{Alphabetic}' '\p{Age=6.0}' | ucsort | less -r
    

    where that’s the ucsort program, which sorts text according to the Unicode Collation Algorithm.

    However, in Perl, unlike in Java, this is easy to find out. For example, if you run this from the command line (yes, there’s a programmer API, too), you find:

    $ corelist -a Unicode
        v5.6.2     3.0.1     
        v5.8.0     3.2.0     
        v5.8.1     4.0.0 
        v5.8.8     4.1.0
        v5.10.0    5.0.0     
        v5.10.1    5.1.0 
        v5.12.0    5.2.0 
        v5.14.0    6.0.0
    

    That shows that Perl version 5.14.0 was the first one to support Unicode 6.0.0. For Java, I believe there is no API that gives you this information directly, so you’ll have to hardcode a table mapping Java versions to Unicode versions, or else use the empirical method of testing code points for properties. By empirically, I mean the equivalent of this sort of thing:

    % ruby -le 'print "\u2C75" =~ /\p{Lu}/ ? "pass 5.0" : "fail 5.0"'
    pass 5.0
    % ruby -le 'print "\uA7A0" =~ /\p{Lu}/ ? "pass 6.0" : "fail 6.0"'
    fail 6.0
    % ruby -v
    ruby 1.9.2p0 (2010-08-18 revision 29036) [i386-darwin9.8.0]
    
    % perl -le 'print "\x{2C75}" =~ /\p{Lu}/ ? "pass 5.0" : "fail 5.0"'
    pass 5.0
    % perl -le 'print "\x{A7A0}" =~ /\p{Lu}/ ? "pass 6.0" : "fail 6.0"'
    pass 6.0
    % perl -v
    This is perl 5, version 14, subversion 0 (v5.14.0) built for darwin-2level
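
    A Java equivalent of that empirical test might look like the sketch below. It walks the uppercase-letter probes from the code point list earlier in this answer, newest release first, and reports the first one the running JVM recognizes as GC=Lu. The class name, the method name, and the choice of probes are my own assumptions, and the result is only a lower bound: a JVM that passes the 6.0 probe may well support something newer.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical probe class; names and probe selection are illustrative only.
final class UnicodeVersionProbe {

    // Unicode release -> an uppercase letter (GC=Lu) first assigned in that release.
    // Insertion order matters: newest release first.
    private static final Map<String, Integer> PROBES = new LinkedHashMap<String, Integer>();
    static {
        PROBES.put("6.0", 0xA7A0); // LATIN CAPITAL LETTER G WITH OBLIQUE STROKE
        PROBES.put("5.2", 0x2C70); // LATIN CAPITAL LETTER TURNED ALPHA
        PROBES.put("5.1", 0xA77D); // LATIN CAPITAL LETTER INSULAR G
        PROBES.put("5.0", 0x2C75); // LATIN CAPITAL LETTER HALF H
    }

    /** Returns the newest probed release whose probe code point is seen as GC=Lu. */
    static String lowerBound() {
        for (Map.Entry<String, Integer> e : PROBES.entrySet()) {
            int cp = e.getValue();
            if (Character.getType(cp) == Character.UPPERCASE_LETTER) {
                return e.getKey();
            }
        }
        return "older than 5.0";
    }

    public static void main(String[] args) {
        System.out.println("Unicode support is at least: " + lowerBound());
    }
}
```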
    

    To find out the age of a particular code point, run uniprops -a on it like this:

    % uniprops -a 10424
    U+10424 ‹𐐤› \N{DESERET CAPITAL LETTER EN}
     \w \pL \p{LC} \p{L_} \p{L&} \p{Lu}
     All Any Alnum Alpha Alphabetic Assigned InDeseret Cased Cased_Letter LC Changes_When_Casefolded CWCF Changes_When_Casemapped CWCM Changes_When_Lowercased CWL Changes_When_NFKC_Casefolded CWKCF Deseret Dsrt Lu L Gr_Base Grapheme_Base Graph GrBase ID_Continue IDC ID_Start IDS Letter L_ Uppercase_Letter Print Upper Uppercase Word XID_Continue XIDC XID_Start XIDS X_POSIX_Alnum X_POSIX_Alpha X_POSIX_Graph X_POSIX_Print X_POSIX_Upper X_POSIX_Word
     Age=3.1 Bidi_Class=L Bidi_Class=Left_To_Right BC=L Block=Deseret Canonical_Combining_Class=0 Canonical_Combining_Class=Not_Reordered CCC=NR Canonical_Combining_Class=NR Decomposition_Type=None DT=None Script=Deseret East_Asian_Width=Neutral Grapheme_Cluster_Break=Other GCB=XX Grapheme_Cluster_Break=XX Hangul_Syllable_Type=NA Hangul_Syllable_Type=Not_Applicable HST=NA Joining_Group=No_Joining_Group JG=NoJoiningGroup Joining_Type=Non_Joining JT=U Joining_Type=U Line_Break=AL Line_Break=Alphabetic LB=AL Numeric_Type=None NT=None Numeric_Value=NaN NV=NaN Present_In=3.1 IN=3.1 Present_In=3.2 IN=3.2 Present_In=4.0 IN=4.0 Present_In=4.1 IN=4.1 Present_In=5.0 IN=5.0 Present_In=5.1 IN=5.1 Present_In=5.2 IN=5.2 Present_In=6.0 IN=6.0 SC=Dsrt Script=Dsrt Sentence_Break=UP Sentence_Break=Upper SB=UP Word_Break=ALetter WB=LE Word_Break=LE _X_Begin
    

    All my Unicode tools are available in the Unicode::Tussle bundle, including unichars, uninames, uniquote, ucsort, and many more.

    Java 1.7 Improvements

    JDK7 goes a long way toward making a few Unicode things easier. I talk about that a bit at the end of my OSCON Unicode Support Shootout talk. I had thought of putting together a table of which languages support which versions of Unicode in which versions of those languages, but ended up scrapping that to tell people to just get the latest version of each language. For example, I know that Unicode 6.0.0 is supported by Java 1.7, Perl 5.14, and Python 2.7 or 3.2.
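
    If you go the hardcoded-table route for Java, a minimal sketch could key the table on the java.specification.version system property. Only the Java 7 row comes from this answer; the other rows are from my recollection of the JDK documentation and should be checked against the Character class documentation of each release before you rely on them. The class and method names are hypothetical, and for releases between or after the listed rows the lookup gives only a lower bound.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical lookup table; verify every row against the JDK docs before relying on it.
final class UnicodeVersionTable {

    private static final TreeMap<Integer, String> TABLE = new TreeMap<Integer, String>();
    static {
        TABLE.put(7, "6.0.0");   // stated in the answer above
        TABLE.put(8, "6.2.0");   // assumption, from memory of the JDK 8 docs
        TABLE.put(9, "8.0.0");   // assumption, from memory of the JDK 9 docs
        TABLE.put(11, "10.0.0"); // assumption, from memory of the JDK 11 docs
    }

    /** Turns "1.7" into 7 and "17" into 17. */
    static int featureVersion(String specVersion) {
        String s = specVersion.startsWith("1.") ? specVersion.substring(2) : specVersion;
        int dot = s.indexOf('.');
        return Integer.parseInt(dot < 0 ? s : s.substring(0, dot));
    }

    /** Nearest table row at or below the given release; a lower bound, not an exact answer. */
    static String unicodeVersionFor(int feature) {
        Map.Entry<Integer, String> row = TABLE.floorEntry(feature);
        return row == null ? "unknown" : row.getValue();
    }

    public static void main(String[] args) {
        int feature = featureVersion(System.getProperty("java.specification.version"));
        System.out.println("Java " + feature + ": Unicode >= " + unicodeVersionFor(feature));
    }
}
```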

    JDK7 contains updates for classes Character, String, and Pattern in support of Unicode 6.0.0. This includes support for Unicode script properties, and several enhancements to Pattern to allow it to meet Level 1 support requirements for Unicode UTS#18 Regular Expressions. These include

    • The isUpperCase and isLowerCase methods now correctly correspond to the Unicode Uppercase and Lowercase properties; previously they applied only to letters, which isn’t right, because that misses Other_Uppercase and Other_Lowercase code points, respectively. For example, these are some lowercase code points that are not GC=Ll (lowercase letters), selected samples only:

      % unichars -gs '\p{lowercase}' '\P{LL}'
      ◌ͅ  U+0345 GC=Mn SC=Inherited    COMBINING GREEK YPOGEGRAMMENI
      ͺ  U+037A GC=Lm SC=Greek        GREEK YPOGEGRAMMENI
      ˢ  U+02E2 GC=Lm SC=Latin        MODIFIER LETTER SMALL S
      ˣ  U+02E3 GC=Lm SC=Latin        MODIFIER LETTER SMALL X
      ᴬ  U+1D2C GC=Lm SC=Latin        MODIFIER LETTER CAPITAL A
      ᴮ  U+1D2E GC=Lm SC=Latin        MODIFIER LETTER CAPITAL B
      ᵂ  U+1D42 GC=Lm SC=Latin        MODIFIER LETTER CAPITAL W
      ᵃ  U+1D43 GC=Lm SC=Latin        MODIFIER LETTER SMALL A
      ᵇ  U+1D47 GC=Lm SC=Latin        MODIFIER LETTER SMALL B
      ₐ  U+2090 GC=Lm SC=Latin        LATIN SUBSCRIPT SMALL LETTER A
      ₑ  U+2091 GC=Lm SC=Latin        LATIN SUBSCRIPT SMALL LETTER E
      ⅰ  U+2170 GC=Nl SC=Latin        SMALL ROMAN NUMERAL ONE
      ⅱ  U+2171 GC=Nl SC=Latin        SMALL ROMAN NUMERAL TWO
      ⅲ  U+2172 GC=Nl SC=Latin        SMALL ROMAN NUMERAL THREE
      ⓐ  U+24D0 GC=So SC=Common       CIRCLED LATIN SMALL LETTER A
      ⓑ  U+24D1 GC=So SC=Common       CIRCLED LATIN SMALL LETTER B
      ⓒ  U+24D2 GC=So SC=Common       CIRCLED LATIN SMALL LETTER C
      
    • The alphabetic tests are now correct in that they use Other_Alphabetic. They did this wrong prior to 1.7, which is a problem.

    • The \x{HHHHH} pattern escape, which lets you meet RL1.1; with it you can rewrite [𝒜-𝒵] (which fails due to The UTF‐16 Curse) as [\x{1D49C}-\x{1D4B5}]. JDK7 is the first Java release that fully and correctly supports non-BMP characters in this regard. Amazing but true.

    • More properties for RL1.2, of which the script property is by far the most important. This lets you write \p{script=Greek}, for example, or \p{IsGreek} for short.

    • The new UNICODE_CHARACTER_CLASSES pattern compilation flag and corresponding pattern‐embeddable flag "(?U)" to meet RL1.2a on compatibility properties.
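
    The sketch below exercises each of those JDK7 changes in turn, assuming a JDK7 or later runtime. The class and method names are invented for the example; the Character and Pattern calls are the real APIs, and the sample code points come from the lists in this answer.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; requires JDK7 or later.
final class Jdk7UnicodeDemo {

    static String str(int cp) {
        return new String(Character.toChars(cp));
    }

    /** Lowercase property without GC=Ll, e.g. U+2170 SMALL ROMAN NUMERAL ONE (GC=Nl). */
    static boolean lowercaseButNotLl(int cp) {
        return Character.isLowerCase(cp)
            && Character.getType(cp) != Character.LOWERCASE_LETTER;
    }

    /** Alphabetic property without being a letter, e.g. U+0345 (GC=Mn). */
    static boolean alphabeticButNotLetter(int cp) {
        return Character.isAlphabetic(cp) && !Character.isLetter(cp);
    }

    /** RL1.1: the \x{...} escape names non-BMP code points directly. */
    static boolean inMathScriptCapitalRange(int cp) {
        return Pattern.matches("[\\x{1D49C}-\\x{1D4B5}]", str(cp));
    }

    /** RL1.2: the script property. */
    static boolean isGreek(int cp) {
        return Pattern.matches("\\p{script=Greek}", str(cp));
    }

    /** RL1.2a: with (?U), \w follows the Unicode definition instead of ASCII. */
    static boolean isUnicodeWord(String s) {
        return Pattern.matches("(?U)\\w+", s);
    }

    public static void main(String[] args) {
        System.out.println(lowercaseButNotLl(0x2170));
        System.out.println(alphabeticButNotLetter(0x0345));
        System.out.println(inMathScriptCapitalRange(0x1D49C));
        System.out.println(isGreek(0x03B1));
        System.out.println(isUnicodeWord("\u00e9t\u00e9"));
    }
}
```

    The same word test without (?U), or without the UNICODE_CHARACTER_CLASSES flag, rejects the accented string because \w then covers only ASCII word characters.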

    I can certainly see why you want to make sure you’re running a Java with Unicode 6.0.0 support, since that comes with all those other benefits, too.
