The Archive Base

Editorial Team
Asked: May 31, 2026

Why does Unicode have several reserved character codes?

See the Unicode code charts for two languages: Kannada and Tamil.
Both languages are very old, and I think there is no chance that new characters will be added to them.
EDIT: Then why waste some character codes by marking them as reserved?
Why not place the reserved character codes at the end of each language's character set?


1 Answer

  1. Editorial Team
    Added an answer on May 31, 2026 at 11:10 pm

    This has to do with how the Unicode Consortium doles out its allocated blocks, scripts, and code points. For example, the start of Block=Tamil looks like this:

    $ unichars '\p{Block=Tamil}' | head -20
    U+00B82 ‭ ◌ஂ  GC=Mn SC=Tamil        TAMIL SIGN ANUSVARA
    U+00B83 ‭ ஃ  GC=Lo SC=Tamil        TAMIL SIGN VISARGA
    U+00B85 ‭ அ  GC=Lo SC=Tamil        TAMIL LETTER A
    U+00B86 ‭ ஆ  GC=Lo SC=Tamil        TAMIL LETTER AA
    U+00B87 ‭ இ  GC=Lo SC=Tamil        TAMIL LETTER I
    U+00B88 ‭ ஈ  GC=Lo SC=Tamil        TAMIL LETTER II
    U+00B89 ‭ உ  GC=Lo SC=Tamil        TAMIL LETTER U
    U+00B8A ‭ ஊ  GC=Lo SC=Tamil        TAMIL LETTER UU
    U+00B8E ‭ எ  GC=Lo SC=Tamil        TAMIL LETTER E
    U+00B8F ‭ ஏ  GC=Lo SC=Tamil        TAMIL LETTER EE
    U+00B90 ‭ ஐ  GC=Lo SC=Tamil        TAMIL LETTER AI
    U+00B92 ‭ ஒ  GC=Lo SC=Tamil        TAMIL LETTER O
    U+00B93 ‭ ஓ  GC=Lo SC=Tamil        TAMIL LETTER OO
    U+00B94 ‭ ஔ  GC=Lo SC=Tamil        TAMIL LETTER AU
    U+00B95 ‭ க  GC=Lo SC=Tamil        TAMIL LETTER KA
    U+00B99 ‭ ங  GC=Lo SC=Tamil        TAMIL LETTER NGA
    U+00B9A ‭ ச  GC=Lo SC=Tamil        TAMIL LETTER CA
    U+00B9C ‭ ஜ  GC=Lo SC=Tamil        TAMIL LETTER JA
    U+00B9E ‭ ஞ  GC=Lo SC=Tamil        TAMIL LETTER NYA
    U+00B9F ‭ ட  GC=Lo SC=Tamil        TAMIL LETTER TTA
    

    They tend to reserve contiguous rows of 4, 8, or 16 code points for characters of the same “kind”. Yes, there are gaps, but it’s like a filesystem: once you allocate a sector (or a block, if you don’t have separate sectors within a block) to one file, you don’t go giving away the unused bytes in its final sector to some other file, even if the first file doesn’t fill that sector. Things tend to get padded to block boundaries anyway.
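
    To make the row layout visible without the command-line tools, here is a minimal Python sketch. It assumes the Tamil block spans U+0B80..U+0BFF (the range given in the Unicode Blocks.txt data file), and the helper names `is_assigned` and `block_rows` are made up for this illustration. It relies on the standard `unicodedata` module, so the result reflects whatever Unicode version your Python build ships with.

```python
import unicodedata

# Assumption: the Tamil block is U+0B80..U+0BFF, as listed in Blocks.txt.
TAMIL_START, TAMIL_END = 0x0B80, 0x0BFF


def is_assigned(cp: int) -> bool:
    """True if this Python build's Unicode database assigns a character to cp."""
    # General category "Cn" means "unassigned".
    return unicodedata.category(chr(cp)) != "Cn"


def block_rows(start: int, end: int, width: int = 8) -> list[str]:
    """Render a block as rows of `width` code points: '#' assigned, '.' unassigned."""
    rows = []
    for row_start in range(start, end + 1, width):
        cells = "".join(
            "#" if is_assigned(cp) else "."
            for cp in range(row_start, min(row_start + width, end + 1))
        )
        rows.append(f"U+{row_start:04X} {cells}")
    return rows


if __name__ == "__main__":
    print("\n".join(block_rows(TAMIL_START, TAMIL_END)))
```

    The first row comes out as `U+0B80 ..##.###`, which matches the listing above: U+0B80, U+0B81, and U+0B84 are the gaps.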

    It’s not like we’re at any risk of running out of codes.

    Here the allocated area starts with “Signs”, as shown by the first assigned code points in that block. A gap may mark a change from one kind of character to another. If you check the properties of the first six code points in the block, you see that the unassigned ones still have the right block property:

    $ uniprops -a U+00B80 U+00B81 U+00B82 U+00B83 U+00B84 U+00B85
    U+0B80 ‹U+0B80› \N{U+0B80}
        \pC \p{Cn}
        All Any InTamil C Other Cn Unassigned Zzzz Unknown
        Age=Unassigned Bidi_Class=L Bidi_Class=Left_To_Right BC=L Block=Tamil Canonical_Combining_Class=0 Canonical_Combining_Class=Not_Reordered
           CCC=NR Canonical_Combining_Class=NR Decomposition_Type=None DT=None East_Asian_Width=Neutral Grapheme_Cluster_Break=Other GCB=XX
           Grapheme_Cluster_Break=XX Hangul_Syllable_Type=NA Hangul_Syllable_Type=Not_Applicable HST=NA Joining_Group=No_Joining_Group
           JG=NoJoiningGroup Joining_Type=Non_Joining JT=U Joining_Type=U Line_Break=Unknown LB=XX Line_Break=XX Numeric_Type=None NT=None
           Numeric_Value=NaN NV=NaN Present_In=Unassigned IN=Unassigned Script=Unknown SC=Zzzz Script=Zzzz Sentence_Break=Other SB=XX
           Sentence_Break=XX Word_Break=Other WB=XX Word_Break=XX
    U+0B81 ‹U+0B81› \N{U+0B81}
        \pC \p{Cn}
        All Any InTamil C Other Cn Unassigned Zzzz Unknown
        Age=Unassigned Bidi_Class=L Bidi_Class=Left_To_Right BC=L Block=Tamil Canonical_Combining_Class=0 Canonical_Combining_Class=Not_Reordered
           CCC=NR Canonical_Combining_Class=NR Decomposition_Type=None DT=None East_Asian_Width=Neutral Grapheme_Cluster_Break=Other GCB=XX
           Grapheme_Cluster_Break=XX Hangul_Syllable_Type=NA Hangul_Syllable_Type=Not_Applicable HST=NA Joining_Group=No_Joining_Group
           JG=NoJoiningGroup Joining_Type=Non_Joining JT=U Joining_Type=U Line_Break=Unknown LB=XX Line_Break=XX Numeric_Type=None NT=None
           Numeric_Value=NaN NV=NaN Present_In=Unassigned IN=Unassigned Script=Unknown SC=Zzzz Script=Zzzz Sentence_Break=Other SB=XX
           Sentence_Break=XX Word_Break=Other WB=XX Word_Break=XX
    U+0B82 ‹◌ஂ› \N{TAMIL SIGN ANUSVARA}
        \w \pM \p{Mn}
        All Any Alnum Alpha Alphabetic Assigned InTamil Tamil Is_Tamil Case_Ignorable CI M Mn Gr_Ext Grapheme_Extend Graph GrExt ID_Continue IDC
           Mark Nonspacing_Mark Print Taml Word XID_Continue XIDC X_POSIX_Alnum X_POSIX_Alpha X_POSIX_Graph X_POSIX_Print X_POSIX_Word
        Age=1.1 Bidi_Class=Nonspacing_Mark BC=NSM Bidi_Class=NSM Block=Tamil Canonical_Combining_Class=0 Canonical_Combining_Class=Not_Reordered
           CCC=NR Canonical_Combining_Class=NR Decomposition_Type=None DT=None East_Asian_Width=Neutral Grapheme_Cluster_Break=EX
           Grapheme_Cluster_Break=Extend GCB=EX Hangul_Syllable_Type=NA Hangul_Syllable_Type=Not_Applicable HST=NA Joining_Group=No_Joining_Group
           JG=NoJoiningGroup Joining_Type=T Joining_Type=Transparent JT=T Line_Break=CM Line_Break=Combining_Mark LB=CM Numeric_Type=None NT=None
           Numeric_Value=NaN NV=NaN Present_In=1.1 IN=1.1 Present_In=2.0 IN=2.0 Present_In=2.1 IN=2.1 Present_In=3.0 IN=3.0 Present_In=3.1 IN=3.1
           Present_In=3.2 IN=3.2 Present_In=4.0 IN=4.0 Present_In=4.1 IN=4.1 Present_In=5.0 IN=5.0 Present_In=5.1 IN=5.1 Present_In=5.2 IN=5.2
           Present_In=6.0 IN=6.0 Script=Tamil SC=Taml Script=Taml Sentence_Break=EX Sentence_Break=Extend SB=EX Word_Break=Extend WB=Extend
    U+0B83 ‹ஃ› \N{TAMIL SIGN VISARGA}
        \w \pL \p{L_} \p{Lo}
        All Any Alnum Alpha Alphabetic Assigned InTamil Tamil Is_Tamil L Lo Gr_Base Grapheme_Base Graph GrBase ID_Continue IDC ID_Start IDS Letter
           L_ Other_Letter Print Taml Word XID_Continue XIDC XID_Start XIDS X_POSIX_Alnum X_POSIX_Alpha X_POSIX_Graph X_POSIX_Print X_POSIX_Word
        Age=1.1 Bidi_Class=L Bidi_Class=Left_To_Right BC=L Block=Tamil Canonical_Combining_Class=0 Canonical_Combining_Class=Not_Reordered CCC=NR
           Canonical_Combining_Class=NR Decomposition_Type=None DT=None East_Asian_Width=Neutral Grapheme_Cluster_Break=Other GCB=XX
           Grapheme_Cluster_Break=XX Hangul_Syllable_Type=NA Hangul_Syllable_Type=Not_Applicable HST=NA Joining_Group=No_Joining_Group
           JG=NoJoiningGroup Joining_Type=Non_Joining JT=U Joining_Type=U Line_Break=AL Line_Break=Alphabetic LB=AL Numeric_Type=None NT=None
           Numeric_Value=NaN NV=NaN Present_In=1.1 IN=1.1 Present_In=2.0 IN=2.0 Present_In=2.1 IN=2.1 Present_In=3.0 IN=3.0 Present_In=3.1 IN=3.1
           Present_In=3.2 IN=3.2 Present_In=4.0 IN=4.0 Present_In=4.1 IN=4.1 Present_In=5.0 IN=5.0 Present_In=5.1 IN=5.1 Present_In=5.2 IN=5.2
           Present_In=6.0 IN=6.0 Script=Tamil SC=Taml Script=Taml Sentence_Break=LE Sentence_Break=OLetter SB=LE Word_Break=ALetter WB=LE
           Word_Break=LE
    U+0B84 ‹U+0B84› \N{U+0B84}
        \pC \p{Cn}
        All Any InTamil C Other Cn Unassigned Zzzz Unknown
        Age=Unassigned Bidi_Class=L Bidi_Class=Left_To_Right BC=L Block=Tamil Canonical_Combining_Class=0 Canonical_Combining_Class=Not_Reordered
           CCC=NR Canonical_Combining_Class=NR Decomposition_Type=None DT=None East_Asian_Width=Neutral Grapheme_Cluster_Break=Other GCB=XX
           Grapheme_Cluster_Break=XX Hangul_Syllable_Type=NA Hangul_Syllable_Type=Not_Applicable HST=NA Joining_Group=No_Joining_Group
           JG=NoJoiningGroup Joining_Type=Non_Joining JT=U Joining_Type=U Line_Break=Unknown LB=XX Line_Break=XX Numeric_Type=None NT=None
           Numeric_Value=NaN NV=NaN Present_In=Unassigned IN=Unassigned Script=Unknown SC=Zzzz Script=Zzzz Sentence_Break=Other SB=XX
           Sentence_Break=XX Word_Break=Other WB=XX Word_Break=XX
    U+0B85 ‹அ› \N{TAMIL LETTER A}
        \w \pL \p{L_} \p{Lo}
        All Any Alnum Alpha Alphabetic Assigned InTamil Tamil Is_Tamil L Lo Gr_Base Grapheme_Base Graph GrBase ID_Continue IDC ID_Start IDS Letter
           L_ Other_Letter Print Taml Word XID_Continue XIDC XID_Start XIDS X_POSIX_Alnum X_POSIX_Alpha X_POSIX_Graph X_POSIX_Print X_POSIX_Word
        Age=1.1 Bidi_Class=L Bidi_Class=Left_To_Right BC=L Block=Tamil Canonical_Combining_Class=0 Canonical_Combining_Class=Not_Reordered CCC=NR
           Canonical_Combining_Class=NR Decomposition_Type=None DT=None East_Asian_Width=Neutral Grapheme_Cluster_Break=Other GCB=XX
           Grapheme_Cluster_Break=XX Hangul_Syllable_Type=NA Hangul_Syllable_Type=Not_Applicable HST=NA Joining_Group=No_Joining_Group
           JG=NoJoiningGroup Joining_Type=Non_Joining JT=U Joining_Type=U Line_Break=AL Line_Break=Alphabetic LB=AL Numeric_Type=None NT=None
           Numeric_Value=NaN NV=NaN Present_In=1.1 IN=1.1 Present_In=2.0 IN=2.0 Present_In=2.1 IN=2.1 Present_In=3.0 IN=3.0 Present_In=3.1 IN=3.1
           Present_In=3.2 IN=3.2 Present_In=4.0 IN=4.0 Present_In=4.1 IN=4.1 Present_In=5.0 IN=5.0 Present_In=5.1 IN=5.1 Present_In=5.2 IN=5.2
           Present_In=6.0 IN=6.0 Script=Tamil SC=Taml Script=Taml Sentence_Break=LE Sentence_Break=OLetter SB=LE Word_Break=ALetter WB=LE
           Word_Break=LE
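
    A rough Python approximation of that check, offered as a sketch rather than a replacement for uniprops: the standard `unicodedata` module does not expose the Block property, so block membership is tested here against a hard-coded range (an assumption taken from Blocks.txt), and `describe` is a hypothetical helper name.

```python
import unicodedata

# Assumption: Tamil block range from Blocks.txt; unicodedata has no Block property.
TAMIL_BLOCK = range(0x0B80, 0x0C00)


def describe(cp: int) -> str:
    """Summarize general category, block membership, and name for one code point."""
    ch = chr(cp)
    category = unicodedata.category(ch)          # "Cn" means unassigned
    name = unicodedata.name(ch, "<unassigned>")  # default avoids a ValueError
    in_block = cp in TAMIL_BLOCK
    return f"U+{cp:04X} GC={category} InTamil={in_block} {name}"


# The same six code points that were passed to uniprops above.
for cp in range(0x0B80, 0x0B86):
    print(describe(cp))
```

    The unassigned code points report GC=Cn and have no name, yet they still fall inside the block's range, which is the point being made above.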
    

    If you look at other allocated blocks, you see the same sort of thing. It doesn’t make sense to slice up blocks into unrelated things.
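
    One way to look at other blocks yourself is sketched below. The block ranges are assumptions copied from Blocks.txt, the choice of blocks is arbitrary, and `unassigned_runs` is a name invented for this example. The gaps it reports depend on the Unicode version bundled with your Python, so no output is shown.

```python
import unicodedata

# Assumption: ranges copied from Blocks.txt; the selection is only illustrative.
BLOCKS = {
    "Devanagari": (0x0900, 0x097F),
    "Tamil": (0x0B80, 0x0BFF),
    "Kannada": (0x0C80, 0x0CFF),
}


def unassigned_runs(start: int, end: int) -> list[tuple[int, int]]:
    """Return maximal runs of consecutive unassigned code points in [start, end]."""
    runs = []
    run_start = None
    for cp in range(start, end + 1):
        if unicodedata.category(chr(cp)) == "Cn":
            if run_start is None:
                run_start = cp  # a new gap begins here
        elif run_start is not None:
            runs.append((run_start, cp - 1))  # the gap ended at the previous code point
            run_start = None
    if run_start is not None:
        runs.append((run_start, end))  # the gap runs to the end of the block
    return runs


if __name__ == "__main__":
    for block_name, (start, end) in BLOCKS.items():
        gaps = ", ".join(f"U+{a:04X}..U+{b:04X}" for a, b in unassigned_runs(start, end))
        print(f"{block_name}: {gaps or 'no gaps'}")
```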

    As I said, it’s not as though they’re going to run out of space, so I don’t know what the concern is here.

    BTW, you can get Unicode exploration and processing tools like unichars, uniprops, and uninames from my Unicode Command-Line Toolchest, either individually from there or as the entire suite, which is available on CPAN as Unicode::Tussle.


© 2021 The Archive Base. All Rights Reserved