Why does Unicode have several reserved character codes?
See the Unicode blocks for two languages: Kannada and Tamil.
Both languages are very old, and I think there is no chance that new characters will be added to them.
EDIT: Then why are they wasting some character codes by marking them as reserved?
Why not place the reserved character codes at the end of each language's character set?
This has to do with how the Unicode Consortium doles out its allocated blocks, scripts, and code points. Block=Tamil is a good example.
They tend to reserve contiguous rows of 4, 8, or 16 code points for all the same “kind” of character. Yes, there are gaps there, but it’s like how in a filesystem, once you allocate a sector (or block, if you don’t have separate sectors within a block) to one file, even if that file doesn’t use everything in its final sector, you don’t go giving away those unused bytes to some other file. Things tend to get padded to block boundaries anyway.
It’s not like we’re at any risk of running out of codes.
The allocated area begins with “Signs”, as shown by the first assigned code points in that block. A gap may represent a change from one kind of character to another. If you check the properties of the first five code points in the block, you see that the unassigned ones still have the right block property.
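As a rough illustration of that check, here is a minimal Python sketch. It is not the uniprops output; it only uses the standard `unicodedata` module, which does not expose the Block property, so the Tamil range U+0B80..U+0BFF is hard-coded here as an assumption taken from the Unicode Blocks.txt data file. The helper name `describe` is made up for this example.

```python
import unicodedata

# Assumption: Tamil block range as listed in Unicode's Blocks.txt.
TAMIL_BLOCK = range(0x0B80, 0x0C00)

def describe(cp):
    """Return (label, general category, character name or '<unassigned>')."""
    ch = chr(cp)
    return (f"U+{cp:04X}",
            unicodedata.category(ch),
            unicodedata.name(ch, "<unassigned>"))

# Show the first five code points of the block.
for cp in TAMIL_BLOCK[:5]:
    print(*describe(cp))
```

Unassigned code points show up with general category Cn and no name, yet they still fall inside the block's range. The exact result depends on the Unicode version your Python was built with (see `unicodedata.unidata_version`).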
If you look at other allocated blocks, you see the same sort of thing. It doesn’t make sense to slice up blocks into unrelated things.
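To see the gaps in more than one block, something like the following sketch lists each run of unassigned code points. The block ranges are assumptions copied from Blocks.txt, and `unassigned_runs` is a hypothetical helper, not part of any library.

```python
import unicodedata

# Assumption: block ranges as listed in Unicode's Blocks.txt.
BLOCKS = {
    "Tamil": range(0x0B80, 0x0C00),
    "Kannada": range(0x0C80, 0x0D00),
}

def unassigned_runs(block):
    """Yield (first, last) for each run of unassigned (gc=Cn) code points."""
    start = None
    for cp in block:
        if unicodedata.category(chr(cp)) == "Cn":
            if start is None:
                start = cp          # a new gap begins here
        elif start is not None:
            yield (start, cp - 1)   # the gap ended just before cp
            start = None
    if start is not None:
        yield (start, block[-1])    # gap runs to the end of the block

for name, block in BLOCKS.items():
    gaps = ", ".join(f"{a:04X}" if a == b else f"{a:04X}..{b:04X}"
                     for a, b in unassigned_runs(block))
    print(f"{name}: {gaps}")
```

The gaps tend to sit between groups of related characters rather than only at the end of the block.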
As I said, it’s not as though they’re going to run out of space, so I don’t know what the concern is here.
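To put a rough number on the space that is left, this sketch counts the code points whose general category is Cn in whatever Unicode version your Python ships with. Note that Cn also includes the 66 noncharacters, so it slightly overstates what could ever be assigned.

```python
import unicodedata

TOTAL = 0x110000  # size of the code space, U+0000..U+10FFFF

# Count code points with general category Cn (unassigned, incl. noncharacters).
unassigned = sum(1 for cp in range(TOTAL)
                 if unicodedata.category(chr(cp)) == "Cn")

print(f"Unicode {unicodedata.unidata_version}: "
      f"{unassigned:,} of {TOTAL:,} code points are unassigned")
```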
BTW, you can get Unicode exploration and processing tools like unichars, uniprops, and uninames from my Unicode Command-Line Toolchest, either individually from there or as the entire suite, available through CPAN as the Unicode::Tussle suite.