The Archive Base

Editorial Team
Asked: June 5, 2026 (2026-06-05T16:42:14+00:00)

This question is probably borderline for Stack Overflow, so I apologize in advance if it seems overly off-topic. I’m writing a program that involves many languages, and I need a table that maps languages to Unicode code points. Those of you familiar with Unicode will know that characters are divided into ‘blocks’ such as Latin, Cyrillic, etc. Of course, most languages written with Latin characters do not use all the Latin characters, most languages written with Cyrillic characters do not use all the Cyrillic characters, and so on. I’m interested in a table that maps English only to the characters used in English, Spanish only to the characters used in Spanish, etc. There’s no need to cover every language in the world (that would be nearly impossible), but it should cover at least some of the more common languages. (Even then, it would be a fairly extensive table involving many-to-many relationships.) I’m not sure that such a table exists. (If it doesn’t, I may turn this into an open-source project, as it would be very useful for me and possibly for others.)

1 Answer

  1. Editorial Team
    Added an answer on June 5, 2026 at 4:42 pm (2026-06-05T16:42:16+00:00)

    CLDR, the Unicode Common Locale Data Repository, contains definitions for character collections for a large number of languages. The exemplarCharacters element specifies the characters needed for normal writing of words of the language. Current definitions for this element can be seen on the By-Type Chart: misc.exemplarCharacters page (grouped by writing system), but for automated processing, you may find the XML files more suitable. The exemplarCharacters-other element currently contains similar data for punctuation characters.
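For automated processing of the XML files, a minimal Python sketch might look like the following. It assumes the exemplar data sits in `exemplarCharacters` elements whose text is a bracketed set pattern, with an optional `type` attribute for secondary sets. The embedded sample is hand-written and abridged for illustration, not copied from CLDR, and the helper names `parse_simple_set` and `load_exemplars` are hypothetical. The parser only understands single characters, `a-z` style ranges written without spaces, and `{...}` multi-character sequences; real CLDR patterns can also contain escapes and other syntax that this sketch ignores.

```python
import xml.etree.ElementTree as ET

# Hand-written, abridged sample in the general shape of a CLDR locale file.
# The character lists are illustrative only, not authoritative CLDR data.
SAMPLE_LDML = """<ldml>
  <characters>
    <exemplarCharacters>[a á b c d e é f g h i í j k l m n ñ o ó p q r s t u ú ü v w x y z]</exemplarCharacters>
    <exemplarCharacters type="auxiliary">[ª à ç º]</exemplarCharacters>
  </characters>
</ldml>"""


def parse_simple_set(pattern):
    """Parse a simplified set pattern into a Python set of strings.

    Handles single characters, ranges such as a-z (no spaces around the
    hyphen), and {xyz} sequences. Escapes and nested sets are NOT handled.
    """
    body = pattern.strip()
    if body.startswith("[") and body.endswith("]"):
        body = body[1:-1]
    chars = set()
    i = 0
    while i < len(body):
        ch = body[i]
        if ch.isspace():
            i += 1
        elif ch == "{":
            # A braced group is one multi-character item, e.g. {ch}.
            end = body.index("}", i)
            chars.add(body[i + 1:end])
            i = end + 1
        elif i + 2 < len(body) and body[i + 1] == "-":
            # Expand a range such as a-z into its individual characters.
            for code_point in range(ord(ch), ord(body[i + 2]) + 1):
                chars.add(chr(code_point))
            i += 3
        else:
            chars.add(ch)
            i += 1
    return chars


def load_exemplars(xml_text):
    """Return {type: set of characters}; untyped elements are keyed 'main'."""
    root = ET.fromstring(xml_text)
    result = {}
    for element in root.iter("exemplarCharacters"):
        kind = element.get("type", "main")
        result[kind] = parse_simple_set(element.text or "")
    return result


if __name__ == "__main__":
    for kind, chars in load_exemplars(SAMPLE_LDML).items():
        print(kind, "".join(sorted(chars)))
```

Running this over one file per language and collecting the results in a dictionary keyed by locale would give roughly the language-to-characters table the question asks for, subject to the caveats about data quality below.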

    That’s probably the best available compilation of such information in general, but it is conceptually very vague (it does not really try to define what it means for a character to be used to write a language), and the information for different languages has been collected in a process that is open but lacks general quality control.

    The meanings of the elements are defined in the LDML specification, clause 5.6 Character Elements. Note the description “The <characters> element provides optional information about characters that are in common use in the locale, and information that can be helpful in picking resources or data appropriate for the locale, such as when choosing among character encodings that are typically used to transmit data in the language of the locale.” This is a rather strange viewpoint, especially in a Unicode Consortium document, since we can use UTF-8, which covers all languages. But there are other issues where the information about characters used in a language could be useful, like the selection of a font for text, or preliminary checking of input data, or setting parameters for OCR scanning, or defining keyboard setups. These contexts may well require different definitions for the concept “characters used in a language”.
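As one illustration of the "preliminary checking of input data" use case, the sketch below flags letters in a text that fall outside a per-language character set. The set `SPANISH_LETTERS` is a hand-typed stand-in for whatever definition of "characters used in a language" fits the context, and the function name `unexpected_letters` is hypothetical. Both sides are normalized to NFC so that precomposed and decomposed forms of the same letter compare equal.

```python
import unicodedata

# Hand-typed, illustrative set; in practice this would come from a data
# source such as CLDR exemplar characters, possibly adjusted for the task.
SPANISH_LETTERS = set(
    unicodedata.normalize("NFC", "abcdefghijklmnopqrstuvwxyzáéíñóúü")
)


def unexpected_letters(text, allowed):
    """Return letters in text that are not in allowed, in first-seen order.

    Only alphabetic characters are checked; digits, spaces and punctuation
    are ignored. Comparison is case-insensitive via lowercasing.
    """
    found = []
    for ch in unicodedata.normalize("NFC", text).lower():
        if ch.isalpha() and ch not in allowed and ch not in found:
            found.append(ch)
    return found


if __name__ == "__main__":
    print(unexpected_letters("El niño façade", SPANISH_LETTERS))
```

A font selector or keyboard layout would likely want a different set than an input validator (for example, including auxiliary letters that appear in loanwords), which is the point made above about context-dependent definitions.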
