Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • Home
  • SEARCH
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 4008636
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: May 20, 20262026-05-20T08:46:47+00:00 2026-05-20T08:46:47+00:00

I was looking in mscorelib.dll with .NET Reflector, and stumbled upon the Char class.

  • 0

I was looking in mscorelib.dll with .NET Reflector, and stumbled upon the Char class. I always wondered how methods like Char.isLetter was done. I expected a huge list of test, but, buy digging a little bit, I found a really short code that determine the Unicode category. However, this code uses some kind of tables and some bitshifting voodoo. Can anyone explain to me how this is done, or point me to some resources?

EDIT :
Here’s the code. It’s in System.Globalization.CharUnicodeInfo.

internal static unsafe byte InternalGetCategoryValue(int ch, int offset)
{
    ushort num = s_pCategoryLevel1Index[ch >> 8];
    num = s_pCategoryLevel1Index[num + ((ch >> 4) & 15)];
    byte* numPtr = (byte*) (s_pCategoryLevel1Index + num);
    byte num2 = numPtr[ch & 15];
    return s_pCategoriesValue[(num2 * 2) + offset];
}

s_pCategoryLevel1Index is a short* and s_pCategoryValues is a byte*

Both are created in the CharUnicodeInfo static constructor :

 static unsafe CharUnicodeInfo()
{
    s_pDataTable = GlobalizationAssembly.GetGlobalizationResourceBytePtr(typeof(CharUnicodeInfo).Assembly, "charinfo.nlp");
    UnicodeDataHeader* headerPtr = (UnicodeDataHeader*) s_pDataTable;
    s_pCategoryLevel1Index = (ushort*) (s_pDataTable + headerPtr->OffsetToCategoriesIndex);
    s_pCategoriesValue = s_pDataTable + ((byte*) headerPtr->OffsetToCategoriesValue);
    s_pNumericLevel1Index = (ushort*) (s_pDataTable + headerPtr->OffsetToNumbericIndex);
    s_pNumericValues = s_pDataTable + ((byte*) headerPtr->OffsetToNumbericValue);
    s_pDigitValues = (DigitValues*) (s_pDataTable + headerPtr->OffsetToDigitValue);
    nativeInitTable(s_pDataTable);
}

Here is the UnicodeDataHeader.

internal struct UnicodeDataHeader
{
    // Fields
    [FieldOffset(40)]
    internal uint OffsetToCategoriesIndex;
    [FieldOffset(0x2c)]
    internal uint OffsetToCategoriesValue;
    [FieldOffset(0x34)]
    internal uint OffsetToDigitValue;
    [FieldOffset(0x30)]
    internal uint OffsetToNumbericIndex;
    [FieldOffset(0x38)]
    internal uint OffsetToNumbericValue;
    [FieldOffset(0)]
    internal char TableName;
    [FieldOffset(0x20)]
    internal ushort version;
}

Note : I Hope this doesn’t break any licence. If so, I’ll remove the code.

  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-05-20T08:46:47+00:00Added an answer on May 20, 2026 at 8:46 am

    The basic information is stored in charinfo.nlp which is embedded in mscorlib.dll as a resource and loaded at runtime. The specifics of the file are probably only known to Microsoft but suffice it to say that it probably is a lookup table in a fashion.

    EDIT

    According to MSDN:

    This enumeration is based on The Unicode Standard, version 5.0. For more information, see the “UCD File Format” and “General Category Values” subtopics at the Unicode Character Database.

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

Looking for C# class which wraps calls to do the following: read and write
Looking to do a very small, quick 'n dirty side project. I like the
I'm using an XmlSerializer to deserialize a particular type in mscorelib.dll XmlSerializer ser =
I would like to open a PE file (which i know is a .Net
Looking for a open source mature ASP.NET based blog engine which supports themes. I
Looking at posts like this and others, it seems that the correct way to
Looking for feedback on : http://code.google.com/p/google-perftools/wiki/GooglePerformanceTools
Looking for an example that: Launches an EXE Waits for the EXE to finish.
Looking at what's running and nothing jumps out. Thanks!
Looking for a Linux application (or Firefox extension) that will allow me to scrape

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.