Question

Asked: May 11, 2026 (2026-05-11T10:01:47+00:00)

I am currently designing a font engine for an embedded display. The basic problem is the following:

I need to take a dynamically generated text string, look up the values from that string in a UTF-8 table, then use the table to point to the compressed bitmap array of all the supported characters. After that is complete, I call a bitcopy routine that moves the data from the bitmap array to the display.

I will not be supporting the full UTF-8 character set, as I have very limited system resources to work with (32K ROM, 8K RAM), but want to have the ability to add the needed glyphs later on for localization purposes. All development is being done in C and assembly.

The glyph size is a maximum of 16 bits wide by 16 bits tall. We will probably need to support the whole of the Basic Multilingual Plane (up to 3 bytes per character in UTF-8), as some of our larger customers are in Asia. However, we would not be including the whole table in any specific localization.

My question is this:
What is the best way to add this UTF-8 support and associated table?

1 Answer

score 0 · Answer 1 · 2026-05-11T10:01:47+00:00

The solution below assumes that the lower 16 bits of the Unicode space will be enough for you. If your bitmap table has, say, U+0020 through U+007E at positions 0x00 to 0x5E, U+00A0 through U+00FF at positions 0x5F to 0xBE, and U+1200 through U+1240 at positions 0xBF to 0xFF, you could do something like the code below (which isn’t tested, not even compile-tested).

bitmapmap contains a series of pairs of values, terminated by a zero entry. The assumption is that the bitmap table is made up of runs of directly adjacent Unicode code points. In each pair, the first value is the code point at which a run starts, and the second value says how many code points the run contains. The first value in the first pair is therefore the code point which the bitmap at index 0 represents.
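To make the pair layout concrete, here is a minimal sketch of just the range search, pulled out into its own function. The names glyph_ranges and glyph_index are made up for this illustration, and the second value of each pair is treated as a count, so the third run is assumed to hold 0x41 glyphs (U+1200 through U+1240).

```c
#include <stddef.h>

/* Hypothetical range table: (first code point, count) pairs, 0-terminated. */
static const unsigned short glyph_ranges[] = {
    0x0020, 0x005F,   /* U+0020..U+007E -> bitmap indexes 0x00..0x5E */
    0x00A0, 0x0060,   /* U+00A0..U+00FF -> bitmap indexes 0x5F..0xBE */
    0x1200, 0x0041,   /* U+1200..U+1240 -> bitmap indexes 0xBF..0xFF */
    0x0000
};

/* Returns the bitmap index for code point cp, or -1 if no glyph is present. */
static int glyph_index(unsigned short cp)
{
    unsigned int base = 0;   /* bitmap index of the current run's first glyph */
    size_t i;

    for (i = 0; glyph_ranges[i] != 0; i += 2) {
        unsigned short first = glyph_ranges[i];
        unsigned short count = glyph_ranges[i + 1];

        if (cp >= first && (unsigned int)(cp - first) < count)
            return (int)(base + (unsigned int)(cp - first));

        base += count;       /* skip past this run's bitmaps */
    }
    return -1;
}
```

Each run costs two 16-bit values, so the three runs above take 14 bytes of table including the terminator. The search is linear in the number of runs, which should be fine as long as a localization only has a handful of them.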

The first part of the while loop iterates through UTF-8 input and builds up a Unicode code point in ucs2char. Once a complete character is found, the second part searches for that character in one of the ranges mentioned in bitmapmap. If it finds an appropriate bitmap index, it adds it to indexes. Characters for which no bitmap is present are silently dropped.

The function returns the number of bitmap indexes found.

This way of doing things should be memory-efficient in terms of the Unicode-to-bitmap table, reasonably fast, and reasonably flexible.

// Code below assumes C99, but is about three cut-and-pastes from C89
// Assuming an unsigned short is 16-bit

// Pairs of (first code point, number of adjacent code points),
// terminated by a zero entry
unsigned short bitmapmap[]={0x0020, 0x005F,
                            0x00A0, 0x0060,
                            0x1200, 0x0041,
                            0x0000};

int utf8_to_bitmap_indexes(unsigned char *utf8, unsigned short *indexes)
{
    int bitmapsfound=0;
    int utf8numchars=0;   // bytes still expected in the current sequence
    unsigned char c;
    unsigned short ucs2char=0;
    while (*utf8)
    {
        c=*utf8++;        // consume one input byte per iteration
        if (c>=0xc0)
        {
            // Lead byte: the number of leading 1 bits is the sequence length
            utf8numchars=0;
            while (c&0x80)
            {
                utf8numchars++;
                c<<=1;
            }
            c>>=utf8numchars;   // c now holds only the payload bits
            ucs2char=0;
        }
        else if (utf8numchars && c<0x80)
        {
            // This is invalid UTF-8.  Do our best.
            utf8numchars=0;
        }
        else if (!utf8numchars && c>=0x80)
            continue; // Stray continuation byte - drop it

        if (utf8numchars)
        {
            c&=0x3f;
            ucs2char<<=6;
            ucs2char+=c;
            utf8numchars--;
            if (utf8numchars)
                continue; // Our work here is done - no char yet
        }
        else
            ucs2char=c;

        // At this point, we have a complete UCS-2 char in ucs2char

        unsigned short bmpsearch=0;
        unsigned short bmpix=0;   // bitmap index of the current range's first glyph
        while (bitmapmap[bmpsearch])
        {
            if (ucs2char>=bitmapmap[bmpsearch] && ucs2char-bitmapmap[bmpsearch]<bitmapmap[bmpsearch+1])
            {
                *indexes++ = bmpix+(ucs2char-bitmapmap[bmpsearch]);
                bitmapsfound++;
                break;
            }

            bmpix+=bitmapmap[bmpsearch+1];
            bmpsearch+=2;
        }
    }
    return bitmapsfound;
}
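If the byte-at-a-time state in the loop is hard to follow, the decoding step can also be written as a helper that decodes one whole sequence at a time. This is only a sketch of the same idea, not a replacement for the code above: utf8_decode_bmp is a made-up name, it handles 1- to 3-byte sequences only (enough for the 16-bit range), and it does not reject overlong encodings or surrogates.

```c
#include <stddef.h>

/* Decodes one UTF-8 sequence of 1 to 3 bytes starting at s.
   Stores the code point in *cp and returns the number of bytes consumed,
   or 0 if the bytes at s are not a well-formed 1- to 3-byte sequence.
   s must point into a NUL-terminated string. */
static size_t utf8_decode_bmp(const unsigned char *s, unsigned short *cp)
{
    if (s[0] < 0x80) {                     /* 0xxxxxxx */
        *cp = s[0];
        return 1;
    }
    if ((s[0] & 0xE0) == 0xC0) {           /* 110xxxxx 10xxxxxx */
        if ((s[1] & 0xC0) != 0x80)
            return 0;
        *cp = (unsigned short)(((unsigned)(s[0] & 0x1F) << 6) |
                               (unsigned)(s[1] & 0x3F));
        return 2;
    }
    if ((s[0] & 0xF0) == 0xE0) {           /* 1110xxxx 10xxxxxx 10xxxxxx */
        if ((s[1] & 0xC0) != 0x80 || (s[2] & 0xC0) != 0x80)
            return 0;
        *cp = (unsigned short)(((unsigned)(s[0] & 0x0F) << 12) |
                               ((unsigned)(s[1] & 0x3F) << 6) |
                               (unsigned)(s[2] & 0x3F));
        return 3;
    }
    return 0;   /* stray continuation byte or a 4-byte lead byte */
}
```

A caller would advance its pointer by the return value, and skip a single byte when the helper returns 0, which gives the same "drop what we cannot decode" behaviour as the loop above.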

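EDIT: You mentioned that you need more than the lower 16 bits. s/unsigned short/unsigned long/;s/ucs2char/codepoint/; in the above code and it can then do the whole Unicode space. Use unsigned long (or uint32_t from stdint.h) rather than unsigned int here, because unsigned int is only guaranteed to be 16 bits wide, and on small embedded targets it often is.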
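To show why the wider type matters, here is a small sketch of the 4-byte case on its own. A 4-byte sequence carries 21 payload bits, which cannot fit in a 16-bit variable. utf8_decode4 is a made-up helper for illustration; it does not reject overlong forms or values above U+10FFFF.

```c
/* Decodes a 4-byte UTF-8 sequence (11110xxx followed by three 10xxxxxx
   bytes) starting at s. Returns the code point, or 0 if the bytes do not
   have that shape. s must point into a NUL-terminated string. */
static unsigned long utf8_decode4(const unsigned char *s)
{
    if ((s[0] & 0xF8) != 0xF0 ||
        (s[1] & 0xC0) != 0x80 ||
        (s[2] & 0xC0) != 0x80 ||
        (s[3] & 0xC0) != 0x80)
        return 0;

    /* unsigned long is at least 32 bits, so the shift by 18 is safe even
       where int is 16 bits wide */
    return ((unsigned long)(s[0] & 0x07) << 18) |
           ((unsigned long)(s[1] & 0x3F) << 12) |
           ((unsigned long)(s[2] & 0x3F) << 6)  |
            (unsigned long)(s[3] & 0x3F);
}
```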
