I am currently designing a font engine for an embedded display. The basic problem is the following:
I need to take a dynamically generated text string, look up each of its characters in a UTF-8 table, and use that table to find the corresponding glyph in a compressed bitmap array of all the supported characters. Once that is done, I call a bitcopy routine that moves the data from the bitmap array to the display.
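The last step of that pipeline, the bitcopy, can be sketched as below. This is a minimal illustration, not the asker's routine. It assumes a hypothetical 128x64 panel at 1 bit per pixel, packed MSB-first, and a glyph that has already been decompressed into one 16-bit word per row. The names FB_WIDTH, glyph16_t and blit_glyph are made up for the example, and decompression is not shown.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical display: 128x64 pixels, 1 bpp, leftmost pixel in bit 7. */
#define FB_WIDTH  128
#define FB_HEIGHT 64
#define FB_STRIDE (FB_WIDTH / 8)   /* bytes per framebuffer row */

static uint8_t framebuffer[FB_HEIGHT * FB_STRIDE];

/* Hypothetical decompressed glyph: one word per row, bit 15 = leftmost pixel. */
typedef struct {
    uint8_t  width;                /* 1..16 pixels */
    uint8_t  height;               /* 1..16 pixels */
    uint16_t rows[16];
} glyph16_t;

/* OR the glyph into the framebuffer with its top-left corner at (x, y),
   clipping anything that falls off the panel. Returns the glyph width so
   the caller can advance to the next character position. */
static unsigned blit_glyph(const glyph16_t *g, int x, int y)
{
    assert(g->width <= 16 && g->height <= 16);
    for (unsigned row = 0; row < g->height; row++) {
        int py = y + (int)row;
        if (py < 0 || py >= FB_HEIGHT)
            continue;                              /* clip vertically */
        for (unsigned col = 0; col < g->width; col++) {
            int px = x + (int)col;
            if (px < 0 || px >= FB_WIDTH)
                continue;                          /* clip horizontally */
            if (g->rows[row] & (0x8000u >> col))
                framebuffer[py * FB_STRIDE + px / 8] |= (uint8_t)(0x80u >> (px % 8));
        }
    }
    return g->width;
}
```

A real driver would copy whole bytes or words per row rather than single pixels; the per-pixel loop is kept only because it makes the bit layout easy to follow.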
I will not be supporting the full Unicode character set, as I have very limited system resources to work with (32K ROM, 8K RAM), but I want the ability to add the needed glyphs later for localization purposes. All development is being done in C and assembly.
The glyph size is a maximum of 16 pixels wide by 16 pixels tall. We will probably need support for the whole Basic Multilingual Plane (up to 3 bytes per character in UTF-8), as some of our larger customers are in Asia. However, we would not include the whole table in any single localization.
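The sizes involved are worth working out. An uncompressed 16x16 glyph at 1 bit per pixel is 16 × 16 / 8 = 32 bytes, so a 32K ROM could hold at most 32768 / 32 = 1024 uncompressed glyphs with no room left for code. Also, every BMP code point fits in 16 bits; the 3 bytes are only the maximum length of its UTF-8 encoding. A small sketch (utf8_length and GLYPH_BYTES are illustrative names, not from the question):

```c
#include <assert.h>
#include <stdint.h>

/* Bytes needed for one 16x16 glyph at 1 bit per pixel, uncompressed. */
#define GLYPH_BYTES (16 * 16 / 8)
static_assert(GLYPH_BYTES == 32, "16x16 pixels at 1 bpp is 32 bytes");

/* Number of bytes UTF-8 uses to encode code point cp. */
static unsigned utf8_length(uint32_t cp)
{
    if (cp < 0x80)    return 1;   /* U+0000..U+007F: ASCII        */
    if (cp < 0x800)   return 2;   /* U+0080..U+07FF               */
    if (cp < 0x10000) return 3;   /* U+0800..U+FFFF: rest of BMP  */
    return 4;                     /* U+10000..U+10FFFF            */
}
```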
My question is this:
What is the best way to add this UTF-8 support and associated table?
The solution below assumes that the lower 16 bits of the Unicode space (the Basic Multilingual Plane, U+0000 through U+FFFF) will be enough for you. If your bitmap table has, say, U+0020 through U+007E at positions 0x00 to 0x5E, U+00A0 through U+00FF at positions 0x5F to 0xBE, and U+1200 through U+1240 at 0xBF to 0xFF, you could do something like the code below (which isn't tested, not even compile-tested).
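Here is a minimal sketch of such a function, written from the description in the following paragraphs. It uses the example ranges, with the last run being 65 code points long so that it ends exactly at index 0xFF. The function name utf8_to_indexes, its parameters, and the one-byte index type are assumptions for illustration. It handles 1- to 3-byte UTF-8 sequences, which covers the BMP.

```c
#include <assert.h>
#include <stddef.h>

/* (first code point, run length) pairs, in bitmap-table order:
   U+0020..U+007E -> 0x00..0x5E, U+00A0..U+00FF -> 0x5F..0xBE,
   U+1200..U+1240 -> 0xBF..0xFF. */
static const unsigned short bitmapmap[] = {
    0x0020, 0x5F,
    0x00A0, 0x60,
    0x1200, 0x41,
};
#define BITMAPMAP_PAIRS (sizeof bitmapmap / sizeof bitmapmap[0] / 2)

/* Decode NUL-terminated UTF-8 and store one bitmap index per supported
   character in indexes[] (at most max entries). Unsupported characters,
   malformed bytes and 4-byte sequences are silently dropped; overlong
   encodings are not rejected. Returns the number of indexes stored. */
size_t utf8_to_indexes(const char *text, unsigned char *indexes, size_t max)
{
    const unsigned char *p = (const unsigned char *)text;
    size_t count = 0;

    assert(text != NULL && indexes != NULL);

    while (*p != '\0' && count < max) {
        /* Part 1: build one code point in ucs2char. */
        unsigned char lead = *p++;
        unsigned short ucs2char;
        unsigned pending;           /* continuation bytes still expected */

        if (lead < 0x80) {
            ucs2char = lead;
            pending = 0;
        } else if ((lead & 0xE0) == 0xC0) {
            ucs2char = lead & 0x1F;
            pending = 1;
        } else if ((lead & 0xF0) == 0xE0) {
            ucs2char = lead & 0x0F;
            pending = 2;
        } else {
            continue;               /* stray continuation byte or 4-byte lead */
        }
        while (pending > 0 && (*p & 0xC0) == 0x80) {
            ucs2char = (unsigned short)((ucs2char << 6) | (*p++ & 0x3F));
            pending--;
        }
        if (pending > 0)
            continue;               /* truncated sequence */

        /* Part 2: find the run that contains ucs2char. */
        unsigned base = 0;          /* bitmap index of the current run's first glyph */
        for (size_t i = 0; i < BITMAPMAP_PAIRS; i++) {
            unsigned first = bitmapmap[2 * i];
            unsigned len = bitmapmap[2 * i + 1];
            if (ucs2char >= first && ucs2char - first < len) {
                indexes[count++] = (unsigned char)(base + (ucs2char - first));
                break;
            }
            base += len;
        }
    }
    return count;
}
```

For example, "A" followed by "é" yields the indexes 0x21 and 0xA8, and any character outside the three runs simply produces no index.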
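bitmapmap contains a series of pairs of values. The assumption is that the bitmap table is made up of runs of directly adjacent Unicode code points. The first value of each pair is the code point of the first bitmap in a run (so the first value of the first pair is the code point that the bitmap at index 0 represents), and the second value is the length of that run. Each run starts at the bitmap index immediately after the end of the previous one.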
The first part of the while loop iterates through the UTF-8 input and builds up a Unicode code point in ucs2char. Once a complete character has been decoded, the second part searches for it in the ranges listed in bitmapmap. If it finds a matching bitmap index, it appends that index to indexes. Characters for which no bitmap is present are silently dropped.
The function returns the number of bitmap indexes found.
This approach keeps the Unicode-to-bitmap table small (one pair of values per run of consecutive code points rather than one entry per code point), and it should be reasonably fast and reasonably flexible.
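To put numbers on it: with two 16-bit values per run, the three-run example costs 12 bytes, whereas a flat table with one byte per BMP code point would need 64 KB, twice the whole 32K ROM. The linear scan over runs is fine for a handful of runs. If a CJK localization adds many runs, a binary search keeps each lookup at O(log n) in the number of runs. The sketch below shows that variation. It assumes a hypothetical three-field layout (glyph_run_t, lookup_glyph) that also stores each run's starting bitmap index, so no summing of earlier run lengths is needed. Runs must be sorted by first code point and must not overlap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical run layout: like a bitmapmap pair plus a precomputed base index. */
typedef struct {
    uint16_t first;  /* first code point in the run */
    uint16_t len;    /* number of consecutive code points */
    uint16_t base;   /* bitmap index of `first` */
} glyph_run_t;

/* Same example ranges, sorted by `first`. */
static const glyph_run_t runs[] = {
    { 0x0020, 0x5F, 0x00 },
    { 0x00A0, 0x60, 0x5F },
    { 0x1200, 0x41, 0xBF },
};

/* Binary search over the runs. Returns the bitmap index for cp,
   or -1 if no glyph exists for it. */
static int32_t lookup_glyph(uint16_t cp)
{
    size_t lo = 0, hi = sizeof runs / sizeof runs[0];
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (cp < runs[mid].first) {
            hi = mid;                                   /* cp is in an earlier run */
        } else if (cp - runs[mid].first >= runs[mid].len) {
            lo = mid + 1;                               /* cp is past this run */
        } else {
            return runs[mid].base + (cp - runs[mid].first);
        }
    }
    return -1;
}
```

The extra base field costs one more 16-bit value per run in exchange for not walking every earlier run.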
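EDIT: You mentioned that you may need more than the lower 16 bits. (Strictly speaking, the whole Basic Multilingual Plane fits in 16 bits; 3 bytes is the maximum length of its UTF-8 encoding.) If you do need code points above U+FFFF, apply s/unsigned short/uint32_t/;s/ucs2char/codepoint/; to the above code (plain unsigned int is only 16 bits on many 8- and 16-bit microcontrollers). As long as the decoding loop also accepts 4-byte sequences, it can then handle the whole Unicode space.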