Does anyone know of a good, small, open-source Unicode handling library for C or C++? I’ve looked at ICU, but it seems far too big.
I need the library to support:
- all the normal encodings
- normalization
- character classification – e.g. deciding whether a character should be allowed in identifiers and comments
- validation – recognizing nonsense
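
To make the validation item concrete, here is a rough sketch of what "recognizing nonsense" could mean for UTF-8 input. It is only an illustration, not code from ICU or any other library; `is_valid_utf8` is a made-up helper name, and it assumes the input is supposed to be UTF-8 and that "nonsense" means ill-formed byte sequences as the Unicode standard defines them.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not from any library): returns true if `s` is
// well-formed UTF-8. Rejects stray continuation bytes, truncated
// sequences, overlong encodings, surrogates, and values above U+10FFFF.
bool is_valid_utf8(const std::string& s) {
    const auto* p = reinterpret_cast<const unsigned char*>(s.data());
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char lead = p[i];
        if (lead < 0x80) {          // ASCII: one byte, always valid
            ++i;
            continue;
        }
        std::size_t len = 0;        // total bytes in this sequence
        std::uint32_t cp = 0;       // code point being decoded
        std::uint32_t min = 0;      // smallest code point allowed at this length
        if ((lead & 0xE0) == 0xC0)      { len = 2; cp = lead & 0x1F; min = 0x80; }
        else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; min = 0x800; }
        else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; min = 0x10000; }
        else return false;          // stray continuation byte or invalid lead byte

        if (n - i < len) return false;              // sequence cut off at end of input
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = p[i + k];
            if ((c & 0xC0) != 0x80) return false;   // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp < min) return false;                     // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false; // UTF-16 surrogate
        if (cp > 0x10FFFF) return false;                // beyond the Unicode range
        i += len;
    }
    return true;
}
```

Calling `is_valid_utf8(input)` before handing text to the rest of the program would catch malformed bytes early. It does nothing for normalization or character classification, which need the Unicode data tables, and that is the part that makes a library like ICU large.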
I looked at UTF8-CPP and libiconv, and neither seemed to have all the features I needed. So I guess I’ll just use ICU, even though it is really big. I think there are ways to strip out the unneeded functions and data, so I’ll try that. This page (under "Customizing ICU’s Data Library") describes how to cut out some of the data.