Recently I’ve been dealing with texts with mixed languages, including Chinese, English, and even some emoticons.
I’ve searched for this quite a lot, but all I can find is how to replace full-width characters with half-width ones, not how to determine whether a given character is half-width or full-width.
So, my question is:
Is it possible to tell whether a character is half-width or full-width?
In Unicode 6.1, there is the block
Halfwidth and Fullwidth Forms (PDF here). Within this block,
\uFF01-\uFF60 and \uFFE0-\uFFE6 are fullwidth, while \uFF61-\uFFDC and \uFFE8-\uFFEE are halfwidth.
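As a rough sketch of how those ranges could be checked in code, here is a small Python example. The names `classify_width`, `FULLWIDTH_RANGES`, and `HALFWIDTH_RANGES` are made up for illustration. It only covers the Halfwidth and Fullwidth Forms block, so ordinary ASCII letters and CJK ideographs fall into "other". For those, the standard library's `unicodedata.east_asian_width` may be more useful, since it reports the East Asian Width property of any character.

```python
import unicodedata

# Code point ranges from the Halfwidth and Fullwidth Forms block quoted above.
FULLWIDTH_RANGES = ((0xFF01, 0xFF60), (0xFFE0, 0xFFE6))
HALFWIDTH_RANGES = ((0xFF61, 0xFFDC), (0xFFE8, 0xFFEE))


def classify_width(ch):
    """Return 'full', 'half', or 'other' for a single character.

    Hypothetical helper: it only knows about the one block above, so
    anything outside that block is reported as 'other'.
    """
    cp = ord(ch)
    if any(lo <= cp <= hi for lo, hi in FULLWIDTH_RANGES):
        return "full"
    if any(lo <= cp <= hi for lo, hi in HALFWIDTH_RANGES):
        return "half"
    return "other"


def east_asian_width(ch):
    """Thin wrapper over the standard library's East Asian Width lookup.

    Returns a property code such as 'F' (fullwidth), 'H' (halfwidth),
    'W' (wide), 'Na' (narrow), 'A' (ambiguous), or 'N' (neutral).
    """
    return unicodedata.east_asian_width(ch)


if __name__ == "__main__":
    # Fullwidth A, halfwidth katakana A, ASCII A, and a CJK ideograph.
    for ch in ("\uFF21", "\uFF71", "A", "\u4E2D"):
        print(hex(ord(ch)), classify_width(ch), east_asian_width(ch))
```

For mixed Chinese and English text, treating both 'F' and 'W' from `east_asian_width` as "takes two columns" is a common choice, though how emoticons are rendered still depends on the font and terminal.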