I’m writing some Java code that deals with Chinese characters, and I got some

Asked: May 31, 2026 (2026-05-31T16:01:38+00:00)

I’m writing some Java code that deals with Chinese characters, and I got some unexpected results — strings that should be equal were not. Here is one of the offending characters, which means “six” (pinyin: liù): 六. This character can be represented with either of two code points:

U+F9D1, in the block CJK Compatibility Ideographs
U+516D, in the block CJK Unified Ideographs

Wikipedia has a page about these character ranges, and the short section on compatibility ideographs does mention some duplicates, but the list omits this specific character.

So I’m wondering:

Is there a list of duplicate Unicode characters somewhere, so I can transform strings before trying to compare them?
Is this normal when dealing with CJK characters, or have I done something else wrong?

1 Answer

Editorial Team · Answer 1 · 2026-05-31T16:01:39+00:00

Just normalize them. U+F9D1 has a canonical decomposition to U+516D, so it becomes U+516D under any of the four normalization forms (NFD, NFC, NFKD, NFKC):

$ export PERL_UNICODE=S

$ perl -le 'print "\x{F9D1}\x{516D}"' | uniquote -v
\N{CJK COMPATIBILITY IDEOGRAPH-F9D1}\N{CJK UNIFIED IDEOGRAPH-516D}

$ perl -le 'print "\x{F9D1}\x{516D}"' | nfd | uniquote -v
\N{CJK UNIFIED IDEOGRAPH-516D}\N{CJK UNIFIED IDEOGRAPH-516D}
$ perl -le 'print "\x{F9D1}\x{516D}"' | nfc | uniquote -v
\N{CJK UNIFIED IDEOGRAPH-516D}\N{CJK UNIFIED IDEOGRAPH-516D}
$ perl -le 'print "\x{F9D1}\x{516D}"' | nfkd | uniquote -v
\N{CJK UNIFIED IDEOGRAPH-516D}\N{CJK UNIFIED IDEOGRAPH-516D}
$ perl -le 'print "\x{F9D1}\x{516D}"' | nfkc | uniquote -v
\N{CJK UNIFIED IDEOGRAPH-516D}\N{CJK UNIFIED IDEOGRAPH-516D}

Many essential Unicode tools, including those, are available here.
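
Since the question is about Java, the same check can be sketched with the standard java.text.Normalizer class. This is a minimal illustration, not a drop-in solution: the class name NormalizeCompare and the helper equalsNormalized are made up for this example, and picking NFC is an assumption (any of the four forms maps U+F9D1 to U+516D).

```java
import java.text.Normalizer;

// Hypothetical demo class; the name is not from the original post.
final class NormalizeCompare {

    // Hypothetical helper: compares two strings after normalizing both to NFC.
    // NFC is an assumed choice; use NFKC if compatibility variants
    // (full-width letters, ligatures, and so on) should also compare equal.
    static boolean equalsNormalized(String a, String b) {
        String na = Normalizer.normalize(a, Normalizer.Form.NFC);
        String nb = Normalizer.normalize(b, Normalizer.Form.NFC);
        return na.equals(nb);
    }

    public static void main(String[] args) {
        String compat = "\uF9D1";   // CJK COMPATIBILITY IDEOGRAPH-F9D1
        String unified = "\u516D";  // CJK UNIFIED IDEOGRAPH-516D

        // A plain comparison sees two different code points.
        System.out.println(compat.equals(unified));            // false

        // After normalization the two strings are identical.
        System.out.println(equalsNormalized(compat, unified)); // true

        // Show the resulting code point under each normalization form.
        for (Normalizer.Form form : Normalizer.Form.values()) {
            String n = Normalizer.normalize(compat, form);
            System.out.printf("%s -> U+%04X%n", form, n.codePointAt(0));
        }
    }
}
```

Normalizing once at the point where text enters the program (rather than at every comparison) is usually the simpler design, since ordinary equals() and hash-based collections then work on the stored strings.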
