I am working with a Chinese database in text that saves entries in this


Editorial Team

Asked: May 28, 2026


I am working with a Chinese database in text that saves entries in this format:

Traditional Simplified [pin1 yin1] /English equivalent 1/equivalent 2/

I’ve tried parsing it using delimiters (in Java).

This is what I have so far:

                    String delims = "[\\[\\]/]+";
                    String tokens[] = str.split(delims);

The problem is that the English equivalent also contains delimiter tokens.

For instance:

⿔ ⿔ [gui1] /variant of 龜|龟[gui1]/

How would someone parse this String?

I’m trying to get the following information from the String:

Simplified: ⿔

Traditional: ⿔

Pinyin: gui1

English Equivalent: variant of 龜|龟[gui1]


1 Answer

Editorial Team · Answer 1 · 2026-05-28T08:13:02+00:00

Try using a regex to clean up the whole string.

String text = "⿔ ⿔ [gui1] /variant of 龜|龟[gui1]/";

String pattern = "(\\S+)\\s*(\\S+)\\s*\\[(.+?)\\]\\s*/(.+?)/";

text = text.replaceAll(pattern, "$1;$2;$3;$4");

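(\\S+) -> ⿔
Matches a run of one or more non-whitespace characters.

\\s* -> (the space between fields)
Matches zero or more whitespace characters.

\\[(.+?)\\] -> gui1
Captures everything inside [ bla bla bla ].
The ? makes the quantifier lazy, so it matches the shortest possible text,
e.g. [ bla bla ] rather than [ bla bla ] [ ble ble ].

/(.+?)/ -> variant of 龜|龟[gui1]
Same as above, but captures everything inside / bla bla /.
Again, the ? makes the match as short as possible, so for an entry with several equivalents (/equivalent 1/equivalent 2/) only the first one is captured.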
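
To make the lazy-versus-greedy difference concrete, here is a small sketch that runs both forms of the bracket pattern against the same input. The class and method names (LazyVsGreedy, firstGroup) are made up for illustration and are not part of the original answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LazyVsGreedy {
    // Returns capture group 1 of the first match, or null if nothing matches.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "[bla bla] [ble ble]";

        // Lazy: stops at the first closing bracket.
        System.out.println(firstGroup("\\[(.+?)\\]", input)); // bla bla

        // Greedy: runs on to the last closing bracket.
        System.out.println(firstGroup("\\[(.+)\\]", input));  // bla bla] [ble ble
    }
}
```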


Now text becomes:
⿔;⿔;gui1;variant of 龜|龟[gui1]

Next, you can split on ; to get the individual fields:

String tokens[] = text.split(";");
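
One possible variation, sketched here under a few assumptions, is to read the capture groups directly with a Matcher instead of joining them with ; and splitting again. This sidesteps the case where an English equivalent itself contains a semicolon, and a greedy definitions group keeps every equivalent rather than just the first. The sketch assumes the fields are separated by whitespace as in the format line from the question; the names EntryParser, parse and ENTRY are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EntryParser {
    // traditional, simplified, [pinyin], then everything between the
    // first and the last slash (greedy, so all equivalents are kept).
    private static final Pattern ENTRY =
            Pattern.compile("(\\S+)\\s+(\\S+)\\s+\\[([^\\]]*)\\]\\s+/(.*)/\\s*");

    // Returns {traditional, simplified, pinyin, equivalents joined by '/'},
    // or null if the line does not have the expected shape.
    static String[] parse(String line) {
        Matcher m = ENTRY.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3), m.group(4) };
    }

    public static void main(String[] args) {
        String[] fields = parse("⿔ ⿔ [gui1] /variant of 龜|龟[gui1]/");
        System.out.println("Traditional: " + fields[0]);
        System.out.println("Simplified: " + fields[1]);
        System.out.println("Pinyin: " + fields[2]);
        // Split the last field on '/' to get the individual equivalents.
        for (String equivalent : fields[3].split("/")) {
            System.out.println("English Equivalent: " + equivalent);
        }
    }
}
```

Because the pinyin group uses [^\\]]* and is the first bracketed field after the two headwords, a later [gui1] inside the English text does not interfere with it.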
