I am trying to segment a Korean string into individual syllable. So the input

Question

Asked: June 18, 2026 · 2026-06-18T01:47:53+00:00

I am trying to segment a Korean string into individual syllables.
So the input would be a string like “서울특별시” and the output would be “서”, “울”, “특”, “별”, “시”.
I have tried to segment the string in both C++ and Python, but the result is a series of ? characters or white spaces, respectively (the string itself, however, prints correctly on the screen).
In C++ I first initialized the input string as string korean="서울특별시" and then used a string::iterator to go through the string and print each individual component.
In Python I just used a simple for loop.

I am wondering if there is a solution to this problem. Thanks.

1 Answer

Editorial Team · Answer 1 · 2026-06-18T01:47:54+00:00

I don’t know Korean at all, and can’t comment on the division into syllables, but in Python 2 the following works:

# -*- coding: utf-8 -*- 
print(repr(u"서울특별시"))
print(repr(u"서울특별시"[0]))

Output:

u'\uc11c\uc6b8\ud2b9\ubcc4\uc2dc'
u'\uc11c'

In Python 3 every str is already a Unicode string, so you don’t need the u prefix.
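As a minimal sketch of the Python 3 case (assuming the source file is saved as UTF-8 and the string is in precomposed form, one code point per syllable), iterating over the string or calling list() on it yields one element per syllable. The variable names here are only illustrative:

```python
# Python 3: str is a sequence of Unicode code points, not bytes.
text = "서울특별시"

# Each precomposed Hangul syllable is a single code point,
# so list() splits the string syllable by syllable.
syllables = list(text)

print(len(text))
print(syllables)
```

If the terminal font cannot display Hangul, the printed list may still show boxes even though the split itself is correct.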

The outputs are the Unicode code points of the characters in the string, which means that the string has been split correctly in this case. I printed them with repr because the font in the terminal I used can’t display them, so without repr I just see square boxes. That is purely a rendering issue, though: repr demonstrates that the data is correct.
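A small sketch of the same inspection in Python 3, where ascii() plays the role that repr plays above. It also prints the UTF-8 encoding of one syllable. That a byte-by-byte loop is what produced the ? characters in the question is an assumption on my part, but it would be consistent with each syllable occupying several bytes:

```python
text = "서울특별시"
first = text[0]

# ascii() escapes non-ASCII characters, so the output is readable
# even when the terminal font cannot render Hangul.
print(ascii(first))
print(hex(ord(first)))

# The same syllable encoded as UTF-8 takes more than one byte;
# printing those bytes one at a time would not show a valid character.
encoded = first.encode("utf-8")
print(len(encoded), encoded)
```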

So, if you know logically how to identify the syllables, you can use repr to check what your code has actually done. Unicode Normalization Form C (NFC) sounds like a good candidate for actually identifying them (thanks to R. Martinho Fernandes), since it composes separate jamo into single precomposed syllable characters, and unicodedata.normalize() is the way to get it.
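A hedged sketch of that idea in Python 3: normalize to NFC first, then split per code point. The helper name split_syllables is hypothetical, and the approach assumes the text consists of modern Hangul that NFC can compose into precomposed syllables:

```python
import unicodedata


def split_syllables(text):
    """Hypothetical helper: compose jamo with NFC, then split per code point."""
    return list(unicodedata.normalize("NFC", text))


# Simulate input that arrives in decomposed form (separate jamo).
decomposed = unicodedata.normalize("NFD", "서울특별시")

# The decomposed string has more code points than syllables.
print(len(decomposed))

# After NFC, each syllable is a single code point again.
print(split_syllables(decomposed))
```

For input that is already precomposed, the normalize call leaves the string unchanged, so the helper is safe to apply either way.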
