How can I convert decomposed unicode character sequences like LATIN SMALL LETTER E +

Question

Asked: May 17, 2026 (2026-05-17T02:36:20+00:00)

How can I convert decomposed Unicode character sequences like “LATIN SMALL LETTER E” + “COMBINING ACUTE ACCENT” (U+0065 + U+0301) into the precomposed form “LATIN SMALL LETTER E WITH ACUTE” (U+00E9) using native Python 2.5+ functions?

If it matters, I am on Mac OS X (10.6.4). I have seen the question “Converting to Precomposed Unicode String using Python-AppKit-ObjectiveC”, but the OS X native CoreFoundation function CFStringNormalize described there has no effect for me: it does not fail or halt the script, it simply leaves the string unchanged.
By that I don’t mean that it returns nothing (its return type is void, because it mutates the string in place). I have also tried every possible value of the constant parameter that selects precomposing or decomposing in either canonical or non-canonical form.

That is why I am searching for a Python native method of handling this case.

Thank you very much for reading!

André


1 Answer

Editorial Team · Answer 1 · 2026-05-17T02:36:21+00:00

import unicodedata as ud

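# Decomposed form: base letter followed by a combining mark (two code points).
astr = u"\N{LATIN SMALL LETTER E}" + u"\N{COMBINING ACUTE ACCENT}"
# NFC composes the pair into a single precomposed code point where one exists.
combined_astr = ud.normalize('NFC', astr)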

‘NFC’ tells ud.normalize to first apply the canonical decomposition (as ‘NFD’ does) and then
recompose the result into precomposed characters where they exist. Here the result is a single character:

print(ud.name(combined_astr))
# LATIN SMALL LETTER E WITH ACUTE

They both print the same:

print(astr)
# é
print(combined_astr)
# é

But their reprs are different:

print(repr(astr))
# u'e\u0301'
print(repr(combined_astr))
# u'\xe9'

Their encodings, for example in UTF-8, are (not surprisingly) different too:

print(repr(astr.encode('utf_8')))
# 'e\xcc\x81'
print(repr(combined_astr.encode('utf_8')))
# '\xc3\xa9'
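
Because the two forms look identical when printed but differ as strings, comparisons can fail unless both sides are normalized first. The following is a minimal sketch of that idea, not part of the original answer; the helper name to_precomposed is hypothetical, and the code is written to run on Python 3.3+ (the u"" literals also work on Python 2):

```python
import unicodedata


def to_precomposed(text):
    """Return text in NFC form (hypothetical helper name, for illustration)."""
    return unicodedata.normalize("NFC", text)


decomposed = u"e\u0301"   # LATIN SMALL LETTER E + COMBINING ACUTE ACCENT
precomposed = u"\u00e9"   # LATIN SMALL LETTER E WITH ACUTE

# The raw strings differ in length and do not compare equal.
lengths = (len(decomposed), len(precomposed))
raw_equal = decomposed == precomposed

# After normalizing to NFC, both are the same single code point.
nfc_equal = to_precomposed(decomposed) == precomposed

# NFD goes the other way and splits the precomposed character again.
round_trip = unicodedata.normalize("NFD", precomposed) == decomposed
```

Normalizing both sides to the same form before comparing (or before using strings as dictionary keys) sidesteps the mismatch; NFC is one reasonable choice when precomposed output is wanted, as in the question.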
