I’m looking to standardize some unicode text in python. I’m wondering if there’s an

Question


Asked: May 15, 2026


I’m looking to standardize some Unicode text in Python. Is there an easy way to get the "denormalized" (composed) form of a character followed by a combining Unicode character? E.g. if I have the sequence u'o\xaf' (i.e. latin small letter o followed by combining macron), I’d like to get ō (latin small letter o with macron). It’s easy to go the other way:

import unicodedata

o = unicodedata.lookup("LATIN SMALL LETTER O WITH MACRON")
o = unicodedata.normalize('NFD', o)  # u'o\u0304': base letter + combining macron


1 Answer

Editorial Team · Answer 1 · 2026-05-15T12:34:54+00:00

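As I noted in the comments, U+00AF (MACRON) is a spacing character, not a combining macron; the combining macron is U+0304. You can still convert U+00AF into U+0020 U+0304 (space followed by combining macron) with an NFKD transform: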

>>> unicodedata.normalize('NFKD', u'o\u00af')
u'o \u0304'

Then you could remove the space and get ō with NFC.
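
The steps above can be sketched as one helper. This is only a minimal sketch assuming Python 3; the name compose_spacing_marks and the regular expression (limited to the Combining Diacritical Marks block, U+0300 to U+036F) are illustrative choices, not something from the original answer. Because it runs NFKD over the whole string, it also rewrites every other compatibility character in the input.

```python
import re
import unicodedata

# Hypothetical helper name; the range U+0300..U+036F is an assumption that
# covers only the basic Combining Diacritical Marks block.
_SPACE_BEFORE_MARK = re.compile('\u0020(?=[\u0300-\u036f])')

def compose_spacing_marks(text):
    # Step 1: NFKD turns U+00AF into U+0020 U+0304 (space + combining macron).
    decomposed = unicodedata.normalize('NFKD', text)
    # Step 2: drop a space only when a combining mark directly follows it,
    # so ordinary spaces between words are kept. A space that already
    # preceded a combining mark in the input is dropped too.
    cleaned = _SPACE_BEFORE_MARK.sub('', decomposed)
    # Step 3: NFC recomposes base letter + combining mark where possible.
    return unicodedata.normalize('NFC', cleaned)

result = compose_spacing_marks('o\u00af')
print('U+%04X' % ord(result), unicodedata.name(result))
```

If U+00AF is the only character of interest, a narrower option is text.replace('\u00af', '\u0304') followed by NFC, which leaves other compatibility characters untouched.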

(Note that NFKD decomposes aggressively, so some semantics can be lost: anything with a “compatibility” mapping is broken apart. E.g.

'½' (U+00BD) ↦ '1' '⁄' (U+2044) '2'
'²' (U+00B2) ↦ '2'
'①' (U+2460) ↦ '1'

etc.)
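
To check this behaviour on a given character, the sketch below compares the NFD and NFKD forms side by side. It assumes Python 3, and the helper name decompositions is hypothetical; only unicodedata.normalize comes from the standard library.

```python
import unicodedata

def decompositions(ch):
    """Return the (NFD, NFKD) forms of ch as lists of 'U+XXXX' strings."""
    def codepoints(s):
        return ['U+%04X' % ord(c) for c in s]
    return (codepoints(unicodedata.normalize('NFD', ch)),
            codepoints(unicodedata.normalize('NFKD', ch)))

# One half, superscript two, circled digit one, o with macron.
for ch in ('\u00bd', '\u00b2', '\u2460', '\u014d'):
    nfd, nfkd = decompositions(ch)
    print('U+%04X' % ord(ch), 'NFD:', ' '.join(nfd), '| NFKD:', ' '.join(nfkd))
```

NFD leaves the first three characters unchanged because they only have compatibility mappings, while ō decomposes to the same base letter plus combining macron under both forms.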
