I’d like to replace a character with a five-digit Unicode code point in Python 3.3.
For example,
import re
print(re.sub('a', u'\u1D15D', 'abc' ))
but the result is different from what I expected.
Do I have to use the character itself rather than its code point?
Is there a better way to handle five digit unicode characters?
Python Unicode escapes are either 4 hex digits (\uabcd) or 8 (\Uabcdabcd). For a code point beyond U+FFFF you need the latter (a capital U), left-filled with enough zeros to make 8 digits: U+1D15D is written '\U0001D15D'. That escape really does produce the U+1D15D code point (MUSICAL SYMBOL WHOLE NOTE), but your browser font may not be able to render it and may show a placeholder glyph (a box or question mark) instead.
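As a minimal sketch of the difference between the two escape forms, assuming Python 3.3 or later (where `len()` counts code points); the variable names are only illustrative:

```python
import unicodedata

# 8-digit escape: a single code point, U+1D15D, left-filled with zeros.
whole_note = '\U0001D15D'

# 4-digit escape followed by a literal 'D': two characters, U+1D15 and 'D'.
two_chars = '\u1D15D'

print(len(whole_note), hex(ord(whole_note)))             # 1 0x1d15d
print(unicodedata.name(whole_note))                      # MUSICAL SYMBOL WHOLE NOTE
print(len(two_chars), [hex(ord(c)) for c in two_chars])  # 2 ['0x1d15', '0x44']

# chr() builds the same character from the integer code point.
assert chr(0x1D15D) == whole_note
```

Checking `len()` and `ord()` this way sidesteps the font problem, since it does not depend on how the character is rendered.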
Because you used a \uabcd escape, you replaced a in abc with two characters: the code point U+1D15 (ᴕ, LATIN LETTER SMALL CAPITAL OU) and the ASCII character D. Using the 8-digit escape '\U0001D15D' as the replacement works; again, your font may display the U+1D15D code point as a placeholder glyph.
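A short sketch of the substitution from the question with both escapes, assuming Python 3.3 or later; `ascii()` is used only so the result is readable regardless of font support, and the names `broken` and `fixed` are illustrative:

```python
import re

# The 4-digit escape consumes only '1D15'; the trailing 'D' is a plain letter.
broken = re.sub('a', '\u1D15D', 'abc')

# The 8-digit escape yields the single code point U+1D15D.
fixed = re.sub('a', '\U0001D15D', 'abc')

print(ascii(broken))  # '\u1d15Dbc'
print(ascii(fixed))   # '\U0001d15dbc'
```

For a fixed, single-character substitution like this one, `'abc'.replace('a', '\U0001D15D')` should give the same result without `re`.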