I am trying to do this: val = re.sub(r'\b' + u_word +'\b', unicode(new_word), u_text)

Question

Editorial Team

Asked: June 15, 2026 (2026-06-15T15:38:49+00:00)

I am trying to do this:

val = re.sub(r'\b' + u_word +'\b', unicode(new_word), u_text)

(All strings are non-latin.)

It does not work at all!

Is it possible to find-replace non-latin words (whole words) in a non-latin text with regex?
How?

EDIT:

If you want to test, try these strings:

>>> u_word = u'αβ'
>>> u_text = u'αβγ αβ αβγδ δαβ'
>>> new_word = u'χχ'
>>> val = re.sub(r'\b' + u_word +r'\b', unicode(new_word), u_text)
>>> val
u'\u03b1\u03b2\u03b3 \u03b1\u03b2 \u03b1\u03b2\u03b3\u03b4 \u03b4\u03b1\u03b2'
>>> u_text
u'\u03b1\u03b2\u03b3 \u03b1\u03b2 \u03b1\u03b2\u03b3\u03b4 \u03b4\u03b1\u03b2'
>>>

1 Answer

Editorial Team · Answer 1 · 2026-06-15T15:38:50+00:00

You need to pass the re.UNICODE flag to re.sub, like so:

val = re.sub(r'\b' + u_word + r'\b', unicode(new_word), u_text, flags=re.UNICODE)

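\b matches at a word boundary, that is, between a \w character and a \W character (or at the start or end of the string). In Python 2, without the re.UNICODE flag, a “word” contains only characters from the set [a-zA-Z0-9_], so αβ isn’t seen as a “word” and \b never matches around it. Note also that both boundaries must be raw strings (r'\b'), as in the line above; a plain '\b' is a backspace character, not a word boundary. The flags argument of re.sub requires Python 2.7 or later. For more information see the re documentation (specifically \b, \w, and re.UNICODE).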
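To see the difference between the two sets of word rules, here is a small sketch. It assumes Python 3, where Unicode matching is the default for str patterns and the re.ASCII flag reproduces the ASCII-only behaviour described above; the variable names are only illustrative.

```python
import re

u_text = 'αβγ αβ αβγδ δαβ'

# ASCII-only rules: Greek letters are not \w, so there are no "words" at all
ascii_words = re.findall(r'\w+', u_text, flags=re.ASCII)

# Unicode rules: every Greek letter is a word character
unicode_words = re.findall(r'\w+', u_text)

# \b needs a \w character on one side, so it only matches under Unicode rules
ascii_hits = re.findall(r'\bαβ\b', u_text, flags=re.ASCII)
unicode_hits = re.findall(r'\bαβ\b', u_text)

print(ascii_words)    # []
print(unicode_words)  # ['αβγ', 'αβ', 'αβγδ', 'δαβ']
print(ascii_hits)     # []
print(unicode_hits)   # ['αβ']
```

Only the standalone αβ is reported under Unicode rules, which is the whole-word behaviour the question asks for.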

FYI:

If new_word is already a unicode string (as in your example), unicode(new_word) is superfluous; it returns new_word unmodified.
In Python 3.x, unicode is no longer a special case: str patterns are matched with Unicode rules by default. Your code would have worked as is in Python 3.x (minus unicode(), which was removed because it’s no longer necessary).
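For reference, a Python 3 version of the replacement might look like the sketch below. The helper name replace_whole_word is hypothetical, and the re.escape call and the function-style replacement are added precautions that are not part of the original code.

```python
import re

def replace_whole_word(text, word, replacement):
    """Replace whole-word occurrences of `word` in `text` (hypothetical helper)."""
    # re.escape guards against regex metacharacters inside `word`
    # (an assumption added here; the original concatenates the word directly).
    pattern = r'\b' + re.escape(word) + r'\b'
    # A function replacement inserts `replacement` literally, so any
    # backslashes in it are not interpreted as escape sequences.
    return re.sub(pattern, lambda match: replacement, text)

val = replace_whole_word('αβγ αβ αβγδ δαβ', 'αβ', 'χχ')
print(val)  # αβγ χχ αβγδ δαβ
```

No flag is passed because Unicode matching is already the default for str patterns in Python 3.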
