I am having difficulty trying to figure out a bug in my Python (2.7)


Asked: May 23, 2026

I am having difficulty figuring out a bug in my Python (2.7) script. re.sub and re.findall behave differently when recognizing special characters.

Here is the code:

>>> import re
>>> re.sub(ur"[^-' ().,\w]+", '' , u'Castañeda', re.UNICODE)
u'Castaeda'
>>> re.findall(ur"[^-' ().,\w]+", u'Castañeda', re.UNICODE)
[]

When I use findall, it correctly sees ñ as an alphabetic character, but when I use sub, it removes the ñ, treating it as a non-alphabetic character.

I can get the correct behavior by combining findall with string.replace, but that seems like a poor workaround. I also want to use re.split, which shows the same problem as re.sub.

Thanks in advance for the help.

1 Answer

Editorial Team · Answer 1 · 2026-05-23T01:07:08+00:00

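In Python 2.7, the call signature of re.sub is:

re.sub(pattern, repl, string, count=0, flags=0)

So

re.sub(ur"[^-' ().,\w]+", '' , u'Castañeda', re.UNICODE)

passes re.UNICODE (which has the value 32) as count, not as flags. The pattern is therefore compiled without the UNICODE flag, so \w does not match ñ, the negated character class does, and the ñ is removed. The findall call works because the third positional argument of re.findall is flags.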
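To see what the misplaced flag amounts to, here is a minimal sketch. It assumes nothing beyond the standard re module; the 40-character test string is made up for illustration, and count is passed explicitly by keyword because the point is only to show the effect of count=32.

```python
import re

# re.UNICODE is an integer-valued flag; its numeric value is 32.
flag_as_int = int(re.UNICODE)

# Putting the flag in the count slot means "replace at most 32 matches".
# No regex flag is set by doing this.
text = u"x" * 40                       # hypothetical test input
limited = re.sub(u"x", u"", text, count=flag_as_int)
remaining = len(limited)               # characters left after 32 removals
```

In other words, the original call asked for at most 32 substitutions with default (non-Unicode) matching.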

Try instead:

In [57]: re.sub(ur"(?u)[^-' ().,\w]+", '', u'Castañeda')
Out[57]: u'Casta\xf1eda'

Placing (?u) at the beginning of the regex is an alternate way to specify the re.UNICODE flag in the regex itself. You can also set the other flags (?iLmsux) this way. (For more info, search the re module documentation for “(?iLmsux)”.) In Python 2.7 you can alternatively pass the flag by keyword, as flags=re.UNICODE.

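Similarly, the call signature of re.split in Python 2.7 is:

re.split(pattern, string, maxsplit=0, flags=0)

The solution is the same: put (?u) in the pattern or pass flags=re.UNICODE by keyword, so that the flag does not end up in maxsplit.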
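A short sketch of both fixes applied to re.sub and re.split follows. It is written to run on Python 2.7 as well as Python 3, so it uses an escaped u"..." pattern instead of ur"...", and \xf1 for ñ. The " & Smith" suffix is an invented input, added only to give re.split something to split on.

```python
import re

pattern = u"[^-' ().,\\w]+"      # same character class as in the question
name = u"Casta\xf1eda"           # the name from the question

# Passing the flag by keyword keeps it out of the count slot.
cleaned = re.sub(pattern, u"", name, flags=re.UNICODE)

# Same idea for re.split: flags by keyword, or (?u) inside the pattern.
sample = name + u" & Smith"      # hypothetical input for illustration
parts_kw = re.split(pattern, sample, flags=re.UNICODE)
parts_inline = re.split(u"(?u)" + pattern, sample)
```

Here cleaned should keep the ñ, and both split calls should break the sample only at the "&", since spaces are among the characters the class allows.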
