Regex \w seems to ignore my Unicode strings.
I created the following function:
extras.py
# -*- coding: utf-8 -*-
import re

def test(word):
    print re.sub(r'[^\w]+', '', word, re.U)
and from the django shell:
import extras
extras.test(u'שלום')
The output is an empty string, while it should be the same as the input, in this example.
The purpose of the regex is to keep only alphanumeric characters, but it doesn’t work. It works with ASCII though.
What can be the problem?
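Use a raw unicode string for the pattern (ur'...' in Python 2), and make sure to pass re.U through the flags keyword argument. The fourth positional argument of re.sub is count, so re.U passed positionally is treated as a limit on the number of replacements and the Unicode flag is never set.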
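A minimal sketch of that fix, assuming Python 2.7 or later (the flags keyword of re.sub is not available before 2.7). The helper name keep_alnum is hypothetical, and unicode_literals is used here in place of the ur'' prefix so that the same file also runs on Python 3, where \w already matches Unicode word characters for str patterns:

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals
import re


def keep_alnum(word):
    # Hypothetical helper name, for illustration only.
    # flags is passed by keyword: the fourth positional argument
    # of re.sub is count, not flags.
    return re.sub(r'[^\w]+', '', word, flags=re.UNICODE)


print(keep_alnum('שלום'))
```

With the flag applied, the Hebrew letters count as word characters and are kept, while punctuation and whitespace are still stripped. Note that \w also matches the underscore, so underscores survive as well.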