I have added a full-text index on a column in my MySQL database, and I want to use Django's __regex filter on it. I assumed the following two counts would be equivalent:
>>> sum([bool(re.findall(r'\w+',p.abstract)) for p in Publication.objects.all()])
8467
>>> Publication.objects.filter(abstract__regex=r"\w+").count()
7974
With more complicated regular expressions the results diverge even further. For example, with \W{2} the two approaches return 13 and 8039 respectively. What am I missing here? Clearly my interpretation of __regex is incorrect.
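For comparing the two sides, the Python count can be written a little more directly with re.search, which stops at the first match instead of collecting every match the way re.findall does. This is a minimal sketch that reproduces only the Python side; the abstracts list is hypothetical stand-in data for the abstract fields of Publication.objects.all(), and count_matching is an illustrative helper name.

```python
import re
from typing import Iterable


def count_matching(texts: Iterable[str], pattern: str) -> int:
    """Count texts containing at least one match of pattern under Python re semantics."""
    compiled = re.compile(pattern)
    # search() returns None when nothing matches, so this mirrors bool(re.findall(...))
    return sum(1 for text in texts if compiled.search(text) is not None)


# Hypothetical stand-in for [p.abstract for p in Publication.objects.all()]
abstracts = ["Deep learning for proteins.", "", "--", "A study of owls"]

print(count_matching(abstracts, r"\w+"))    # 2
print(count_matching(abstracts, r"\W{2}"))  # 1
```

With real data you would pass p.abstract for each Publication. It says nothing about how MySQL evaluates the same pattern string.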
EDIT:
http://dev.mysql.com/doc/refman/5.1/en/regexp.html
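To expand on Gareth’s answer: MySQL has its own regular expression syntax. For example, it uses POSIX character classes such as [[:alpha:]] where Python would use a shorthand escape like \w. (Strictly, [[:alpha:]] matches letters only; the closer equivalent of \w is [[:alnum:]_].)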
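If the patterns are written in Python style, one option is to rewrite the shorthand escapes into POSIX bracket expressions before handing them to __regex. The sketch below is an assumption-laden illustration, not a Django or MySQL feature: to_posix and the mapping table are hypothetical names, it only rewrites \w, \W, \d, \D, \s and \S outside [...] (escapes inside brackets are left untouched), and the mapping is approximate because Python's \w also matches non-ASCII word characters.

```python
# Assumed mapping from Python shorthand escapes to POSIX bracket expressions.
# Approximate: Python's \w in str patterns also matches non-ASCII letters and digits.
_SHORTHAND_TO_POSIX = {
    "w": "[[:alnum:]_]",
    "W": "[^[:alnum:]_]",
    "d": "[[:digit:]]",
    "D": "[^[:digit:]]",
    "s": "[[:space:]]",
    "S": "[^[:space:]]",
}


def to_posix(pattern: str) -> str:
    """Rewrite shorthand escapes that appear outside [...] into POSIX classes."""
    out = []
    i = 0
    in_brackets = False
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            if not in_brackets and nxt in _SHORTHAND_TO_POSIX:
                out.append(_SHORTHAND_TO_POSIX[nxt])
            else:
                out.append(ch + nxt)  # keep every other escape unchanged
            i += 2
            continue
        # Track bracket expressions so escapes inside them are not rewritten.
        if ch == "[" and not in_brackets:
            in_brackets = True
        elif ch == "]" and in_brackets:
            in_brackets = False
        out.append(ch)
        i += 1
    return "".join(out)


print(to_posix(r"\w+"))    # [[:alnum:]_]+
print(to_posix(r"\W{2}"))  # [^[:alnum:]_]{2}
```

Hypothetical usage: Publication.objects.filter(abstract__regex=to_posix(r"\w+")).count(). The bracket tracking is deliberately simple and does not handle edge cases such as a literal ] placed first inside a bracket expression.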
The Django __regex filter uses the regex facility of the underlying database, which in your case is MySQL. It would appear that MySQL interprets the regular expression you list differently from Python. (I think, though I’m basing this on a brief web search rather than anything more principled, so don’t trust it, that MySQL may treat \w as simply meaning w.)
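To get a feel for what that guess would imply: if MySQL read \w as a plain w, the database would effectively be running w+ instead of \w+, so only texts containing the letter w would match. The sketch below only emulates that unverified hypothesis in Python on made-up strings; it does not query MySQL, and the sample list and helper name are hypothetical.

```python
import re

# Hypothetical sample abstracts, not the asker's data.
abstracts = ["Owls and wolves", "Deep learning", "Graph theory", "now"]


def count(texts, pattern):
    # Number of texts with at least one match under Python's re semantics.
    compiled = re.compile(pattern)
    return sum(1 for t in texts if compiled.search(t))


python_count = count(abstracts, r"\w+")    # \w means "any word character"
literal_w_count = count(abstracts, r"w+")  # what \w would match if read as a literal "w"
print(python_count, literal_w_count)       # 4 2
```

Under this hypothesis the database count would be lower than the Python count, which is at least consistent in direction with the 8467 versus 7974 seen in the question, though it proves nothing about MySQL itself.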