How can I match an alpha character with a regular expression? I want a character that is in \w but is not in \d. I want it Unicode-compatible; that’s why I cannot use [a-zA-Z].
Your first two sentences contradict each other. “In \w but not in \d” includes underscore. I’m assuming from your third sentence that you don’t want underscore.

Using a Venn diagram on the back of an envelope helps. Let’s look at what we DON’T want:

(1) characters that are not matched by \w (i.e. anything that’s not alpha, digits, or underscore) => \W
(2) digits => \d
(3) underscore => _

So what we don’t want is anything in the character class [\W\d_], and consequently what we do want is anything in the character class [^\W\d_].

Here’s a simple example (Python 2.6).
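A minimal sketch of the same character class in Python 3 (not the answer’s original Python 2.6 code). In Python 3, patterns compiled from str use Unicode semantics by default; under Python 2 you would have to pass re.UNICODE and match against unicode strings. The helper name alpha_chars is hypothetical, chosen for illustration.

```python
import re

# [^\W\d_] means: not a non-word character, not a decimal digit, not underscore.
# Python 3 str patterns use Unicode semantics by default, so no flag is needed.
ALPHA = re.compile(r"[^\W\d_]")


def alpha_chars(text):
    """Return each character of text accepted by the ALPHA class (hypothetical helper)."""
    return ALPHA.findall(text)


print(alpha_chars("abc_123 éßж 東京"))
# ['a', 'b', 'c', 'é', 'ß', 'ж', '東', '京']

# Anchoring a repeated class checks that a whole string is made of such characters.
print(re.fullmatch(r"[^\W\d_]+", "Straße") is not None)      # True
print(re.fullmatch(r"[^\W\d_]+", "snake_case") is not None)  # False
```

The underscore, the digits and the spaces are all rejected, while accented Latin, Cyrillic and CJK characters are accepted.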
Further exploration reveals a few quirks of this approach:
U+3021 (HANGZHOU NUMERAL ONE) is treated as numeric (hence it matches \w), but Python apparently interprets “digit” to mean “decimal digit” (category Nd), so it doesn’t match \d.
U+24E8 (CIRCLED LATIN SMALL LETTER Y) doesn’t match \w (it is a symbol, category So, not a letter).
All CJK ideographs are classed as “letters” (category Lo) and thus match \w.
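To check the quirks above on your own interpreter, here is a small sketch that reports each character’s Unicode general category alongside which patterns accept it. It assumes Python 3; results depend on the Unicode database bundled with your Python version, and classify is a hypothetical helper name.

```python
import re
import unicodedata

WORD = re.compile(r"\w")
DIGIT = re.compile(r"\d")
ALPHA = re.compile(r"[^\W\d_]")


def classify(ch):
    """Report the general category of ch and which patterns match it (hypothetical helper)."""
    return {
        "category": unicodedata.category(ch),
        "word": WORD.fullmatch(ch) is not None,
        "digit": DIGIT.fullmatch(ch) is not None,
        "alpha_class": ALPHA.fullmatch(ch) is not None,
        "isalpha": ch.isalpha(),
    }


# HANGZHOU NUMERAL ONE, CIRCLED LATIN SMALL LETTER Y, and a CJK ideograph.
for ch in ("\u3021", "\u24e8", "\u4e2d"):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), classify(ch))
```

U+3021 (category Nl) is accepted by [^\W\d_] even though str.isalpha() is false for it, which is exactly the first quirk.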
Whether or not any of the three points above is a concern, that approach is the best you will get out of the re module as currently released. Syntax like \p{letter} is still in the future.
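If you need letters in the strict sense (general category L) and the re module can’t express \p{L}, one workaround outside the regex itself is to group characters by str.isalpha(), which is true for categories Lu, Ll, Lt, Lm and Lo. This is a swapped-in technique, not part of the answer above; letter_runs is a hypothetical helper name.

```python
import re
from itertools import groupby


def letter_runs(text):
    """Return maximal runs of characters for which str.isalpha() is true."""
    # groupby splits text into consecutive chunks sharing the same isalpha() result;
    # keep only the chunks made of letters.
    return ["".join(chunk) for is_letter, chunk in groupby(text, key=str.isalpha) if is_letter]


sample = "ab_12 \u3021x 東京"
print(letter_runs(sample))                   # ['ab', 'x', '東京']
# The regex-only class keeps U+3021 glued to the following 'x'.
print(re.findall(r"[^\W\d_]+", sample))
```

The trade-off is that you leave the regex engine, so this fits post-processing better than a larger pattern.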