I have a regex ([-@.\/,':\w]*[\w])* and it matches all words within a text (including

Question

Asked: May 20, 2026 (2026-05-20T22:31:03+00:00)

I have a regex ([-@.\/,':\w]*[\w])* that matches all words within a text (including punctuated words like I.B.M), but I want it to exclude underscores and I can’t figure out how. I tried adding ^[_] (e.g. (^[_][-@.\/,':\w]*[\w])*), but that just breaks all the words up into letters. I want to preserve the word matching, but I don’t want to match words that contain underscores or words made up entirely of underscores.

What’s the proper way to do this?

P.S.

My app is written in C# (if that makes any difference).
I can’t use A-Za-z0-9 because I have to match words regardless of the language (could be Chinese, Russian, Japanese, German, English).

Update
Here is an example:

“I.B.M should be parsed as one word w_o_r_d! Russian should work too: мплекс исторических событий.”

The matches should be:

I.B.M  
should  
be  
parsed  
as  
one  
word  
Russian  
should  
work  
too  
мплекс  
исторических  
событий

Note that w_o_r_d should not get matched.

1 Answer

Editorial Team · Answer 1 · 2026-05-20T22:31:03+00:00

Try this instead:

([-@.\/,':\p{L}\p{Nd}]*[\p{L}\p{Nd}])*

In .NET, the \w class is equivalent to [\p{L}\p{Mn}\p{Nd}\p{Pc}] when you’re performing Unicode matching, which is the default. (With RegexOptions.ECMAScript it is simply [a-zA-Z_0-9].)

It’s the \p{Pc} Unicode category (punctuation, connector) that causes the problem by matching underscores, so the pattern above explicitly lists the other categories without including that one. It also leaves out \p{Mn} (nonspacing marks); add \p{Mn} to both character classes if the text may contain combining marks.
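To see what the pattern above does on the sample sentence from the question, here is a minimal C# sketch. The class and method names (UnicodeWordDemo, Extract) are made up for illustration; only Regex, Match and LINQ come from the .NET base class library. Because the outer (...)* can succeed on an empty string, the sketch discards zero-length matches.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class UnicodeWordDemo
{
    // The pattern from the answer, unchanged: \w is spelled out as
    // \p{L}\p{Nd}, so connector punctuation (\p{Pc}, e.g. '_') is excluded.
    private static readonly Regex Words =
        new Regex(@"([-@.\/,':\p{L}\p{Nd}]*[\p{L}\p{Nd}])*");

    // Hypothetical helper: returns every non-empty match in order.
    public static List<string> Extract(string text)
    {
        return Words.Matches(text)
                    .Cast<Match>()
                    .Where(m => m.Length > 0)   // drop zero-length matches
                    .Select(m => m.Value)
                    .ToList();
    }

    public static void Main()
    {
        var sample = "I.B.M should be parsed as one word w_o_r_d! "
                   + "Russian should work too: мплекс исторических событий.";
        Console.WriteLine(string.Join(" | ", Extract(sample)));
    }
}
```

With this pattern I.B.M stays a single match and the Cyrillic words match as expected, but the underscores in w_o_r_d act as separators, so that token comes back as the four matches w, o, r and d rather than being skipped.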

(Further information: “Character Classes: Word Character” and “Character Classes: Supported Unicode General Categories”.)
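Spelling out \p{L}\p{Nd} keeps underscores out of the matches, but the letters around them still match, so w_o_r_d is returned as w, o, r and d. If tokens containing an underscore should be skipped entirely, as the question asks, one option is to keep \w in the pattern so such tokens are captured whole, and then filter them out in code. This is a sketch of that alternative, not the approach from the answer; the names UnderscoreFreeWords and Extract are hypothetical, and the outer (...)* of the original regex is dropped here to avoid empty matches.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class UnderscoreFreeWords
{
    // The asker's character classes, without the outer (...)* wrapper.
    // \w still includes '_', so "w_o_r_d" is captured as one token.
    private static readonly Regex Token = new Regex(@"[-@.\/,':\w]*\w");

    // Any connector punctuation (Unicode category Pc), which includes '_'.
    private static readonly Regex Connector = new Regex(@"\p{Pc}");

    // Hypothetical helper: tokens in order, minus any containing \p{Pc}.
    public static List<string> Extract(string text)
    {
        return Token.Matches(text)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .Where(t => !Connector.IsMatch(t))
                    .ToList();
    }

    public static void Main()
    {
        var sample = "I.B.M should be parsed as one word w_o_r_d! "
                   + "Russian should work too: мплекс исторических событий.";
        Console.WriteLine(string.Join(" | ", Extract(sample)));
    }
}
```

On the sample sentence this keeps I.B.M and the Russian words and drops w_o_r_d as a whole; a token made only of underscores, such as ___, is dropped the same way.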
