I am using posix c regex library(regcomp/regexec) on my search application. My application supports

Question

Asked: May 11, 2026 (2026-05-11T10:48:23+00:00)

I am using the POSIX C regex library (regcomp/regexec) in my search application. My application supports different languages, including those that use multi-byte characters. I’m encountering a problem when using the word boundary metacharacter (\b). For single-byte strings, it works just fine, e.g.:

‘\bpaper\b’ matches ‘paper’

However, if the regex and query strings are multi-byte, it doesn’t seem to work correctly, e.g.:

‘\b紙張\b’ doesn’t match ‘紙張’

Am I missing something? Any help would be highly appreciated.

Requested Info:

Programming Language: C
Regex Library: GNU C (regex.h)

Thanks.

1 Answer

score 0 · Answer 1 · 2026-05-11T10:48:23+00:00

if the regex and query strings are multi-byte, it doesn’t seem to work correctly

What is “multi-byte” in this context? A string encoded into UTF-8 bytes? A locale-specific multibyte encoding such as GB?

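If you’re not dealing with wide (Unicode) strings natively, or at least running under a locale that matches the encoding, you can’t expect any more support for non-ASCII characters than just detecting they’re there. POSIX regex classifies characters through the current locale, and in the default "C" locale no byte outside the ASCII range belongs to any character class, so it doesn’t know that any of the bytes in ‘\xe7\xb4\x99’ (the UTF-8 representation of ‘紙’) could be considered word-letters; hence it sees no word boundaries. (Note that \b itself is a GNU extension rather than part of POSIX regex.)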
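To make the byte-level view concrete, here is a minimal sketch of how a byte-oriented matcher running in the default "C" locale would classify the characters. The helper name count_word_bytes is made up for illustration, and this is not glibc's actual implementation; it only assumes that a "word character" means isalnum() or an underscore and that setlocale() has never been called.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: counts the bytes of s that count as "word" bytes
 * (alphanumeric or underscore) under the current locale. With no
 * setlocale() call the program runs in the "C" locale, where only
 * ASCII letters and digits are alphanumeric. */
size_t count_word_bytes(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; ++s) {
        /* Cast to unsigned char: passing a negative char to isalnum()
         * is undefined behaviour. */
        if (isalnum((unsigned char)*s) || *s == '_')
            ++n;
    }
    return n;
}
```

Under those assumptions, every byte of "paper" counts as a word byte, while none of the three bytes of "\xe7\xb4\x99" do, so there is no word/non-word transition for \b to find.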

What constitutes a letter or a word in Unicode is a more involved question than simple ASCII regex can cope with. (And obviously, what constitutes a ‘word’ in Chinese is arguable in itself.) If all you want to detect is plain old spaces, you could do that explicitly:

(\s|^)紙張(\s|$)
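A rough sketch of how that could look with regcomp/regexec follows. It makes some assumptions beyond the answer above: the function name contains_delimited_word is hypothetical, the source file is saved as UTF-8, and the POSIX class [[:space:]] is swapped in for \s, since \s is a GNU extension. REG_EXTENDED is needed for the grouping and alternation to work.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if text contains 紙張 delimited by
 * whitespace or by the start/end of the string, 0 if it does not,
 * and -1 if the pattern fails to compile.
 * Assumes this source file and the input text are both UTF-8. */
int contains_delimited_word(const char *text)
{
    /* [[:space:]] is the portable spelling of the GNU-only \s. */
    const char *pattern = "(^|[[:space:]])紙張([[:space:]]|$)";
    regex_t re;
    int rc;

    /* REG_EXTENDED enables ( ) and |; REG_NOSUB because only a
     * yes/no answer is needed, not match offsets. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

This only treats whitespace as a delimiter, so punctuation next to the word will not count as a boundary; extend the bracket expressions if you need that.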
