Hi, I am using the standard regex library (regcomp, regexec, ...), but I now need to add Unicode support to my regular-expression code.
Does the standard regex library support Unicode, or more generally non-ASCII characters? From what I found on the web, I think it does not.
My project is resource-constrained, so I don't want to pull in a large library for this (ICU or Boost.Regex).
Any help would be appreciated.
It looks like POSIX regex works properly with a UTF-8 locale. I've just written a simple test (see below) and used it to match a string containing Cyrillic characters against a regex ("[[:alpha:]]", for example), and everything works just fine.
Note: the main thing to remember is that the regex functions are locale-dependent, so you must call setlocale() before using them.
Usage example:
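The original test listing is not reproduced here, so what follows is only a minimal sketch of such a test, not the author's code. The helper name first_match_len and its error convention are hypothetical, and the locale name you pass in (for example "en_US.UTF-8" or "C.UTF-8") is an assumption about what is installed on your system.

```c
#include <locale.h>
#include <regex.h>

/* Hypothetical helper, for illustration only.
 * Returns the length in bytes of the first match of `pattern` in `text`,
 * or -1 if the locale cannot be set, the pattern does not compile,
 * or nothing matches. */
int first_match_len(const char *pattern, const char *text,
                    const char *locale_name)
{
    regex_t re;
    regmatch_t m;
    int len = -1;

    /* The regex functions consult the current locale, so set it first. */
    if (setlocale(LC_ALL, locale_name) == NULL)
        return -1;

    /* Compile as a POSIX extended regular expression. */
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    /* rm_so and rm_eo are byte offsets into `text`, not character indexes. */
    if (regexec(&re, text, 1, &m, 0) == 0)
        len = (int)(m.rm_eo - m.rm_so);

    regfree(&re);
    return len;
}
```

Calling first_match_len("[[:alpha:]]", "привет", "en_US.UTF-8") from a UTF-8 encoded source file should then give the length of the first Cyrillic letter, provided that locale exists on the machine; if it does not, setlocale() returns NULL and the helper returns -1.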
The match is two bytes long because each Cyrillic letter takes two bytes in UTF-8.