I’m looking for a way to match only fully composed characters in a Unicode string.
Is [:print:] dependent upon locale in any regular expression implementation that incorporates this character class? For example, will it match the Japanese character ‘あ’, since it is not a control character, or is [:print:] always limited to ASCII codes 0x20 to 0x7E?
Is there any character class, including in Perl REs, that can be used to match anything other than a control character? If [:print:] includes only characters in the ASCII range, I would assume [:cntrl:] does too.
This mostly works, though it generates a warning about a wide character. But it gives you the idea: you must be sure you’re dealing with a real Unicode string (check utf8::is_utf8). Or just read perlunicode; the whole subject still makes my head spin.
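For what it's worth, here is a minimal sketch of that kind of check. It is not the snippet referred to above, just an illustration under a few assumptions: the helper names all_printable and has_no_controls are made up for this example, the source file is saved as UTF-8, and the string has already been decoded into characters. On a decoded string Perl applies Unicode rules to [[:print:]], and \P{Cc} matches anything outside the Unicode "Control" general category.

```perl
use strict;
use warnings;
use utf8;                                # this source file contains UTF-8 literals

# Hypothetical helper: true if every character is matched by [[:print:]].
sub all_printable {
    my ($s) = @_;
    return $s =~ /\A[[:print:]]+\z/ ? 1 : 0;
}

# Hypothetical helper: true if the string is non-empty and contains
# no character from the Unicode Control (Cc) general category.
sub has_no_controls {
    my ($s) = @_;
    return $s =~ /\A\P{Cc}+\z/ ? 1 : 0;
}

# Give STDOUT an encoding layer; printing characters above 0xFF
# without one is what triggers the "Wide character" warning.
binmode(STDOUT, ':encoding(UTF-8)');

my $kana = "あ";                         # HIRAGANA LETTER A, U+3042

print "decoded string: ", (utf8::is_utf8($kana) ? 1 : 0), "\n";
print "all printable:  ", all_printable($kana), "\n";
print "no controls:    ", has_no_controls($kana), "\n";
print "tab printable:  ", all_printable("\t"), "\n";
```

If the same character arrives as raw, undecoded UTF-8 bytes, the regex sees three separate bytes instead of one character, and one of them (0x81) falls in the control range, so the checks no longer mean what you want. That is the reason to decode input first (for example with Encode::decode) before matching.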