I have some text in a string, and I need to check whether that string contains any characters that are not allowed in a word.
Suppose I have text like “(hello}”.
Here it contains two symbols, ‘(’ and ‘}’. How could I do this in C++? Note that the string may contain any Unicode character.
If the string really contains Unicode (UTF-8), the problem is decidedly
non-trivial; you’ll probably want to use some external library, like
ICU. Or you can convert to
wchar_t (std::wstring), and use the single-byte encoding solution below:
If the characters are single-byte encoded,
std::find_if with a suitable predicate should do the trick. If you’re doing any text
parsing, you’ll want to define a set of such predicates, once and for
all; the predicates can use the functions in the
std::ctype facet of std::locale, or the ones in wctype.h (which use the global locale).

Still, if you are dealing with Unicode, even converting to wide
characters may not be enough, since full Unicode can still use more than
one code point to represent a single character. The real question is
just how seriously you want to do this. (Note too that in many languages,
like English or French, “words” can contain characters which Unicode
considers punctuation, e.g. “don’t” or “aujourd’hui”; the Unicode
tables will tell you that
'\'' is punctuation, not part of a word.)
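
For the single-byte case, the std::find_if approach might look roughly like the sketch below. The names IsNotWordChar and isWord are hypothetical, and the rule itself (letters according to the std::ctype facet, plus the apostrophe so that “don’t” passes) is only an assumption about what counts as a word character. It uses the classic "C" locale by default and works byte by byte, so it is not suitable for UTF-8 text.

```cpp
#include <algorithm>
#include <locale>
#include <string>

// Hypothetical predicate: returns true for a character that is NOT allowed
// in a word. Letters are recognised through the std::ctype facet of the
// given locale; the apostrophe is accepted explicitly (an assumption).
class IsNotWordChar
{
    std::locale myLocale;               // keeps the facet alive
    std::ctype<char> const* myCType;    // facet belonging to myLocale
public:
    explicit IsNotWordChar(std::locale const& loc = std::locale::classic())
        : myLocale(loc)
        , myCType(&std::use_facet<std::ctype<char> >(myLocale))
    {
    }

    bool operator()(char ch) const
    {
        return !myCType->is(std::ctype_base::alpha, ch) && ch != '\'';
    }
};

// Hypothetical helper: true if no disallowed character occurs in text.
bool isWord(std::string const& text)
{
    return std::find_if(text.begin(), text.end(), IsNotWordChar())
        == text.end();
}
```

With these definitions, isWord("hello") and isWord("don't") are true, while isWord("(hello}") is false. Defining the predicate once as a small class means the same object can be reused with std::find_if, std::remove_if and similar algorithms wherever the text is parsed.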