Does character set encoding affect the result of the strstr() function?
For example, I have read some data into “buf” and do this:
char *p = strstr (buf, "UNB");
I wonder whether the encoding of the data (ASCII or something else, e.g. EBCDIC) affects the result of this function.
(Since “UNB” is a different bit stream under different encodings…)
If yes, what’s the default that is used for these functions? (ASCII?)
Thanks!
The C functions like strstr operate on the raw char data, independently
of the encoding. In this case, you potentially have two
different encodings: the one the compiler used for the string literal,
and the one your program used when filling buf. If these aren’t the
same, then the function may not work as expected.
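To make that concrete, here is a minimal sketch, assuming the compiler uses an ASCII-derived execution character set (true on most systems, but not guaranteed). The names contains_bytes and ebcdic_unb are made up for this illustration; the byte values are the EBCDIC codes for U, N and B.

```c
#include <stddef.h>
#include <string.h>

/* "UNB" as it would be stored by a program that filled the buffer
   with EBCDIC data: U = 0xE4, N = 0xD5, B = 0xC2. */
const char ebcdic_unb[] = { '\xE4', '\xD5', '\xC2', '\0' };

/* Hypothetical helper: returns 1 if needle occurs in buf, 0 otherwise.
   strstr compares char values only; it knows nothing about encodings. */
int contains_bytes(const char *buf, const char *needle)
{
    return strstr(buf, needle) != NULL;
}
```

Under that assumption, contains_bytes("UNB+UNOA", "UNB") returns 1, while contains_bytes(ebcdic_unb, "UNB") returns 0: the literal is stored as 0x55 0x4E 0x42, and the buffer holds 0xE4 0xD5 0xC2. Searching for the same bytes that are actually in the buffer, contains_bytes(ebcdic_unb, "\xE4\xD5\xC2"), finds the match.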
With regard to the “default” encoding, there isn’t one, at least as far
as the standard is concerned; the encoding of the “basic execution
character set” is implementation-defined. In practice, systems which don’t
use an encoding derived from ASCII (ISO 8859-1 seems the most common, at
least here in Europe) are exceedingly rare. As for the encoding you get
in buf, that depends on where the characters come from; if you’re
reading from an istream, it depends on the locale imbued in the
stream. In practice, however, again, almost all of these (UTF-8,
ISO 8859-x, etc.) are derived from ASCII, and are identical with ASCII
for all of the characters in the basic execution character set
(which includes all of the characters legal in traditional C). So for
"UNB", you’re likely safe. (But for something like "üéâ", you almost
certainly aren’t.)
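The accented-character case can be sketched the same way. The example below writes the bytes out explicitly so that it does not depend on how the source file itself is encoded; the names (cafe_latin1, cafe_utf8, has_substring and so on) are invented for the illustration, and the byte values are the ISO 8859-1 and UTF-8 encodings of é.

```c
#include <stddef.h>
#include <string.h>

/* The word "café" in two ASCII-derived encodings. */
const char cafe_latin1[] = "caf\xE9";       /* ISO 8859-1: é is one byte  */
const char cafe_utf8[]   = "caf\xC3\xA9";   /* UTF-8:      é is two bytes */

/* The character é on its own, in each encoding. */
const char e_acute_latin1[] = "\xE9";
const char e_acute_utf8[]   = "\xC3\xA9";

/* Hypothetical helper: 1 if needle occurs in haystack, 0 otherwise. */
int has_substring(const char *haystack, const char *needle)
{
    return strstr(haystack, needle) != NULL;
}
```

Here has_substring(cafe_utf8, e_acute_utf8) and has_substring(cafe_latin1, e_acute_latin1) both return 1, but the mixed pairs return 0, since the byte sequences differ. The ASCII part is unaffected: has_substring(cafe_utf8, "caf") and has_substring(cafe_latin1, "caf") both succeed when the compiler's execution character set is ASCII-derived.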