A common interview question asks you to write an algorithm that detects duplicate characters in a string.
Using a character array of length 128 to keep track of the characters already seen is a good way to solve this problem in linear time.
In C we would type something like
char seen_chars[128] = {0};  /* every entry starts at zero */
unsigned char c;
/* assign c */
seen_chars[ c ] = 1;
to mark character c as seen. Of course, this relies on
(int) c
returning a value between 0 and 127.
I’m wondering when this would fail. What are the assumptions that make this code work correctly?
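For concreteness, the approach described above might be sketched as the function below. The name has_duplicate_ascii and the -1 return code are hypothetical and not part of the original snippet; the sketch only makes the 0 to 127 assumption explicit by refusing to index the table with anything larger.

```c
#include <stddef.h>

/* Hypothetical helper, not from the original post.
 * Returns 1 if some character occurs more than once in s,
 * 0 if all characters are distinct, and -1 if a character
 * falls outside 0..127, where the 128-entry table cannot be indexed. */
int has_duplicate_ascii(const char *s)
{
    char seen_chars[128] = {0};   /* zero-initialize every entry */

    for (size_t i = 0; s[i] != '\0'; i++) {
        unsigned char c = (unsigned char)s[i];
        if (c > 127)
            return -1;            /* index would be out of bounds */
        if (seen_chars[c])
            return 1;             /* c was already marked as seen */
        seen_chars[c] = 1;        /* mark c as seen */
    }
    return 0;
}
```

Each character is visited once and each table access is constant time, so the check runs in linear time in the length of the string.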
The code will fail (and cause undefined behavior) whenever the integer value of the given
char c is not between 0 and 127 (inclusive). C in no way limits the maximum range of
char; you are only guaranteed that it can hold at least 256 distinct values, so in any given C implementation a valid char value can fall outside that range. On most desktop systems a char can hold values from -128 to 127, or from 0 to 255. However, as an example, the following would be valid (although it may exhaust your stack on systems with large
chars):