I was (re)reading Joel’s great article on Unicode and came across this paragraph, which I didn’t quite understand:
For example, you could encode the Unicode string for Hello (U+0048
U+0065 U+006C U+006C U+006F) in ASCII, or the old OEM Greek Encoding,
or the Hebrew ANSI Encoding, or any of several hundred encodings that
have been invented so far, with one catch: some of the letters might
not show up! If there’s no equivalent for the Unicode code point
you’re trying to represent in the encoding you’re trying to represent
it in, you usually get a little question mark: ? or, if you’re really
good, a box. Which did you get? -> �
Why is there a question mark, and what does he mean by “or, if you’re really good, a box”? And what character is he trying to display?
There is a question mark because the encoder recognizes that the target encoding has no representation for the character and substitutes a question mark instead. By “if you’re really good,” he means that with a newer browser and proper font support you’ll get a fancier substitution character, a box, rather than a plain question mark.
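To see the substitution happen, here is a minimal Python sketch. The sample string and the choice of ASCII as the target encoding are my own assumptions for illustration, not something taken from Joel's article.

```python
# U+03A9 GREEK CAPITAL LETTER OMEGA has no equivalent in ASCII
text = "Hello, \u03a9"

# errors="replace" makes the encoder substitute "?" for any code point
# the target encoding cannot represent
encoded = text.encode("ascii", errors="replace")
print(encoded)  # b'Hello, ?'

# The default (errors="strict") refuses to substitute and raises instead
try:
    text.encode("ascii")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True
print(strict_failed)  # True
```

The letters that ASCII does support come through unchanged; only the omega is lost, which matches the "some of the letters might not show up" catch in the quote.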
In Joel’s case, he isn’t trying to display a real character; he literally included U+FFFD REPLACEMENT CHARACTER, the code point Unicode provides as a stand-in for a character that can’t be represented.
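You can check this for yourself. The snippet below is only a sketch using Python's standard library; the byte string `b"caf\xe9"` is a made-up example of input that is not valid UTF-8.

```python
import unicodedata

replacement = "\ufffd"

# Look up the official Unicode name of the code point
name = unicodedata.name(replacement)
print(name)  # REPLACEMENT CHARACTER

# Decoders emit U+FFFD when they hit bytes that are invalid in the
# declared encoding (0xE9 on its own is not valid UTF-8)
decoded = b"caf\xe9".decode("utf-8", errors="replace")
print(decoded == "caf\ufffd")  # True

# Encoding U+FFFD itself to ASCII falls back to the plain question mark
as_ascii = replacement.encode("ascii", errors="replace")
print(as_ascii)  # b'?'
```

So the character in the article is a real code point in its own right. What you see for it (a box, a question mark in a diamond, or something else) depends on the font your browser uses to draw it.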