It seems to me that if UTF-8 were the only encoding ever used everywhere, there would be far fewer issues with code:
- Don’t even need to think about encoding issues.
- No issues with mixed 1-2-byte character streaming, because everything uses 2 bytes.
- Browsers don’t need to wait for the <meta> tag specifying encoding before they can do anything. StackOverflow doesn’t even have the meta tag, making browsers download the full page first, slowing page rendering.
- You would never see ? and other random symbols on old web pages (e.g. in place of Microsoft Word’s special [read: horrible] quotes).
- More characters can be represented in UTF-8.
- Other things I can’t think of right now.
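The "?" and random-symbol problem in the list above is what you get when bytes written in one encoding are decoded as another. A minimal Python sketch, assuming the text was saved with the Windows-1252 (cp1252) code page, which is where Word's curly quotes commonly came from:

```python
# Word-style "smart" quotes, saved with the legacy Windows-1252 code page.
text = "\u201cHello\u201d"  # “Hello”
legacy_bytes = text.encode("cp1252")  # the curly quotes become single bytes 0x93 and 0x94

# A reader that assumes UTF-8 cannot interpret a lone 0x93 or 0x94, so
# errors="replace" substitutes U+FFFD, which often renders as "?" or a box.
misread = legacy_bytes.decode("utf-8", errors="replace")
print(legacy_bytes)  # b'\x93Hello\x94'
print(misread)
```

If everything were UTF-8, the writer and reader would agree on the bytes and this mismatch could not happen.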
So why haven’t the inferior encodings been nuked from space?
True. Except for all the data that’s still stored in old ASCII-based 8-bit encodings.
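The distinction matters: data that is strictly 7-bit ASCII is already valid UTF-8, byte for byte, so it needs no conversion. The legacy trouble comes from the 8-bit "extended ASCII" code pages that reuse bytes 0x80 to 0xFF. A quick Python check, using Latin-1 purely as an example of such a code page:

```python
ascii_text = "Plain ASCII text"
# Every ASCII character encodes to the same single byte in ASCII and in UTF-8.
assert ascii_text.encode("ascii") == ascii_text.encode("utf-8")

accented = "caf\u00e9"  # "café"
latin1_bytes = accented.encode("latin-1")  # é -> one byte, 0xE9
utf8_bytes = accented.encode("utf-8")      # é -> two bytes, 0xC3 0xA9
print(latin1_bytes, utf8_bytes)

# The Latin-1 bytes are not valid UTF-8: 0xE9 announces a 3-byte sequence
# that never arrives, so those files do need converting.
try:
    latin1_bytes.decode("utf-8")
except UnicodeDecodeError as err:
    print("not valid UTF-8:", err.reason)
```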
Incorrect. UTF-8 is variable length: each code point takes 1 to 4 bytes (the original design allowed up to 6, but RFC 3629 capped it at 4).
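To see the variable length in practice, here is a small Python sketch that encodes a few characters and checks each length against the lead-byte rules from RFC 3629. The `utf8_length_from_lead_byte` helper is written just for this illustration:

```python
def utf8_length_from_lead_byte(lead: int) -> int:
    """Number of bytes in a UTF-8 sequence, judged from its first byte (RFC 3629 rules)."""
    if lead < 0x80:
        return 1  # 0xxxxxxx: plain ASCII
    if 0xC2 <= lead <= 0xDF:
        return 2  # 110xxxxx (0xC0/0xC1 would only start overlong forms)
    if 0xE0 <= lead <= 0xEF:
        return 3  # 1110xxxx
    if 0xF0 <= lead <= 0xF4:
        return 4  # 11110xxx, capped so nothing exceeds U+10FFFF
    raise ValueError(f"0x{lead:02X} cannot start a UTF-8 sequence")


samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]  # A, é, €, 😀
for ch in samples:
    encoded = ch.encode("utf-8")
    assert len(encoded) == utf8_length_from_lead_byte(encoded[0])
    print(f"U+{ord(ch):04X} -> {len(encoded)} byte(s): {encoded.hex(' ')}")
```

So a stream mixes 1-, 2-, 3- and 4-byte sequences; a fixed 2 bytes per character is closer to UCS-2 than to UTF-8.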
Browsers don’t generally wait for the full page; they guess the encoding from the first part of the page data.
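A heavily simplified Python sketch of that guess: check for a byte order mark, then look for a `charset` declaration in the first 1024 bytes only, otherwise fall back to a default. The HTML standard does encourage browsers to prescan only the first 1024 bytes, but the `sniff_encoding` helper, its regex and the windows-1252 fallback are assumptions for illustration; the real WHATWG algorithm also honors the HTTP Content-Type header and parses tags properly.

```python
import re

# Hypothetical toy version of browser encoding sniffing.
BOMS = [
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xfe\xff", "utf-16-be"),
    (b"\xff\xfe", "utf-16-le"),
]
# Simplified: finds charset=... anywhere in the window, without real tag parsing.
CHARSET_RE = re.compile(rb'charset\s*=\s*["\']?\s*([A-Za-z0-9._:-]+)', re.IGNORECASE)
PRESCAN_LIMIT = 1024


def sniff_encoding(data: bytes, default: str = "windows-1252") -> str:
    # 1. A byte order mark wins outright.
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    # 2. Look for a charset declaration near the top of the document only.
    match = CHARSET_RE.search(data[:PRESCAN_LIMIT])
    if match:
        return match.group(1).decode("ascii").lower()
    # 3. Otherwise fall back to a default guess (an assumed locale default).
    return default


page = b'<html><head><meta charset="UTF-8"><title>Hi</title></head>'
print(sniff_encoding(page))
```

The point of the fixed window is that the browser can commit to an encoding and start rendering without downloading the whole page; a declaration buried after the window is simply missed.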
Except for all those old web pages that use non-UTF-8 encodings (the non-English-speaking world is pretty big).
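For a concrete example, the Russian word for "hello" comes out as different bytes in UTF-8 and in two legacy encodings that were both common on Cyrillic pages, and reading one legacy encoding as the other garbles the text without raising any error. A small Python illustration (the codec names are Python's):

```python
word = "\u041f\u0440\u0438\u0432\u0435\u0442"  # "Привет" ("hello")

for codec in ("utf-8", "koi8-r", "cp1251"):
    print(f"{codec:>7}: {word.encode(codec).hex(' ')}")

# Bytes written as KOI8-R but read as Windows-1251 decode without any error,
# yet come out as the wrong letters: the failure is silent.
garbled = word.encode("koi8-r").decode("cp1251")
print(garbled)
assert garbled != word
```

Unlike the Latin-1 case, no decoder complains here, which is why pages like these can only be fixed by knowing (or guessing) the original encoding.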
True. But your data validation problems just got harder, too.
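One concrete validation chore: input that claims to be UTF-8 has to be checked before it is trusted, because not every byte sequence is well-formed. A minimal sketch relying on Python's strict UTF-8 decoder, which rejects overlong encodings, encoded surrogates, truncated sequences and stray continuation bytes; the `is_valid_utf8` name is just for illustration:

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data is well-formed UTF-8 (strict decoding, no replacement)."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


cases = {
    "plain ASCII": b"hello",
    "two-byte e-acute": b"caf\xc3\xa9",
    "overlong '/' (0xC0 0xAF)": b"\xc0\xaf",
    "encoded surrogate U+D800": b"\xed\xa0\x80",
    "truncated sequence": b"\xe2\x82",
    "lone continuation byte": b"\x80",
}
for label, raw in cases.items():
    print(f"{label:28} valid={is_valid_utf8(raw)}")
```

Overlong forms matter for security in particular: a lenient decoder that accepted 0xC0 0xAF as "/" could let a path slip past a filter that only looked for the single byte 0x2F.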