What is the difference between UTF-8 and HTML entities?
Think of UTF-8 as a way to map a list of natural numbers to a byte stream, losslessly and in a self-synchronising way. You can always get the natural numbers back (lossless), and if you start reading somewhere ‘in the middle’ of the stream, that’s not a big problem, because the decoder can find where the next character starts (self-synchronising).
Each natural number just happens to represent a ‘character’.
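A minimal hand-written UTF-8 encoder makes this mapping concrete. It is only an illustrative sketch: the names `utf8_encode`, `bits` and `resync` are made up here, and the output is checked against Python's built-in `str.encode`/`bytes.decode`. It also shows why the encoding is self-synchronising: continuation octets always have the form `10xxxxxx`, so from any offset you can skip forward to the next character boundary.

```python
def utf8_encode(code_points):
    """Encode a list of Unicode scalar values as UTF-8 (hand-written sketch)."""
    out = bytearray()
    for cp in code_points:
        if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"not a Unicode scalar value: {cp}")
        if cp < 0x80:        # 1 octet:  0xxxxxxx
            out.append(cp)
        elif cp < 0x800:     # 2 octets: 110xxxxx 10xxxxxx
            out.append(0xC0 | (cp >> 6))
            out.append(0x80 | (cp & 0x3F))
        elif cp < 0x10000:   # 3 octets: 1110xxxx 10xxxxxx 10xxxxxx
            out.append(0xE0 | (cp >> 12))
            out.append(0x80 | ((cp >> 6) & 0x3F))
            out.append(0x80 | (cp & 0x3F))
        else:                # 4 octets: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
            out.append(0xF0 | (cp >> 18))
            out.append(0x80 | ((cp >> 12) & 0x3F))
            out.append(0x80 | ((cp >> 6) & 0x3F))
            out.append(0x80 | (cp & 0x3F))
    return bytes(out)


def bits(data):
    """Show each octet as two groups of four bits."""
    return " ".join(f"{b >> 4:04b} {b & 0xF:04b}" for b in data)


def resync(data, start):
    """From an arbitrary offset, skip continuation octets (10xxxxxx)
    until reaching the first octet of a character."""
    i = start
    while i < len(data) and (data[i] & 0xC0) == 0x80:
        i += 1
    return i


print(bits(utf8_encode([127])))       # 0111 1111
print(bits(utf8_encode([128])))       # 1100 0010 1000 0000
print(bits(utf8_encode([127, 127])))  # 0111 1111 0111 1111

values = [72, 233, 8364, 128512]  # 'H', 'é', '€', '😀'
data = utf8_encode(values)
assert data == "".join(map(chr, values)).encode("utf-8")  # agrees with Python's codec
assert [ord(c) for c in data.decode("utf-8")] == values   # lossless round trip

# Start reading at offset 4, in the middle of the 3-octet '€', and resynchronise.
start = resync(data, 4)
print([ord(c) for c in data[start:].decode("utf-8")])  # [128512]
```

The encoder rejects surrogates (U+D800 to U+DFFF) and values above U+10FFFF, because those are not Unicode scalar values and UTF-8 has no valid encoding for them.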
HTML entities are a way to write those same natural numbers as ordinary text inside an HTML document, like this:
`&#127;` stands for the natural number 127, which in Unicode is the `DEL` character. In UTF-8 that’s the byte stream:
0111 1111

Once you go above 127, a value needs more than one octet. 128, for instance, becomes:
1100 0010 1000 0000

The first octet, `110xxxxx`, announces a two-octet sequence, and the second, `10xxxxxx`, marks a continuation octet; the 11 payload bits together read 000 1000 0000, which is 128.

Two `DEL` characters in a row become 0111 1111 0111 1111. UTF-8 is designed so that it’s always possible to recover the original list of ‘Unicode scalar values’ from the byte stream, even though a stream of, say, 4 octets can decode to anywhere from 1 to 4 scalar values. That’s why UTF-8 is called a ‘variable-length’ encoding.
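For contrast, here is a small Python sketch that puts the same scalar values side by side as UTF-8 octets and as decimal numeric character references. The helper names `to_numeric_entities` and `from_numeric_entities` are hypothetical and hand-rolled for illustration: they escape only `&` and characters outside printable ASCII, and they do not handle named entities such as `&amp;` the way a real HTML parser would.

```python
import re


def to_numeric_entities(text):
    """Write '&' and every character outside printable ASCII as &#NNN; (decimal)."""
    return "".join(
        c if 0x20 <= ord(c) <= 0x7E and c != "&" else f"&#{ord(c)};"
        for c in text
    )


def from_numeric_entities(text):
    """Turn decimal (&#NNN;) and hex (&#xHH;) references back into characters.
    Named entities such as &amp; are not handled in this sketch, and there is
    no range checking: chr() raises an exception for values above 0x10FFFF."""
    def replace(match):
        num = match.group(1)
        code_point = int(num[1:], 16) if num[0] in "xX" else int(num)
        return chr(code_point)

    return re.sub(r"&#([xX][0-9a-fA-F]+|[0-9]+);", replace, text)


# The answer's example: DEL (127) as an entity and as UTF-8.
print(to_numeric_entities("\x7f"), "\x7f".encode("utf-8"))  # &#127; b'\x7f'

# Four octets of UTF-8 can hold 1, 2, 3 or 4 scalar values.
for s in ["😀", "éé", "éAA", "AAAA"]:
    data = s.encode("utf-8")
    assert from_numeric_entities(to_numeric_entities(s)) == s  # lossless round trip
    print(len(data), "octets ->", len(s), "scalar values;", to_numeric_entities(s))
```

The entity form is always plain ASCII text, so it survives channels that would mangle non-ASCII bytes, but it is longer: 😀 takes 4 octets in UTF-8 and 9 characters as `&#128512;`.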