Maybe this is just my unfamiliarity with Unicode, so please correct me if I'm mistaken.
Looking at http://json.org/, the spec says that a string can include “any UNICODE character”, but this confuses me.
- JSON is a communication format, correct? At its core, everything must translate down to bytes.
- In contrast, Unicode is a logical format and must be encoded before it can be transmitted, right?
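To make the characters-versus-bytes distinction concrete, here is a minimal Python sketch. The helper name `encode_json_text` is hypothetical and exists only for illustration. A JSON text is first a sequence of Unicode characters, and only choosing an encoding turns it into bytes, so the same text produces different byte sequences under different encodings.

```python
import json

def encode_json_text(obj, encoding="utf-8"):
    """Hypothetical helper: serialize obj to JSON text (a str of Unicode
    characters), then encode that text to bytes with the given encoding."""
    text = json.dumps(obj, ensure_ascii=False)  # keep non-ASCII characters as-is
    return text.encode(encoding)                # characters -> bytes

if __name__ == "__main__":
    # The explicit -be/-le codecs are used so Python does not prepend a BOM.
    for enc in ("utf-8", "utf-16-be", "utf-32-le"):
        data = encode_json_text({"name": "café"}, enc)
        print(f"{enc:10} {len(data):3} bytes  {data[:8].hex(' ')}")
        # Decoding with the same encoding recovers the identical text.
        assert data.decode(enc) == json.dumps({"name": "café"}, ensure_ascii=False)
```

The 16-character text `{"name": "café"}` comes out as 17 bytes in UTF-8 (the `é` takes two bytes), 32 bytes in UTF-16 and 64 bytes in UTF-32. It is the same JSON text each time; only the byte representation differs.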
So what did they mean there?
From the RFC:
3. Encoding

JSON text SHALL be encoded in Unicode. The default encoding is UTF-8.

Since the first two characters of a JSON text will always be ASCII characters [RFC0020], it is possible to determine whether an octet stream is UTF-8, UTF-16 (BE or LE), or UTF-32 (BE or LE) by looking at the pattern of nulls in the first four octets.

    00 00 00 xx  UTF-32BE
    00 xx 00 xx  UTF-16BE
    xx 00 00 00  UTF-32LE
    xx 00 xx 00  UTF-16LE
    xx xx xx xx  UTF-8
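The null-pattern rule quoted above can be sketched in a few lines of Python. The function name `detect_json_encoding` is hypothetical. The sketch assumes, as the quoted text does, that the text starts with two ASCII characters, and it also assumes there is no byte order mark. Anything that does not match one of the four null patterns falls back to UTF-8.

```python
def detect_json_encoding(data: bytes) -> str:
    """Hypothetical helper: guess the encoding of a JSON octet stream from the
    pattern of 00 octets among its first four bytes. Assumes no BOM and that
    the text begins with two ASCII characters."""
    head = data[:4]
    if len(head) < 4:
        return "utf-8"  # too short to tell; fall back to the default encoding
    nulls = tuple(b == 0 for b in head)  # True where the octet is 00
    patterns = {
        (True, True, True, False): "utf-32-be",   # 00 00 00 xx
        (True, False, True, False): "utf-16-be",  # 00 xx 00 xx
        (False, True, True, True): "utf-32-le",   # xx 00 00 00
        (False, True, False, True): "utf-16-le",  # xx 00 xx 00
    }
    return patterns.get(nulls, "utf-8")  # xx xx xx xx (or anything else) -> UTF-8

if __name__ == "__main__":
    text = '{"name": "café"}'
    for enc in ("utf-8", "utf-16-be", "utf-16-le", "utf-32-be", "utf-32-le"):
        data = text.encode(enc)
        guessed = detect_json_encoding(data)
        print(f"{enc:10} -> {guessed:10} round-trips: {data.decode(guessed) == text}")
```

This is why the RFC can say "encoded in Unicode": the JSON text is defined in terms of Unicode characters, and the spec then fixes a small set of encodings (UTF-8, UTF-16, UTF-32) a receiver can distinguish from the first four bytes. The quoted wording matches RFC 4627; its successor, RFC 8259, requires UTF-8 for JSON exchanged between systems that are not part of a closed ecosystem, so this kind of detection mostly matters for older or closed systems.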