I’m trying to figure out what continuation bytes are (for curiosity’s sake) in the UTF-8 encoding

Question

Editorial Team

Asked: May 30, 2026 (2026-05-30T08:39:51+00:00)

I’m trying to figure out what “continuation bytes” are (for curiosity’s sake) in the UTF-8 encoding.

Wikipedia introduces this term in the UTF-8 article without defining it at all.

A Google search returns no useful information either. I’m about to jump into the official specification, but would prefer to read a high-level summary first.


1 Answer

Editorial Team · Answer 1 · 2026-05-30T08:39:52+00:00

A continuation byte in UTF-8 is any byte where the top two bits are 10.

They are the subsequent bytes in multi-byte sequences. The following table may help:

Unicode code points  Encoding  Binary value
-------------------  --------  ------------
 U+000000-U+00007f   0xxxxxxx  0xxxxxxx

 U+000080-U+0007ff   110yyyxx  00000yyy xxxxxxxx
                     10xxxxxx

 U+000800-U+00ffff   1110yyyy  yyyyyyyy xxxxxxxx
                     10yyyyxx
                     10xxxxxx

 U+010000-U+10ffff   11110zzz  000zzzzz yyyyyyyy xxxxxxxx
                     10zzyyyy
                     10yyyyxx
                     10xxxxxx

Here you can see how Unicode code points map to UTF-8 byte sequences, along with the binary value that each sequence carries. The x, y and z letters mark which bits of the code point land in which byte.
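To make the table concrete, here is a minimal Python sketch that packs a code point into bytes using the same bit layout. The function name encode_code_point is made up for illustration; in real code you would simply call Python's built-in str.encode("utf-8"). The surrogate check is not shown in the table, but UTF-8 does not permit encoding U+D800 through U+DFFF.

```python
def encode_code_point(cp: int) -> bytes:
    """Encode one Unicode code point as UTF-8 (illustrative helper, not a library API)."""
    if cp < 0 or cp > 0x10FFFF:
        raise ValueError("code point out of Unicode range")
    if 0xD800 <= cp <= 0xDFFF:
        raise ValueError("surrogate code points cannot be encoded in UTF-8")
    if cp < 0x80:
        # 0xxxxxxx: one byte, no continuation bytes
        return bytes([cp])
    if cp < 0x800:
        # 110yyyxx 10xxxxxx: lead byte plus one continuation byte
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        # 1110yyyy 10yyyyxx 10xxxxxx: lead byte plus two continuation bytes
        return bytes([
            0xE0 | (cp >> 12),
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F),
        ])
    # 11110zzz 10zzyyyy 10yyyyxx 10xxxxxx: lead byte plus three continuation bytes
    return bytes([
        0xF0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3F),
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ])


if __name__ == "__main__":
    print(encode_code_point(0x20AC).hex())  # e282ac (the euro sign, U+20AC)
```

Every byte after the first in each branch is built as 0x80 | (six bits), which is exactly the 10xxxxxx continuation pattern.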

The basic rules are these:

If a byte starts with a 0 bit, it’s a single-byte sequence holding a value less than 128.
If it starts with 11, it’s the first byte of a multi-byte sequence, and the number of leading 1 bits indicates how many bytes there are in total (110xxxxx starts a two-byte sequence, 1110xxxx a three-byte one and 11110xxx a four-byte one).
If it starts with 10, it’s a continuation byte.
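These three rules can be sketched as a small Python classifier. The name classify_byte and the returned labels are invented for this example. It only looks at the leading bit pattern, so it does not reject bytes such as 0xC0 or 0xF5 that match a lead pattern but never occur in valid UTF-8.

```python
def classify_byte(byte: int) -> str:
    """Classify a single byte by its leading bits (illustrative helper)."""
    if byte & 0x80 == 0x00:
        return "single"        # 0xxxxxxx
    if byte & 0xC0 == 0x80:
        return "continuation"  # 10xxxxxx
    if byte & 0xE0 == 0xC0:
        return "lead-2"        # 110xxxxx, two bytes in total
    if byte & 0xF0 == 0xE0:
        return "lead-3"        # 1110xxxx, three bytes in total
    if byte & 0xF8 == 0xF0:
        return "lead-4"        # 11110xxx, four bytes in total
    return "invalid"           # 11111xxx never appears in UTF-8


if __name__ == "__main__":
    # U+20AC encodes as E2 82 AC: one lead byte and two continuation bytes.
    for b in "\u20ac".encode("utf-8"):
        print(f"{b:08b} {classify_byte(b)}")
```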

This distinction allows quite handy processing, such as being able to back up from any byte in a sequence to find the first byte of that code point. Just search backwards until you find a byte whose top two bits are not 10.
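A rough Python sketch of that backward search follows. The function name code_point_start is hypothetical, and the sketch assumes the input is already valid UTF-8.

```python
def code_point_start(data: bytes, index: int) -> int:
    """Return the index of the first byte of the code point covering data[index].

    Illustrative helper; assumes data is valid UTF-8.
    """
    # Step back over continuation bytes (top two bits are 10).
    while index > 0 and (data[index] & 0xC0) == 0x80:
        index -= 1
    return index


if __name__ == "__main__":
    data = "a\u20acb".encode("utf-8")  # bytes: 61 E2 82 AC 62
    print(code_point_start(data, 3))   # 1, the E2 lead byte of the euro sign
```

Because a sequence is at most four bytes long, the loop steps back at most three times on valid input.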

Similarly, it can be used for a UTF-8 strlen that counts code points rather than bytes: count only the bytes that are not of the form 10xxxxxx.
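As a sketch of that counting trick in Python (utf8_strlen is a made-up name, and the input is assumed to be valid UTF-8):

```python
def utf8_strlen(data: bytes) -> int:
    """Count code points by skipping continuation bytes (illustrative helper)."""
    return sum(1 for b in data if (b & 0xC0) != 0x80)


if __name__ == "__main__":
    encoded = "h\u00e9llo \u20ac".encode("utf-8")
    print(len(encoded))          # 10 bytes
    print(utf8_strlen(encoded))  # 7 code points
```

Note that this counts code points, which is not always the same as the number of characters a user sees, since some characters are built from several code points.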
