I have written a dead-simple serialization format. It encodes an unsigned integer by converting it to bytes in big-endian order and then prefixing it with a single byte that gives the number of bytes the integer takes up. E.g. 3 = 01 03, 268 = 02 01 0C (in hex). Since the length byte can be at most 255, the range of integers is 0 to 2^2040 - 1 (255 bytes of 8 bits each).
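To make the integer encoding concrete, here is a minimal sketch in Python (not the PHP from the gist). The names `encode_uint` and `decode_uint` are made up for illustration, and writing zero as a single zero byte (01 00) is an assumption, since the post does not say how zero is encoded.

```python
from typing import Tuple


def encode_uint(n: int) -> bytes:
    """Encode an unsigned integer as <length byte><big-endian bytes>."""
    if n < 0:
        raise ValueError("only unsigned integers are supported")
    # Assumption: zero is written as one zero byte (01 00).
    size = max(1, (n.bit_length() + 7) // 8)
    if size > 255:
        raise ValueError("integer does not fit a one-byte length prefix")
    return bytes([size]) + n.to_bytes(size, "big")


def decode_uint(data: bytes, offset: int = 0) -> Tuple[int, int]:
    """Decode one integer at `offset`; return (value, next offset)."""
    size = data[offset]
    start = offset + 1
    end = start + size
    if end > len(data):
        raise ValueError("truncated input")
    return int.from_bytes(data[start:end], "big"), end


print(encode_uint(3).hex())    # 0103
print(encode_uint(268).hex())  # 02010c
```

Returning the next offset from the decoder is just one convenient way to read several values back to back from the same buffer.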
I use this to serialize strings by prefixing each string with the encoding of its length. From there I can serialize arbitrary structures quite easily: for example, a list of strings is the encoding of the number of elements followed by the encoding of each string.
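A rough Python sketch of the string and list layout described above, again with hypothetical function names. It assumes the length prefix counts bytes and that strings are UTF-8; the post specifies neither.

```python
from typing import List


def encode_uint(n: int) -> bytes:
    """Length-prefixed big-endian unsigned integer."""
    # Assumption: zero is written as one zero byte (01 00).
    size = max(1, (n.bit_length() + 7) // 8)
    if size > 255:
        raise ValueError("integer does not fit a one-byte length prefix")
    return bytes([size]) + n.to_bytes(size, "big")


def encode_str(s: str) -> bytes:
    """Encoded byte length followed by the raw bytes of the string."""
    raw = s.encode("utf-8")  # assumption: UTF-8, length counted in bytes
    return encode_uint(len(raw)) + raw


def encode_str_list(items: List[str]) -> bytes:
    """Encoded element count followed by each encoded string."""
    return encode_uint(len(items)) + b"".join(encode_str(s) for s in items)


# count = 2 | "hi" (length 2) | "abc" (length 3)
print(encode_str_list(["hi", "abc"]).hex())  # 0102010268690103616263
```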
Here it is in PHP: https://gist.github.com/4577886.
My question is: what do you call this method of serialization? Is it commonly used? Is there anything wrong with it?
Thanks.
It’s kind of type-length-value (TLV), without the type. And that’s what’s wrong with it: nothing in the data says what a field is, so the reader has to know the layout in advance. How do you know whether the next 4 bytes are an int or a string or a 4-byte array?
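A small Python sketch of that ambiguity: the same four bytes parse cleanly both as two integers and as one string. The reader functions and the tag values `TAG_UINT` and `TAG_STR` are invented for illustration. The one-byte tag is only one possible way to add the missing "type", not something taken from the gist.

```python
def read_uint(data: bytes, offset: int = 0):
    """Read <length byte><big-endian bytes>; return (value, next offset)."""
    size = data[offset]
    end = offset + 1 + size
    if end > len(data):
        raise ValueError("truncated input")
    return int.from_bytes(data[offset + 1:end], "big"), end


def read_str(data: bytes, offset: int = 0):
    """Read an encoded length, then that many raw bytes."""
    length, start = read_uint(data, offset)
    end = start + length
    if end > len(data):
        raise ValueError("truncated input")
    return data[start:end], end


data = bytes.fromhex("01020141")

# Interpretation 1: two integers, 2 and 65.
first, pos = read_uint(data)
second, _ = read_uint(data, pos)

# Interpretation 2: one 2-byte string, b"\x01A".
as_string, _ = read_str(data)

# Hypothetical fix: a leading tag byte tells the reader what follows.
TAG_UINT = 0x01
TAG_STR = 0x02


def read_tagged(data: bytes, offset: int = 0):
    tag = data[offset]
    if tag == TAG_UINT:
        return read_uint(data, offset + 1)
    if tag == TAG_STR:
        return read_str(data, offset + 1)
    raise ValueError("unknown tag")
```

Without the tag, both readings are equally valid and only out-of-band knowledge of the schema picks one. With it, `read_tagged(bytes([TAG_STR]) + data)` can only come back as the string.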