I need to unpack binary data that is encoded rather exotically: a 32 bit 2’s complement bit pattern, representing a SHORT.USHORT decimal fraction, with a signed SHORT integer component and an unsigned SHORT “this many 1/65536 parts” decimal fraction component. To make things even more fun, the sign of the SHORT is determined by the first bit in the 2’s complement 32 bit pattern, not by its sign after decoding to the ‘real’ bit pattern.
An example of this would be the following:
2's complement bit pattern: 11111111110101101010101010101100
converted 'normal' pattern: 00000000001010010101010101010100
SHORT bits (upper 16): 0000000000101001 (decimal: 41)
USHORT bits (lower 16): 0101010101010100 (decimal: 21844)
actual number encoded: -41.333 (41, negative from high MSB + 21844/65536)
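The same example can be walked through in PHP. This is only a sketch of the steps listed above, assuming a 64-bit PHP build so that a 32 bit value fits in a native integer; the variable names are chosen to match the code further down.

```php
<?php
// The example pattern from above, 0xFFD6AAAC.
$p2c = bindec('11111111110101101010101010101100');

// Undo the 2's complement: invert, add one, keep the low 32 bits.
$p = (~$p2c + 1) & 0xFFFFFFFF;

$short  = $p >> 16;     // upper 16 bits: 41
$ushort = $p & 0xFFFF;  // lower 16 bits: 21844

// The top bit of the original pattern is set, so the result is negative.
$num = -($short + round($ushort / 65536, 3));

printf("%d %d %.3f\n", $short, $ushort, $num); // prints "41 21844 -41.333"
```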
(if you think this scheme is insane: it certainly seems that way, doesn’t it? It’s the byte format used in Type2 fonts that are encoded in a CFF block, or “compact font format” block. Crazy as it is, this format is set in stone, and we’re about 20 years too late to have it changed. This is the byte layout in a CFF font, and the only thing we get to worry about now is how to correctly decode it)
Problems occur when we’re dealing with patterns like these:
2's complement bit pattern: 00000000000000000000000000000001
converted pattern: 11111111111111111111111111111111
upper 16 bits: 1111111111111111 (decimal 65535 *OR* -1)
lower 16 bits: 1111111111111111 (decimal 65535)
SHORT.USHORT number: -65536 *OR* 1
Depending on who you ask, the pattern 1111111111111111 can be decoded either as 65535, such as when interpreted as a bit pattern in a larger (32 or 64 bit) number, or as -1, when interpreted as a 16 bit signed integer. The only correct interpretation here, however, is as the latter, so this leads us to the question’s subject line:
what PHP code do I use to turn this 16 bit pattern into the correct number, given that PHP has no pack/unpack parameter for unpacking as a signed 16 bit int with the most significant byte first? There is a parameter for unpacking a signed 16 bit int using machine-indicated byte order, but this is going to give problems because font data storage is non-negotiable: all fonts, allwhere, everywhen, must be encoded using Motorola/Big Endian byte ordering, irrespective of the machine’s preferred byte ordering.
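To make the gap concrete, here is a small sketch of what the two relevant unpack codes give back: `n` (unsigned 16 bit, big endian) and `s` (signed 16 bit, machine byte order). The byte strings are made-up test input, not real font data.

```php
<?php
// Two bytes 0xFF 0xFF, as they would sit in big-endian font data.
$bytes = "\xFF\xFF";

// 'n' is big endian but unsigned, so the result is always 0..65535.
$unsignedBE = unpack('n', $bytes)[1];   // 65535

// 's' is signed, but reads the bytes in the machine's own order.
$signedNative = unpack('s', $bytes)[1]; // -1 (same either way for 0xFFFF)

// With 0x00 0x01 the byte-order dependence of 's' shows up.
$one       = "\x00\x01";
$oneBE     = unpack('n', $one)[1];      // 1 on every machine
$oneNative = unpack('s', $one)[1];      // 256 on little endian, 1 on big endian
```

So `n` gets the byte order right and the sign wrong, while `s` gets the sign right and the byte order only by luck.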
My code for going from 32-bit 2’s complement to final value at the moment is this:
// read in 32 bit pattern, representing a 2's complement pattern
$p2c = 0x01000000 * $b[$x] + 0x010000 * $b[$x+1] + 0x0100 * $b[$x+2] + $b[$x+3];
// convert 2's complement to plain form
$p = (~$p2c + 1) & 0xFFFFFFFF;
// get lower 16 bits, representing an unsigned short.
// due to unsigned-ness, this value is always correct.
$ushort = 0xFFFF & $p;
// get higher 16 bits, representing a signed short.
// due to its sign, this value can be spectacularly wrong!
$short = ($p >> 16);
// "reconstitute" the FIXED format number
$num = - ($short + round($ushort/65536,3));
This had a pretty simple answer that I completely ignored for no good reason, and of course didn’t think of until I wrote this question.
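A minimal sketch of one such fix, assuming the 16 bits have already been isolated as an unsigned value between 0 and 65535: if the top bit is set, subtract 65536 to pull the value back into signed range. The helper name `toSignedShort` is made up for illustration and is not a PHP built-in.

```php
<?php
// Hypothetical helper: reinterpret an unsigned 16 bit value (0..65535)
// as a signed 16 bit integer (-32768..32767).
function toSignedShort($value)
{
    $value &= 0xFFFF;            // keep only the low 16 bits
    return ($value & 0x8000)     // top bit set means negative
        ? $value - 0x10000
        : $value;
}

// Read two big-endian bytes as unsigned, then correct the sign.
$raw   = unpack('n', "\xFF\xFF")[1]; // 65535
$short = toSignedShort($raw);        // -1
```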
and voila.