Bear with me here.
A couple of months ago, my algorithms teacher discussed the implementation of bucket sort with us (named Distribution sort in my algorithms book) and how it works. Basically, instead of taking a number at face value, we compare by its binary representation, like so:
// 32 bit integers.
Input: 9 4
4: 00000000 00000000 00000000 00000100
9: 00000000 00000000 00000000 00001001
// Etc.
and start comparing from right to left.
// First step.
4: 0
9: 1
Output: 4 9
// Second step
4: 0
9: 0
Output: 4 9 // Tie: a stable algorithm keeps the previous order.
// Third step
4: 1
9: 0
Output: 9 4
// Fourth step
4: 0
9: 1
Output: 4 9
And that’s it; the other 28 iterations are all zeroes, so the output won’t change anymore. Now, comparing a whole bunch of strings like this would go
// strings
Input: "Christian" "Denis"
Christian: C h r i s t i a n
Denis: D e n i s
// First step.
Christian: n
Denis: s
Output: Christian, Denis
// Second step
Christian: a
Denis: i
Output: Christian, Denis
// ...
and so forth.
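For the integer case, the right-to-left passes can be sketched roughly as follows. This is only a minimal sketch, assuming unsigned 32-bit values, ascending order (0 bits placed before 1 bits), and a hypothetical function name `radix_sort_by_bits`. It leans on `std::stable_partition` for the stable step rather than on whatever bucket structure the course used.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical helper: one stable pass per bit, least significant bit first.
// Sorts unsigned 32-bit values in ascending order.
void radix_sort_by_bits(std::vector<std::uint32_t>& values) {
    for (int bit = 0; bit < 32; ++bit) {
        // Values whose current bit is 0 move to the front. stable_partition
        // keeps the relative order inside each group, so the ordering
        // established by earlier (lower) bits survives later passes.
        std::stable_partition(
            values.begin(), values.end(),
            [bit](std::uint32_t v) { return ((v >> bit) & 1u) == 0; });
    }
}
```

Note that each pass only tests a single bit of a full 32-bit value; no pass compares two elements against each other.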
My question is, is comparing a signed char, a byte figure, faster than comparing ints?
If I had to assume, a 1-byte char is compared faster than a 4-byte integer. Is this correct? Can I make the same assumption with wchar_t, or UTF-16/32 formats?
In C or C++, a char is simply a one-byte integer (though “one byte” may or may not be 8 bits). That means that in a typical case, the only difference you have to deal with is whether a single-byte comparison is faster than a multi-byte comparison.
At least in most cases, the answer is no. Many RISC processors don’t have instructions to deal with single bytes at all, so an operation on a single byte is carried out by sign-extending the byte to a word, operating on the word, and then (if necessary) masking all the bits outside of the single byte back to zeros. In other words, operating on a whole word can often be around triple the speed of operating on a single byte.
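The same widening is visible at the language level. In C++, operands narrower than int go through integral promotion before arithmetic or comparison, so a signed char comparison is defined in terms of int values. The sketch below checks this at compile time; `less_than` is a hypothetical helper name, and what instruction the compiler actually emits for it depends on the target and optimizer.

```cpp
#include <type_traits>

// Adding two signed chars yields an int: both operands are promoted first.
static_assert(
    std::is_same<decltype(static_cast<signed char>(1) +
                          static_cast<signed char>(2)),
                 int>::value,
    "signed char operands are promoted to int");

// Hypothetical helper: per the language rules, this comparison is carried
// out on the promoted int values of a and b, not on raw bytes.
bool less_than(signed char a, signed char b) {
    return a < b;
}
```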
Even on something like an x86 that supports single-byte operations directly, they’re still often slower (on a modern processor). There are a couple of things that contribute to this. First of all, the instructions using registers of the size “natural” to the current mode have a simpler encoding than instructions using other sizes. Second, a fair number of x86 processors have what’s called a “partial register stall” — even though it’s all implicit, internally they do something like the RISC does, carrying out an operation on a full-sized register, then merging it with the other bytes of the original value. For example, if you produce a result in AL then refer to EAX, the sequence will take longer to execute than if you produced the result in EAX to start with.
OTOH, if you look at old enough processors the reverse could be (and often was) true. For an extreme example, consider the Intel 8080 or Zilog Z80. Both had some 16-bit instructions, but the paths through the ALU were only 8 bits wide — a 16-bit addition, for example, was actually carried out as two consecutive 8-bit additions. If you could get by with only an 8-bit operation, it was about twice as fast. Although 8-bit processors are a (distant) memory on desktop machines, they’re still used in some embedded applications, so this isn’t entirely obsolete either.
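To make the "two consecutive 8-bit additions" concrete, here is a rough model in C++ of a 16-bit add built from byte-wide adds with a carry. It is only an illustration of the idea, not a description of the actual 8080 or Z80 microarchitecture, and `add16_via_8bit` is a hypothetical name.

```cpp
#include <cstdint>

// Hypothetical model of a 16-bit add on an 8-bit ALU:
// add the low bytes first, then the high bytes plus the carry.
std::uint16_t add16_via_8bit(std::uint16_t a, std::uint16_t b) {
    const unsigned a_lo = a & 0xFFu, a_hi = (a >> 8) & 0xFFu;
    const unsigned b_lo = b & 0xFFu, b_hi = (b >> 8) & 0xFFu;

    // First 8-bit addition; bit 8 of the widened sum is the carry out.
    const unsigned lo_sum = a_lo + b_lo;
    const unsigned lo = lo_sum & 0xFFu;
    const unsigned carry = lo_sum >> 8;

    // Second 8-bit addition with carry in; overflow past bit 15 is dropped.
    const unsigned hi = (a_hi + b_hi + carry) & 0xFFu;

    return static_cast<std::uint16_t>((hi << 8) | lo);
}
```

An operation that only needs the low byte can stop after the first addition, which is where the roughly twofold difference comes from.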