I was reading a question about the fastest way to reverse an array (which turned out to be less than thrilling), and came across an interesting comment at this link:
https://stackoverflow.com/a/1129028/857994
The referenced answer shows these two possibilities:
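//Possibility #1
#include <string.h>

void reverse(char word[])
{
    int len = strlen(word);
    char temp;
    for (int i = 0; i < len / 2; i++)
    {
        temp = word[i];              // save the left element
        word[i] = word[len - i - 1]; // copy the right element to the left
        word[len - i - 1] = temp;    // put the saved element on the right
    }
}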
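//Possibility #2
#include <string.h>

void reverse(char word[])
{
    int len = strlen(word);
    for (int i = 0; i < len / 2; i++)
    {
        // XOR swap of word[i] and word[len - i - 1], with no temporary
        word[i] ^= word[len - i - 1];
        word[len - i - 1] ^= word[i];
        word[i] ^= word[len - i - 1];
    }
}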
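The two possibilities can be compared directly by renaming them so they coexist in one file and timing each one. A minimal sketch, assuming C with clock() from <time.h> as a rough processor-time timer; reverse_temp, reverse_xor, and time_reverse are hypothetical names introduced here, not part of the linked answer. Any timings it produces depend heavily on the compiler, optimization level, and CPU, so treat them as indicative only.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Possibility #1, renamed: swap through a temporary. */
void reverse_temp(char word[])
{
    assert(word != NULL);
    size_t len = strlen(word);
    for (size_t i = 0; i < len / 2; i++) {
        char temp = word[i];
        word[i] = word[len - i - 1];
        word[len - i - 1] = temp;
    }
}

/* Possibility #2, renamed: XOR swap. This is safe only because i < len / 2
   guarantees i != len - i - 1; XOR-swapping an element with itself zeroes it. */
void reverse_xor(char word[])
{
    assert(word != NULL);
    size_t len = strlen(word);
    for (size_t i = 0; i < len / 2; i++) {
        word[i] ^= word[len - i - 1];
        word[len - i - 1] ^= word[i];
        word[i] ^= word[len - i - 1];
    }
}

/* Hypothetical helper: copies src into a scratch buffer, calls fn on it
   reps times, and returns the elapsed processor time in seconds
   (or -1.0 if the buffer cannot be allocated). */
double time_reverse(void (*fn)(char[]), const char *src, long reps)
{
    size_t n = strlen(src);
    char *buf = malloc(n + 1);
    if (buf == NULL) {
        return -1.0;
    }
    memcpy(buf, src, n + 1);

    clock_t start = clock();
    for (long r = 0; r < reps; r++) {
        fn(buf);  /* an even number of calls leaves buf equal to src */
    }
    clock_t end = clock();

    /* Reading the result helps keep the optimizer from discarding the loop. */
    volatile char sink = buf[0];
    (void)sink;

    free(buf);
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

For example, call time_reverse(reverse_temp, s, 1000000) and time_reverse(reverse_xor, s, 1000000) on the same long string s, with optimizations enabled; unoptimized builds typically keep every local in memory, which distorts exactly the register-versus-memory difference being compared.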
The comment states: “Using XOR will be far slower than swapping using a temp object.”
Nobody disputed this. So, my questions are:
- Is this true?
- Why is it true?
- Would it still be true if this were an array of a non-built-in type?
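On the third question: in C the ^ operator is only defined for integer operands, so Possibility #2 cannot be applied directly to, say, an array of structs, while the temporary-swap idea carries over unchanged. A minimal sketch, assuming C and a byte-wise swap through a one-byte temporary; reverse_generic is a hypothetical name, not a library function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: reverses `count` elements of `elem_size` bytes each,
   in place. Each pair of elements is swapped byte by byte through a
   temporary, so it works for any object type, including structs. */
void reverse_generic(void *base, size_t count, size_t elem_size)
{
    assert(base != NULL || count == 0);
    unsigned char *bytes = base;
    for (size_t i = 0; i < count / 2; i++) {
        unsigned char *a = bytes + i * elem_size;
        unsigned char *b = bytes + (count - i - 1) * elem_size;
        for (size_t k = 0; k < elem_size; k++) {
            unsigned char temp = a[k];  /* temporary swap, one byte at a time */
            a[k] = b[k];
            b[k] = temp;
        }
    }
}
```

Usage: reverse_generic(pts, n, sizeof pts[0]) reverses an array pts of n structs; the same call works for arrays of double or any other object type.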
The XOR loop performs 2 memory reads and 1 memory write per line (each x ^= y reads both x and y and writes x), for a total of 6 reads and 3 writes per loop iteration. Furthermore, there is a strong dependency between the first line, which writes to word[i], and the next line, which reads from word[i]. This prevents the two lines from being pipelined: if they execute in parallel, the second line's read of word[i] stalls until the first line's write is complete. The 2nd and 3rd lines have the same kind of dependency, through word[len-i-1].
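The access counts can be made concrete by routing every array access of the XOR loop through counting helpers. A sketch, assuming C; counted_load, counted_store, mem_reads, and mem_writes are hypothetical instrumentation, and the counts are source-level array accesses, not necessarily what an optimizing compiler emits.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical counters, used only to illustrate the access pattern. */
static size_t mem_reads, mem_writes;

static char counted_load(const char *p) { mem_reads++; return *p; }
static void counted_store(char *p, char v) { mem_writes++; *p = v; }

/* Possibility #2 with every array access routed through the counters. */
void reverse_xor_counted(char word[])
{
    assert(word != NULL);
    size_t len = strlen(word);
    for (size_t i = 0; i < len / 2; i++) {
        char *a = &word[i];
        char *b = &word[len - i - 1];
        /* word[i] ^= word[len-i-1]: 2 reads, 1 write */
        counted_store(a, (char)(counted_load(a) ^ counted_load(b)));
        /* word[len-i-1] ^= word[i]: must read the value just stored to *a */
        counted_store(b, (char)(counted_load(b) ^ counted_load(a)));
        /* word[i] ^= word[len-i-1]: must read the value just stored to *b */
        counted_store(a, (char)(counted_load(a) ^ counted_load(b)));
    }
}
```

Reversing "hello" takes 2 iterations, so the counters end at 12 reads and 6 writes.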
In the temp var loop, the temp var will almost certainly be stored in a CPU register, not in main memory, so the total memory I/O for the temp var loop is 2 reads and 2 writes per iteration. There are loose data-flow dependencies between the statements, but they are read-before-write (each location is read before it is overwritten), and those can be pipelined. The data-flow dependencies in the XOR example are read-after-write (each line must read a value the previous line just wrote), which are much harder to execute without stalling the pipeline.
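Routing each array access of the temp loop through counting helpers shows 2 reads and 2 writes per iteration, and that neither read depends on a preceding write in the same iteration. A sketch, assuming C; counted_load, counted_store, mem_reads, and mem_writes are hypothetical instrumentation, and temp and other are assumed to live in registers, so they are not counted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical counters, used only to illustrate the access pattern. */
static size_t mem_reads, mem_writes;

static char counted_load(const char *p) { mem_reads++; return *p; }
static void counted_store(char *p, char v) { mem_writes++; *p = v; }

/* Possibility #1 with every array access routed through the counters.
   temp and other are locals that an optimizing compiler will typically
   keep in registers. */
void reverse_temp_counted(char word[])
{
    assert(word != NULL);
    size_t len = strlen(word);
    for (size_t i = 0; i < len / 2; i++) {
        char *a = &word[i];
        char *b = &word[len - i - 1];
        char temp = counted_load(a);   /* read 1 */
        char other = counted_load(b);  /* read 2: does not depend on read 1 */
        counted_store(a, other);       /* write 1: *a was already read */
        counted_store(b, temp);        /* write 2: *b was already read */
    }
}
```

Reversing "hello" (2 iterations) ends with 4 reads and 4 writes.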
That is 6 reads + 3 writes per iteration compared to 2 reads + 2 writes; the 2 + 2 version has a distinct advantage.
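Scaled to a whole string of length n, which takes floor(n/2) iterations, the per-iteration counts give these source-level totals (a back-of-the-envelope count that ignores caches and compiler optimizations):

$$
\text{XOR: } (6 + 3)\left\lfloor \tfrac{n}{2} \right\rfloor = 9\left\lfloor \tfrac{n}{2} \right\rfloor
\qquad
\text{temp: } (2 + 2)\left\lfloor \tfrac{n}{2} \right\rfloor = 4\left\lfloor \tfrac{n}{2} \right\rfloor
$$

For n = 1000 that is 4500 versus 2000 array accesses.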