I tried to measure the speed difference between plain loops, loops using the loop instruction, and built-in rep loops. I wrote three programs to compare their behavior:
Program 1
_start: xor %ecx,%ecx
not %ecx # ecx = 0xffffffff, set once before the loop
0: dec %ecx
jnz 0b
mov $1,%eax
xor %ebx,%ebx
int $0x80 # syscall 1: exit
Program 2
_start: xor %ecx,%ecx
not %ecx
loop .
mov $1,%eax
xor %ebx,%ebx
int $0x80
Program 3
_start: xor %ecx,%ecx
not %ecx
rep nop # Do nothing but decrement ecx
mov $1,%eax
xor %ebx,%ebx
int $0x80
It turned out that the third program doesn’t work as expected, and some research tells me that rep nop, aka pause, does something completely unrelated.
What do the rep, repz and repnz prefixes do when the instruction following them is not a string instruction?
It depends.
rep ret is sometimes used to avoid the bad performance of jumping directly to a ret on certain AMD processors. The rep (F3) and repne (F2) prefixes are also used as a Mandatory Prefix for many SSE instructions (for example, they change packed-single variants to scalar-single or scalar-double variants). pause (spin lock hint) is an alias of rep nop. Some other newer instructions use a “fake rep prefix” as well (popcnt, crc32, vmxon, etc.). The “fake” or Mandatory Prefix comes before the optional REX prefix, so it can’t be said to be part of the opcode; it really is a prefix. For other instructions the combination is officially reserved: some generate a #UD when prefixed with rep, while many simply ignore the prefix.