Is it possible to compare whole memory regions in a single processor cycle? More precisely, is it possible to compare two strings in one processor cycle using some sort of MMX assembler instruction? Or is the strcmp implementation already based on that optimization?
EDIT:
Or is it possible to instruct the C++ compiler to remove string duplicates, so that strings can be compared simply by their memory location? That is, instead of strcmp(a, b), compare with a == b (assuming that a and b are both native const char* strings).
Not really. Your typical 1-byte compare instruction takes 1 cycle.
Your best bet would be to use the MMX 64-bit compare instructions (see this page for an example). However, those operate on registers, which have to be loaded from memory. The memory loads will significantly hurt your timing, because you’ll be going out to L1 cache at best, which is roughly 10x slower than a register operation*. If you are doing some heavy string processing, you can probably get a nifty speedup there, but again, the loads are going to hurt.
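To make the "load, then compare in registers" idea concrete, here is a minimal sketch. Note that it swaps in SSE2 (16 bytes at a time) rather than MMX (8 bytes), since SSE2 is available on every x86-64 compiler; the function name blocks_equal is made up for illustration. It also assumes you already know the length. With NUL-terminated strings you cannot blindly read 16 bytes, because the load may run past the end of the buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>  // SSE2 intrinsics
#define HAVE_SSE2 1
#endif

// Hypothetical helper: true if the n bytes at a and b are identical.
bool blocks_equal(const char* a, const char* b, std::size_t n) {
    assert(a != nullptr && b != nullptr);
    std::size_t i = 0;
#ifdef HAVE_SSE2
    for (; i + 16 <= n; i += 16) {
        // Unaligned loads from memory: this is where the cache cost is paid.
        __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a + i));
        __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b + i));
        // Each byte lane becomes 0xFF where the inputs match, 0x00 otherwise.
        __m128i eq = _mm_cmpeq_epi8(va, vb);
        // Gather the top bit of every lane; 0xFFFF means all 16 bytes matched.
        if (_mm_movemask_epi8(eq) != 0xFFFF) return false;
    }
#endif
    // Remaining tail bytes (or the whole buffer when SSE2 is unavailable).
    return std::memcmp(a + i, b + i, n - i) == 0;
}
```

The compare itself is a single instruction per 16 bytes, but each iteration still needs two loads, so the loop is bounded by memory access rather than by the compare.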
Other people suggest pre-computing strings. Maybe that’ll work for your particular app, maybe it won’t. Do you have to compare strings? Can you compare numbers?
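One way to "compare numbers" instead of strings is to intern them: pay for the string comparison once, when the string enters the program, and hand out an integer ID afterwards. The sketch below is only an illustration of that idea; InternTable and its members are hypothetical names, not something from a library.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical intern table: each distinct string gets a small integer ID.
class InternTable {
public:
    // Returns the ID for s, assigning a new one the first time s is seen.
    std::size_t intern(const std::string& s) {
        auto it = ids_.find(s);
        if (it != ids_.end()) return it->second;
        std::size_t id = names_.size();
        names_.push_back(s);
        ids_.emplace(s, id);
        return id;
    }

    // Recovers the text for an ID previously returned by intern().
    const std::string& name(std::size_t id) const {
        assert(id < names_.size());
        return names_[id];
    }

private:
    std::unordered_map<std::string, std::size_t> ids_;  // text -> ID
    std::vector<std::string> names_;                    // ID -> text
};
```

After interning, two strings are equal exactly when their IDs are equal, so the hot path is a single integer compare. Whether this pays off depends on how often each string is compared versus how often new strings arrive.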
Your edit suggests comparing pointers. That’s a dangerous situation unless you can specifically guarantee that you won’t be doing substring compares (i.e., you are comparing two byte strings that start at the same address but end at different ones, say the ranges [0x40, 0x50] and [0x40, 0x42]. Those are not “equal”, but a pointer compare would say they are).
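Here is a small sketch of how pointer comparison and content comparison can disagree in both directions. The Slice struct and the helper names are hypothetical, made up for this example. The first pair of helpers covers NUL-terminated strings, the second pair covers the substring case with explicit lengths.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Compares by address only, which is what a == b does for const char*.
bool same_pointer(const char* a, const char* b) { return a == b; }

// Compares by content, which is what strcmp does.
bool same_text(const char* a, const char* b) {
    assert(a != nullptr && b != nullptr);
    return std::strcmp(a, b) == 0;
}

// Hypothetical (pointer, length) view of a byte string.
struct Slice {
    const char* ptr;
    std::size_t len;
};

// Address-only comparison of two slices.
bool same_start(Slice a, Slice b) { return a.ptr == b.ptr; }

// Content comparison of two slices: lengths must match, then the bytes.
bool same_bytes(Slice a, Slice b) {
    return a.len == b.len && std::memcmp(a.ptr, b.ptr, a.len) == 0;
}
```

Two separate char arrays holding "hello" make same_text true but same_pointer false, while a slice and a shorter prefix of it make same_start true but same_bytes false. Pointer equality only stands in for string equality if you control every string's storage so that equal contents always share one address and one length.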
Have you looked at the glibc strcmp() source? That would be the ideal starting place.
* Loosely speaking, if a cycle takes 1 unit, an L1 hit takes 10 units, an L2 hit takes 100 units, and an actual trip to RAM takes far longer still.