Is there any faster way to parse text than by walking through it byte by byte?
I wonder whether there are special CPU (x86/x64) instructions for string operations, of the kind used by string libraries, that could be used to optimize a parsing routine.
For example, an instruction that finds a token in a string and runs in hardware, instead of looping over each byte until the token is found.
*Edit (note): I'm asking more about algorithms than about CPU architecture, so my real question is: is there any special algorithm or specific technique that could optimize string manipulation routines on current CPU architectures?
The x86 has a few string instructions, but they fell out of favor on modern processors because they became slower than sequences of more primitive instructions that do the same thing.
The processor world is moving more and more towards RISC, i.e., simpler instruction sets.
Quote from Wikipedia (emphasis mine):
This is still true on today’s x86 processors.
You could get marginally better performance by processing four bytes at a time, assuming each “token” in the text is four-byte-aligned. Obviously this isn’t true for most text… so it is better to stick with byte-by-byte scanning.
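For comparison, here is a minimal sketch in C of a plain byte-by-byte scan next to a four-bytes-at-a-time scan. The function names `find_byte_naive` and `find_byte_word` are made up for this illustration and are not from any library. The word version relies on a well-known bit trick to test whether any byte of a 32-bit word equals the target, then falls back to the byte loop to pin down the exact position. Treat it as a sketch under these assumptions, not a tuned implementation.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Baseline: byte-by-byte scan.
 * Returns the index of the first `c` in s[0..len), or `len` if absent. */
size_t find_byte_naive(const unsigned char *s, size_t len, unsigned char c)
{
    size_t i;
    for (i = 0; i < len; i++) {
        if (s[i] == c)
            return i;
    }
    return len;
}

/* Hypothetical four-bytes-at-a-time variant with the same contract. */
size_t find_byte_word(const unsigned char *s, size_t len, unsigned char c)
{
    const uint32_t ones    = 0x01010101u;
    const uint32_t highs   = 0x80808080u;
    const uint32_t pattern = ones * c;      /* `c` repeated in all four bytes */
    size_t i = 0;

    while (i + 4 <= len) {
        uint32_t w, x;
        memcpy(&w, s + i, 4);               /* load that is safe for unaligned data */
        x = w ^ pattern;                    /* bytes equal to `c` become 0x00 */
        if ((x - ones) & ~x & highs)        /* nonzero iff some byte of x is zero */
            break;                          /* a match lies in this word */
        i += 4;
    }

    /* Locate the exact byte inside the matching word, or scan the tail. */
    for (; i < len; i++) {
        if (s[i] == c)
            return i;
    }
    return len;
}
```

Both functions return the same index for the same input, for example when looking for `'='` in `"key=value"`. Whether the word version is actually faster than the plain loop depends on the input, the compiler, and the CPU, so it is worth measuring before adopting it.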