I am making several regex substitutions in Python along the lines of
\w\s+\w
over many large documents. Obviously, making the regex non-greedy (by adding a ? after the +) won't change what it matches, since \w and \s never match the same character, but will it make the code run any faster?
In other words, with non-greedy regexes does Python work its way from the first character matched onwards rather than from the end of the document back to that character, or is this a naive view?
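The premise that both forms match the same text can be checked directly. This is only a minimal sketch: the sample string and the names GREEDY, LAZY, greedy_spans and lazy_spans are made up for illustration and are not from the original post.

```python
import re

# The pattern from the question, and the same pattern with a lazy quantifier.
GREEDY = re.compile(r"\w\s+\w")
LAZY = re.compile(r"\w\s+?\w")

# Hypothetical sample text with mixed runs of whitespace.
sample = "alpha  beta\t\tgamma \n delta"

# Collect the (start, end) span of every match for each pattern.
greedy_spans = [m.span() for m in GREEDY.finditer(sample)]
lazy_spans = [m.span() for m in LAZY.finditer(sample)]

# The lazy form cannot stop early: a \w must follow the whitespace run,
# and no whitespace character is also a word character.
print(greedy_spans == lazy_spans)
```

So any difference between the two is purely about how much work the engine does to reach the same match, not about the result.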
Is this the pattern you implied?
The difference seems to be pretty small here: only 5 microseconds with the non-greedy version.
Using a 500-word lorem ipsum, with multiple mixed whitespace characters between every word, I get an 8 ms difference.
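For anyone who wants to run a comparison like this themselves, here is a rough sketch using timeit. It is not the exact benchmark behind the numbers above: the generated text, the "_" replacement, the repeat count and the helper name substitute are all assumptions chosen for illustration, and the timings will vary by machine and Python version.

```python
import re
import timeit

GREEDY = re.compile(r"\w\s+\w")
LAZY = re.compile(r"\w\s+?\w")

# Hypothetical stand-in for a 500-word lorem ipsum with mixed whitespace.
words = ["lorem", "ipsum", "dolor", "sit", "amet"] * 100
separators = ["  ", "\t ", " \n ", "\t\t"]
text = "".join(
    word + separators[i % len(separators)] for i, word in enumerate(words)
)


def substitute(pattern):
    """Run one substitution pass over the whole text."""
    return pattern.sub("_", text)


# Time many passes so the measurement is not dominated by timer noise.
runs = 200
greedy_time = timeit.timeit(lambda: substitute(GREEDY), number=runs)
lazy_time = timeit.timeit(lambda: substitute(LAZY), number=runs)

print(f"greedy: {greedy_time:.4f} s for {runs} runs")
print(f"lazy:   {lazy_time:.4f} s for {runs} runs")
```

Both patterns are compiled once up front so that only the substitution itself is timed.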