I’m working with long paragraphs of text that are searchable using MySQL and PHP. I’d like to be able to find and highlight only the relevant search terms and use regex to isolate them.
For example, I’d like to transform a Lorem ipsum paragraph,
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor
incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud
exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor
in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur
sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est
laborum.
into something like this when searching for “dolor”,
Lorem ipsum *dolor* sit amet ... labore et *dolor*e magna aliqua ... aute irure *dolor* in reprehenderit ... esse cillum *dolor*e eu fugiat ...
with two (or however many) words before and after the term.
So far I have this
search - .*?(\w+?\b\s){2}(dolor)(\w+?\b\s){2}.*?
replace - ... $1*$2*$3...
but it’s not entirely working: it only finds one word before and after the term (despite the {2}), it fails when the search string is at the beginning or end of a string (or sentence), and it doesn’t eliminate the rest of the paragraph after the final found instance of the search string.
What’s the best way to do this?
Thanks!
A couple of changes:
Firstly, the {2} multiplier needs to be contained within memory (that is, inside a capturing group) in both cases, to ensure you’re remembering both words. This means we can ignore $2 when reading it back ($5 now contains the last word matched).
Secondly, in the case of “dolore” and anything else matching dolor\w+, the terminal ‘e’ becomes a word in its own right; to match your specification above, I’ve added \w*\s* to trap any end-of-word chars and terminal spaces in the remainder.
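To illustrate the first point, here is a small sketch of why the original pattern only returned one word of context. It is written in Python rather than PHP purely for brevity, and the variable names (`sample`, `inner`, `outer`) are made up for the example. It is not the pattern from this answer, and it uses a non-capturing inner group, so the group numbers differ from the $2/$5 mentioned above. A capturing group that is repeated with {2} only remembers its last iteration, whereas a group wrapped around the whole repetition remembers both words.

```python
import re

sample = "Lorem ipsum dolor sit amet"

# The {2} is applied to the capturing group itself,
# so the group only keeps its final iteration.
inner = re.search(r"(\w+\s){2}(dolor)", sample)

# The {2} sits inside an outer capturing group,
# so that group keeps both words.
outer = re.search(r"((?:\w+\s){2})(dolor)", sample)

print(repr(inner.group(1)))  # 'ipsum '
print(repr(outer.group(1)))  # 'Lorem ipsum '
```

The same behaviour should apply to the preg_* functions in PHP, since repeated capturing groups work the same way in PCRE.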
Otherwise, the non-greedy “?” char isn’t really needed here because you’re already specifying \b at the end of your \w+, so I’ve cleaned those out too.
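For the remaining problems in the question (terms at the very start or end of the text, and the leftover paragraph after the last hit), one possible approach is to collect the matches and join them, instead of doing a single search-and-replace. The following is only a sketch under some assumptions: it is in Python rather than PHP, `highlight_snippets` and its `context` parameter are hypothetical names, the term is assumed to match at the start of a word, and the context counts use {0,n} so that fewer than n neighbouring words are accepted.

```python
import re

def highlight_snippets(text, term, context=2):
    """Return each hit with up to `context` words either side, joined by ' ... '."""
    pattern = re.compile(
        r"\b((?:\w+\W+){0,%d})"   # up to `context` words before the term
        r"(%s)"                   # the search term itself
        r"(\w*)"                  # rest of the word, e.g. the 'e' in 'dolore'
        r"((?:\W+\w+){0,%d})"     # up to `context` words after the term
        % (context, re.escape(term), context),
        re.IGNORECASE,
    )
    snippets = []
    for match in pattern.finditer(text):
        before, hit, tail, after = match.groups()
        snippet = f"{before}*{hit}*{tail}{after}"
        # Collapse line breaks and repeated spaces inside the snippet.
        snippets.append(" ".join(snippet.split()))
    if not snippets:
        return ""
    return " ... ".join(snippets) + " ..."

text = (
    "Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor "
    "incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud "
    "exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor "
    "in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur "
    "sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est "
    "laborum."
)

print(highlight_snippets(text, "dolor"))
```

One limitation to be aware of: if two hits are within `context` words of each other, the second one is swallowed by the first hit's trailing context and is not highlighted separately. The pattern only uses syntax that PCRE also supports, so it could probably be carried over to preg_match_all in PHP with little change.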