I found some partial help but cannot seem to fully accomplish what I need. I need to be able to do the following:
I need a regular expression that replaces any 1- to 3-character words sitting between two words longer than 3 characters with a match-anything expression:
For example:
walk to the beach ==> walk(.*)beach
If the 1 to 3 character word is not preceded by a word that’s longer than 3 characters then I want to translate that 1 to 3 letter word to '<word> ?'
For example:
on the beach ==> on ?the ?beach
The simpler the rule the better (of course, if there’s a more complicated alternative that performs better, I’ll take that as well, since I anticipate heavy usage eventually).
This will be used in a PHP context most likely with preg_replace. Thus, if you can put it in that context then even better!
By the way so far I have got the following:
$string = preg_replace('/\s+/', '(.*)', $string);
$string = preg_replace('/\b(\w{1,3})(\.*)\b/', '${1} ?', $string);
but that results in:
walk to the beach ==> 'walk(.*)to ?beach'
which is not what I want. 'on the beach' seems to translate correctly.
I think you will need two replacements for that. Let’s start with the first requirement:
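A minimal sketch of what that first replacement could look like. The pattern, the helper name collapse_short_words and the assumption that words are separated by single spaces are mine, so treat it as an illustration rather than the definitive rule:

```php
<?php
// Hypothetical helper: collapse runs of 1-3 character words that sit
// between two words of 4+ characters into "(.*)".
// Assumes words are separated by single spaces and \w defines a word character.
function collapse_short_words(string $string): string
{
    // \b(\w{4,})      a word of 4+ characters, kept via $1
    // (?: \w{1,3})+   one or more short words, each preceded by a space
    // ' '(?=\w{4,})   a space followed by another long word; the lookahead
    //                 leaves that word available for the next match
    return preg_replace('/\b(\w{4,})(?: \w{1,3})+ (?=\w{4,})/', '$1(.*)', $string);
}

echo collapse_short_words('walk to the beach'), "\n"; // walk(.*)beach
echo collapse_short_words('on the beach'), "\n";      // on the beach (unchanged)
```

Because the second long word is only looked at and not consumed, it can serve as the left-hand word of the next match. Two long words that are directly adjacent are left untouched by this sketch.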
Of course, you need to replace those \w (which match letters, digits and underscores) with a character class of what you actually want to treat as a word character.

The second one is a bit tougher, because matches cannot overlap and lookbehinds cannot be of variable length. So we have to run this multiple times in a loop:
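A minimal sketch of such a loop, again assuming single spaces and \w as the word class. The helper name mark_leading_short_words and the exact pattern are assumptions for illustration:

```php
<?php
// Hypothetical helper: append "?" after the space that follows each
// leading 1-3 character word, one word per pass.
function mark_leading_short_words(string $string): string
{
    do {
        // From the start of the string: any already-processed "word ?" pairs,
        // then one more short word and its trailing space, but only if that
        // space is not already followed by a "?".
        // '$0?' puts the whole match back and appends the "?".
        $string = preg_replace('/^(?:\w{1,3} \?)*\w{1,3} (?!\?)/', '$0?', $string, -1, $count);
    } while ($count > 0); // stop once a pass changes nothing

    return $string;
}

echo mark_leading_short_words('on the beach'), "\n"; // on ?the ?beach
```

The fifth argument of preg_replace receives the number of replacements made, which is what ends the loop.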
Here we match everything from the beginning of the string, as long as it’s only up-to-3-letter words separated by spaces, plus one trailing space (only if it is not already followed by a ?). Then we put all of that back in place, and append a ?.

Update:
After all the talk in the comments, here is an updated solution.
After running the first line, we can assume that the only words of 3 letters or fewer that are left will be at the beginning or at the end of the string. All others will have been collapsed to (.*). Since you want to append a ? to every space between those, you do not even need a loop (in fact these are the only spaces left):

(Do this right after my first line of code.)
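As a sketch, that second step can be a plain string replacement. The helper name loosen_remaining_spaces is hypothetical, and single spaces between words are assumed:

```php
<?php
// Hypothetical helper: turn every remaining space into " ?" so that the
// space becomes optional when the result is later used as a pattern.
function loosen_remaining_spaces(string $string): string
{
    // No regex needed: every space that survived the first step gets a "?".
    return str_replace(' ', ' ?', $string);
}

echo loosen_remaining_spaces('on the beach'), "\n"; // on ?the ?beach
```

str_replace avoids regex overhead entirely, which should help with the heavy usage mentioned in the question.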
This would give the following two results (in combination with the first line):
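Putting both steps together on the two examples from the question, as a sketch: build_loose_pattern is a hypothetical name, and the first pattern is an assumed form of the first line.

```php
<?php
// Hypothetical end-to-end helper combining the two steps.
// Assumes single spaces between words and \w as the word class.
function build_loose_pattern(string $string): string
{
    // Step 1: collapse short words between two long words into "(.*)".
    $string = preg_replace('/\b(\w{4,})(?: \w{1,3})+ (?=\w{4,})/', '$1(.*)', $string);

    // Step 2: make every space that is still left optional.
    return str_replace(' ', ' ?', $string);
}

echo build_loose_pattern('walk to the beach'), "\n"; // walk(.*)beach
echo build_loose_pattern('on the beach'), "\n";      // on ?the ?beach
```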