I’m new to regular expressions, and I want to write one that selects every pair of consecutive words.
For example, when I give it this phrase: “Hello people #RegularExpression sucks!”
it has to return these word pairs:
-Hello people
-people #RegularExpression
-#RegularExpression sucks!
I tried /\w\s\w/i, but it did not work 🙁
output:
explanation:
\S+ matches one or more non-whitespace characters. Your \w was incorrect for two reasons: it only matches one character, and it only matches a so-called word character (equivalent to [A-Za-z0-9_]). Adding the + to your \s wasn’t necessary in this test case, but there’s no reason not to add it, and extra whitespace does have a way of sneaking into text in the real world. (But be sure to add +, not *; there must be at least one whitespace character in there.)

(?=...) is a positive lookahead. You use one to check whether it’s possible to match the enclosed subexpression at the current match position, without advancing the match position. Then, typically, you go ahead and match a different subexpression, not in a lookahead.

Here’s the tricky bit: although the characters matched by the lookahead subexpression are not consumed, any capturing groups in the subexpression work as usual. The lookahead in my regex, (?=(\S+\s+\S+)), matches and captures the next two-word sequence. Then (assuming the lookahead succeeded) \S+\s+ matches in the normal way, consuming the first word and the whitespace after it, so the next attempt starts at the second word.

This technique should work in any regex flavor that supports capturing groups and lookaheads. That includes PHP as well as all the other major languages (Perl, JavaScript, .NET, Python, Java…). The technique for accessing only the contents of the first capturing group from each match varies wildly from one language to the next, but PHP makes it easy, with $matches[1].
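
To make the technique concrete, here is a minimal sketch in Python, one of the flavors listed above. It assumes the full pattern is simply the two pieces described in the explanation joined together, (?=(\S+\s+\S+)) followed by \S+\s+; the names PAIR_PATTERN and consecutive_pairs are made up for this example. It also runs the original /\w\s\w/i attempt for comparison.

```python
import re

# Assumed full pattern: a capturing lookahead for the next two-word
# sequence, followed by a normal match of one word plus whitespace.
PAIR_PATTERN = re.compile(r"(?=(\S+\s+\S+))\S+\s+")


def consecutive_pairs(text):
    """Return every overlapping pair of whitespace-separated words."""
    # group(1) is the text captured inside the lookahead; the overall
    # match (group 0) is only the first word and its trailing whitespace.
    return [m.group(1) for m in PAIR_PATTERN.finditer(text)]


phrase = "Hello people #RegularExpression sucks!"

pairs = consecutive_pairs(phrase)
for pair in pairs:
    print(pair)
# Hello people
# people #RegularExpression
# #RegularExpression sucks!

# The original attempt: one word character, one whitespace, one word character.
naive = re.findall(r"\w\s\w", phrase, re.IGNORECASE)
print(naive)  # ['o p', 'n s']
```

The pairs overlap because each match consumes only the first word of the pair, so the second word is still available as the start of the next match. The naive pattern returns three-character fragments instead, and it skips the middle pair entirely because # is not a word character.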