Here is an example that describes the problem.
Input text:
Wayne Rooney is an English footballer who plays as a striker for Manchester United. Rooney became the youngest player to play for England when he earned his first cap in a friendly against Australia. Theo Walcott broke Rooney’s appearance record by 36 days in May 2006.
Input keyword: wayne rooney
Expected output (keyword count): 3 (wayne rooney, rooney, rooney’s)
So it should count not only “wayne rooney” but also the related forms of the name.
After searching SO, I found this regex:
$keyword_count = preg_match_all("/(\w*(?:wayne|rooney)\w*)/i", $source, $res);
But it gives me 4 as the output. It counts “wayne rooney” as two different keywords.
Could anyone help me construct the correct pattern?
Is regex really the most efficient solution for this? I have a high volume of text to search. Is there another option, for example a text mining library for PHP?
Thanks a lot.
Try this regex:
If you only have a limited number of regular rules for parsing the string, a regex is appropriate for your problem. In the general case you should use other methods (possibly several regexes).
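As a rough sketch of one pattern that gives 3 for the example in the question: put the full phrase first in the alternation, then each word on its own, and allow an optional possessive ending. The function name count_keyword_mentions is hypothetical, and the code assumes PHP 7+, UTF-8 input, and keywords made of ordinary word characters.

```php
<?php
// Hypothetical helper (not from the original answer): counts mentions of a
// multi-word keyword. The full phrase and each single word count as one hit,
// optionally followed by a possessive 's (straight or curly apostrophe).
function count_keyword_mentions(string $keyword, string $text): int
{
    // Split the keyword into words, e.g. "wayne rooney" -> ["wayne", "rooney"].
    $words = preg_split('/\s+/', trim($keyword), -1, PREG_SPLIT_NO_EMPTY);
    if (!$words) {
        return 0;
    }

    // Escape regex metacharacters in each word.
    $quoted = array_map(function ($w) {
        return preg_quote($w, '/');
    }, $words);

    // Full phrase first, so "Wayne Rooney" is consumed as a single match
    // before the single-word alternatives are tried.
    $alternatives = array_merge([implode('\s+', $quoted)], $quoted);

    // i = case-insensitive, u = treat pattern and subject as UTF-8.
    $pattern = '/\b(?:' . implode('|', $alternatives) . ')(?:[\'\x{2019}]s)?\b/iu';

    // preg_match_all returns false on error; the cast turns that into 0.
    return (int) preg_match_all($pattern, $text);
}

$source = "Wayne Rooney is an English footballer who plays as a striker for "
    . "Manchester United. Rooney became the youngest player to play for England. "
    . "Theo Walcott broke Rooney\u{2019}s appearance record by 36 days in May 2006.";

echo count_keyword_mentions('wayne rooney', $source), "\n"; // prints 3
```

The order of the alternatives matters here: PCRE tries them left to right, so the two-word phrase is matched as a whole before “wayne” or “rooney” alone gets a chance. The regex in the question returns 4 because it matches each word separately.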