I’m trying to figure out a preg_match / PHP-style regex to find repeating groups of alphanumeric values (of any length), separated by commas.
So if I have the string “c,b,a,xz,x,b,a,c,xz,x,x,b,a”,
it would return the first series of two or more values that repeats. I think I need a recursive backreference, maybe something like
<?php
// lines removed for simplicity
// test string = "c,b,a,xz,x,b,a,c,xz,x,x,b,a"
$haystack = "c,b,a,xz,x,b,a,c,xz,x,x,b,a";
$answer = preg_match('/([A-z]{2,*}[\s]{1})([A-z \s]*)[\1]*/', $haystack );
echo $answer; // print the first occurrence of the repeating series of two or more
?>
I just need to find and echo out the first occurrence of a repeating series of two or more values. Is there a way to use a backreference recursively, or some better method?
edit: code vomit removed.
'/\b(\w+,\w+),(?:.*,)?\1\b/' should work. It’d match any sequence of two items, any amount of other stuff, and then the same sequence again.
Catch is, it will find the first sequence that has a duplicate, not the sequence that has the first duplicate, due to how regexes work. (The match that starts earliest wins.) For example, if you have 'a,b,c,d,c,d,a,b,c', $matches[1] would be 'a,b', even though 'c,d' repeats earlier.
To find the first duplicate, you’d have to be able to match that and have a backreference to it in a lookbehind assertion. If that’s even legal (which I doubt it is), it’d have to be fixed width before PHP would let it happen.
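A quick way to see the "earliest start wins" behavior is to run that pattern on both sample strings. This is only a sketch: the function name firstPairWithDuplicate is made up for illustration, and it assumes plain ASCII input.

```php
<?php
// Hypothetical helper (not from the original post): returns the first
// two-item sequence that has a duplicate later in the string, or null.
function firstPairWithDuplicate(string $haystack): ?string
{
    // (\w+,\w+)  captures two comma-separated items
    // (?:.*,)?   optionally skips any amount of other stuff
    // \1         requires the captured pair to appear again
    $pattern = '/\b(\w+,\w+),(?:.*,)?\1\b/';

    // preg_match() returns 1 on a match, 0 on no match, false on error;
    // the captured text lands in $matches, not in the return value.
    if (preg_match($pattern, $haystack, $matches) === 1) {
        return $matches[1];
    }
    return null;
}

echo firstPairWithDuplicate('a,b,c,d,c,d,a,b,c'), "\n";           // prints "a,b"
echo firstPairWithDuplicate('c,b,a,xz,x,b,a,c,xz,x,x,b,a'), "\n"; // prints "b,a"
```

Note that on 'a,b,c,d,c,d,a,b,c' the result is 'a,b' rather than 'c,d', because the regex engine reports the match that starts earliest in the string.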
Edit:
Although, now that I think about it… if you reversed the string and then used '/.*\b(\w+,\w+),(?:.*?,)??\1\b/' on that, it might work. That dances around the constraint I’d mentioned; with the string reversed, the duplicate comes before the original, so now we can match the duplicate and then refer to it “later”.
The .* at the beginning of the expression grabs as much as it can, so the match will start as close to the end of the reversed string (and therefore, as close to the beginning of the original string) as possible. And the extra ?s make their corresponding bits lazy, so they match as little as necessary. Of course, once you find the match in the reversed string, you’ll need to reverse it in order to get the match in the original string.
And of course, this could break all to hell in the presence of UTF-8. (Then again, most regexes would.) If you’re just dealing with ASCII, though, it should work.
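The reverse-then-match idea can be sketched roughly like this. The function name firstRepeatedPair is hypothetical, and the sketch leans on strrev(), which reverses bytes rather than characters, so it assumes single-byte (ASCII) input as noted above.

```php
<?php
// Hypothetical helper (not from the original post): returns the two-item
// sequence whose repeat shows up earliest in the string, or null.
function firstRepeatedPair(string $haystack): ?string
{
    // Assumption: ASCII only. strrev() works on bytes, so multibyte
    // UTF-8 characters would be scrambled by this step.
    $reversed = strrev($haystack);

    // Leading .* is greedy, so the captured pair sits as late as possible
    // in the reversed string (as early as possible in the original).
    // The lazy (?:.*?,)?? skips as little as necessary before \1.
    $pattern = '/.*\b(\w+,\w+),(?:.*?,)??\1\b/';

    if (preg_match($pattern, $reversed, $matches) === 1) {
        // Reverse the capture again to get it in original order.
        return strrev($matches[1]);
    }
    return null;
}

echo firstRepeatedPair('a,b,c,d,c,d,a,b,c'), "\n";           // prints "c,d"
echo firstRepeatedPair('c,b,a,xz,x,b,a,c,xz,x,x,b,a'), "\n"; // prints "b,a"
```

On 'a,b,c,d,c,d,a,b,c' this gives 'c,d', where the unreversed pattern gives 'a,b'.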