I have a set of n tokens (e.g., a, b, c) distributed among a bunch of other tokens. I would like to know if all members of my set occur within a given number of positions (window size). It occurred to me that it may be possible to write a RegEx to capture this state, but the exact syntax eludes me.
          11111
012345678901234
ab ab bc a cba
In this example, given a window size of 5, I would like to match cba at positions 12-14 and abc at positions 3-7.
Is there a way to do this with RegEx, or is there some other kind of grammar that I can use to capture this logic?
I am hoping to implement this in Java.
Here’s a regex that matches 5-letter sequences that include all of ‘a’, ‘b’ and ‘c’:
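A minimal sketch of such a pattern in Java, with one look-ahead per token followed by `.{5}`. It assumes the window size of 5 and the single-character tokens a, b and c from the question; the class and method names (`WindowRegex`, `matchStarts`) are only illustrative:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class WindowRegex {
    // One look-ahead per token ("up to 4 characters, then the token"),
    // followed by .{5}, which consumes the 5-character window itself.
    static final Pattern ABC_IN_5 =
            Pattern.compile("(?=.{0,4}a)(?=.{0,4}b)(?=.{0,4}c).{5}");

    // Start offsets of the matches that Matcher.find() reports
    // (leftmost first, non-overlapping).
    static List<Integer> matchStarts(String text) {
        List<Integer> starts = new ArrayList<>();
        Matcher m = ABC_IN_5.matcher(text);
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }
}
```

For the input `"ab ab bc a cba"` (single spaces), `WindowRegex.matchStarts` returns `[3, 8]`. Because `.{5}` consumes each window, `find()` skips windows that overlap an earlier match, and a window that would run past the end of the input is never reported.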
So, while basically matching any 5 characters (with .{5}), there are three preconditions that each match has to satisfy. Each of them requires one of the tokens/letters to be present (up to 4 characters followed by ‘a’, etc.).
(?=X) matches “X, with a zero-width positive look-ahead”, where zero-width means that the match position does not advance while the look-ahead is checked.
Doing this with regexes is slow, though. Here’s a more direct version (seems about 15x faster than using regular expressions):
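A minimal sketch of one direct approach in Java: slide a fixed-size window over the text while keeping a count of each token inside it. It assumes single-character tokens with no duplicates in `tokens`; the names `WindowScan` and `findWindows` are illustrative, and no timing claim is made for this particular sketch:

```java
import java.util.ArrayList;
import java.util.List;

final class WindowScan {
    // Returns every start offset i such that text[i, i + window) contains
    // each character of tokens at least once.
    // Assumption: tokens contains no duplicate characters.
    static List<Integer> findWindows(String text, String tokens, int window) {
        List<Integer> starts = new ArrayList<>();
        int[] counts = new int[tokens.length()]; // occurrences of each token in the window
        int present = 0;                          // how many tokens currently have count > 0

        for (int end = 0; end < text.length(); end++) {
            // Character entering the window on the right.
            int in = tokens.indexOf(text.charAt(end));
            if (in >= 0 && counts[in]++ == 0) {
                present++;
            }
            // Character leaving the window on the left, once the window is full.
            int leaving = end - window;
            if (leaving >= 0) {
                int out = tokens.indexOf(text.charAt(leaving));
                if (out >= 0 && --counts[out] == 0) {
                    present--;
                }
            }
            // Record the window only when it is full-sized and complete.
            int start = end - window + 1;
            if (start >= 0 && present == tokens.length()) {
                starts.add(start);
            }
        }
        return starts;
    }
}
```

For `findWindows("ab ab bc a cba", "abc", 5)` this returns `[3, 5, 6, 8, 9]`. Unlike a regex match loop that consumes each window, it reports every qualifying start offset, including overlapping windows.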