I need help creating the best possible regular expression for this problem.
I have pairs of start and end delimiters, and I need to get ALL the substrings (any words) from a start delimiter up to its end delimiter.
Assume this table of delimiters:
START | END
CAT | DOG
APPLE | ORANGE
LION | ZEBRA
PANDA | CAT
Sample input:
substring1 CAT substring2 substring3 DOG substring4 substring5 CAT substring6
APPLE substring7 substring8 ORANGE ORANGE substring9 DOG substring10 PANDA
substring11 CAT substring12 DOG substring13 LION substring10 substring11 ZEBRA substring12
CAT substring13 substring14 APPLE substring15 substring16 ORANGE
The output must be:
- CAT substring2 substring3 DOG
- APPLE substring7 substring8 ORANGE
- PANDA substring11 CAT
- LION substring10 substring11 ZEBRA
- APPLE substring15 substring16 ORANGE
My regular expression:
CAT (.*?) DOG|APPLE (.*?) ORANGE|LION (.*?) ZEBRA|PANDA (.*?) CAT
I have a problem with strings in which another start delimiter occurs between a start delimiter and its end delimiter.
Take, for example:
CAT word1 word2 word3 word4 APPLE word5 word6 word7 DOG
I know that the CAT/DOG alternative will match this, but that is wrong, since the substring contains one of the other start delimiters (APPLE).
I just need a regex that gets all the words from a start delimiter up to its matching end delimiter, but only if the substring does not contain any other start delimiter.
Any suggestions? Thanks.
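For illustration, here is a small Python sketch (the question names no language, so Python is an assumption) of the CAT/DOG alternative on its own, applied to the example above. The lazy `.*?` stops at the first DOG, but nothing prevents it from passing over APPLE on the way:

```python
import re

# The CAT/DOG alternative from the question, with a lazy capture group.
NAIVE = re.compile(r"CAT (.*?) DOG")

text = "CAT word1 word2 word3 word4 APPLE word5 word6 word7 DOG"

# The lazy quantifier only looks for the nearest DOG; it never checks
# whether another start delimiter (here APPLE) appears in between.
print(NAIVE.findall(text))
# ['word1 word2 word3 word4 APPLE word5 word6 word7']
```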
The technique that helps us here is called “lookaround”.
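As a minimal sketch of that technique in Python (an assumed language), restricted to the CAT/DOG pair: a negative lookahead is placed inside the lazy body so that the body refuses to step over any start delimiter. This construction is sometimes called a tempered greedy token; it is one way to apply lookaround here, not necessarily the exact regex of this answer. The `\b` word boundaries are an added assumption so that a word such as CATALOG is not read as CAT.

```python
import re

# Start delimiters from the question's table.
STARTS = r"(?:CAT|APPLE|LION|PANDA)"

# Before consuming each character, the negative lookahead (?!...) checks
# that no start delimiter (as a whole word) begins at that position.
CAT_TO_DOG = re.compile(rf"\bCAT\b(?:(?!\b{STARTS}\b).)*?\bDOG\b")

bad = "CAT word1 word2 word3 word4 APPLE word5 word6 word7 DOG"
good = "CAT word1 word2 DOG"

print(CAT_TO_DOG.findall(bad))   # []
print(CAT_TO_DOG.findall(good))  # ['CAT word1 word2 DOG']
```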
I updated my answer after clarification from nfinium and feedback from jsobo.
Given the input:
It matches
Specifically, as nfinium indicated, it will not match the following:
It also matches the following, as you pointed out:
You say that it should match the following:
but I don't think it should, since the CAT from above is the end delimiter of
This regex produces nfinium's expected result.
Note that, per nfinium's requirements, CAT can be both a start and an end delimiter.
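One way to generalize the lookahead to the whole delimiter table is to generate the alternation from the list of pairs. The Python sketch below assumes whole-word delimiters; `build_pattern` and `find_spans` are hypothetical helper names, and this is an illustration rather than necessarily the regex described in this answer. Because the lookahead only guards the body, an end delimiter that is also a start delimiter (CAT, closing PANDA) can still end a match.

```python
import re

# Delimiter table from the question as (start, end) pairs.
PAIRS = [("CAT", "DOG"), ("APPLE", "ORANGE"), ("LION", "ZEBRA"), ("PANDA", "CAT")]


def build_pattern(pairs):
    """Hypothetical helper: START, a body containing no start delimiter, then END."""
    starts = "|".join(re.escape(start) for start, _ in pairs)
    # Tempered lazy body: each character is consumed only if no start
    # delimiter (as a whole word) begins at that position.
    body = rf"(?:(?!\b(?:{starts})\b).)*?"
    alternatives = [
        rf"\b{re.escape(start)}\b{body}\b{re.escape(end)}\b" for start, end in pairs
    ]
    # DOTALL lets "." cross line breaks, so a match may span input lines.
    return re.compile("|".join(alternatives), re.DOTALL)


PATTERN = build_pattern(PAIRS)


def find_spans(text):
    """Hypothetical helper: all matches, with internal whitespace collapsed."""
    return [" ".join(m.group(0).split()) for m in PATTERN.finditer(text)]


sample = (
    "substring1 CAT substring2 substring3 DOG substring4 substring5 CAT substring6\n"
    "APPLE substring7 substring8 ORANGE ORANGE substring9 DOG substring10 PANDA\n"
    "substring11 CAT substring12 DOG substring13 LION substring10 substring11 ZEBRA substring12\n"
    "CAT substring13 substring14 APPLE substring15 substring16 ORANGE"
)

for span in find_spans(sample):
    print(span)
```

Matching proceeds left to right, so once PANDA substring11 CAT has consumed that CAT, the same CAT cannot also start CAT substring12 DOG, which agrees with the output listed in the question.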