I am trying to extract part of a string using a regex. I have the following cases:
case1: Warehouse.13.s01e01.hdtv.xor.avi
case2: Warehouse.13.season01episode01.hdtv.xor.avi
case3: Warehouse.13.01x01.hdtv.xor.avi
The delimiter (.) in the above strings can also be any of \s, - or _.
The logic I am using is to check whether s or season is preceded (lookbehind) by a number and
extract everything before it. Since lookbehind needs a fixed length, I reversed the string
and used a lookahead on it instead.
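As a sketch of why the reversal is needed, here is a Python illustration (Python is an assumption; the question names no language, and newer Perl versions relax this restriction somewhat). Python's re module refuses a lookbehind whose width can vary, such as "one or two digits plus a delimiter", while a fixed-width one compiles. In the reversed string, "13.s" becomes "s.31", so the digits follow the s and a lookahead, which may have variable width, can test for them. The helper name lookbehind_is_rejected is hypothetical.

```python
import re


def lookbehind_is_rejected(pattern: str) -> bool:
    """Return True if Python's re refuses to compile the pattern."""
    try:
        re.compile(pattern)
    except re.error:
        return True
    return False


# Variable width (\d{1,2} is 1 or 2 chars): rejected by Python's re.
print(lookbehind_is_rejected(r"(?<=\d{1,2}[._ -])(?:s|season)"))  # True
# Fixed width (exactly 3 chars): accepted.
print(lookbehind_is_rejected(r"(?<=\d\d\.)s"))  # False

title = "Warehouse.13.s01e01.hdtv.xor.avi"
reversed_title = title[::-1]
print(reversed_title)  # iva.rox.vtdh.10e10s.31.esuoheraW

# After reversing, the digits FOLLOW "s.", so a variable-width lookahead works.
print(bool(re.search(r"s\.(?=\d{1,2})", reversed_title)))  # True
```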
For case1 I created the regex below, which works fine and outputs Warehouse.13 (after reversing the captured group back):
.*?\d{1,2}e\d{1,2}s\.(?=\d+)(.*)
Now for case2 I used:
.*?\d{1,2}edosipe\d{1,2}nosaes\.(?=\d+)(.*) # works fine.
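The two single-case patterns can be exercised with a small Python sketch (Python is assumed; the helper show_name is hypothetical): reverse the title, match the pattern, then reverse the captured group back. Each pattern only handles its own naming style, which is why a combined pattern is wanted.

```python
import re

# The two single-case patterns from the question, applied to the REVERSED title.
S_E_PATTERN = re.compile(r".*?\d{1,2}e\d{1,2}s\.(?=\d+)(.*)")
SEASON_EPISODE_PATTERN = re.compile(r".*?\d{1,2}edosipe\d{1,2}nosaes\.(?=\d+)(.*)")


def show_name(title, pattern):
    """Reverse the title, match, and reverse the captured prefix back (None if no match)."""
    m = pattern.match(title[::-1])
    return m.group(1)[::-1] if m else None


print(show_name("Warehouse.13.s01e01.hdtv.xor.avi", S_E_PATTERN))  # Warehouse.13
print(show_name("Warehouse.13.season01episode01.hdtv.xor.avi", SEASON_EPISODE_PATTERN))  # Warehouse.13
# Each pattern fails on the other naming style.
print(show_name("Warehouse.13.season01episode01.hdtv.xor.avi", S_E_PATTERN))  # None
```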
But when I try to combine the two cases, plus an optional delimiter, like this:
.*?\d{1,2}[e|edosipe]?[._ x\-]?\d{1,2}[s|nosaes]?[._\- ]?(?=\d+)(.*)
As you can see, most of the parts are optional (?). That is to handle case3.
With the combined regex, nothing matches for case2, although it works fine for case1 and case3.
Any idea what is wrong here?
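The symptom can be reproduced with a short Python sketch (Python is assumed; the helper extract_with_combined is hypothetical) that runs the combined pattern, unchanged, against all three strings. The s01e01 and 01x01 strings yield Warehouse.13, while the season01episode01 string produces no match at all.

```python
import re

# The combined pattern from the question, copied unchanged (note the [...] parts).
COMBINED = re.compile(
    r".*?\d{1,2}[e|edosipe]?[._ x\-]?\d{1,2}[s|nosaes]?[._\- ]?(?=\d+)(.*)")

TITLES = [
    "Warehouse.13.s01e01.hdtv.xor.avi",
    "Warehouse.13.season01episode01.hdtv.xor.avi",
    "Warehouse.13.01x01.hdtv.xor.avi",
]


def extract_with_combined(title):
    """Match the reversed title; return the reversed-back capture or None."""
    m = COMBINED.match(title[::-1])
    return m.group(1)[::-1] if m else None


for title in TITLES:
    print(f"{title} -> {extract_with_combined(title)}")
# Warehouse.13.s01e01.hdtv.xor.avi -> Warehouse.13
# Warehouse.13.season01episode01.hdtv.xor.avi -> None
# Warehouse.13.01x01.hdtv.xor.avi -> Warehouse.13
```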
PS: I am aware there may be other possible strings that will defeat the above regex, but
I am not interested in them for now.
[e|edosipe] and [s|nosaes] should be (e|edosipe) and (s|nosaes), or (?:e|edosipe) and (?:s|nosaes) if you don’t want the regex engine to capture them and mess up your accounting of $1, $2, etc.
Here, (...) does parenthetical grouping much like it does in any other expression in Perl, whereas [...] defines a character class. Specifically, [s|nosaes] matches a single character that is either a, e, n, o, s, or (perhaps surprisingly, but metacharacters' special meanings are usually ignored inside [...]) |.
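The points above can be checked with a Python sketch (Python's re treats these constructs the same way as described; the helper show_name is hypothetical). It shows that [s|nosaes] matches one character, including a literal |, and that with the alternation groups the pattern handles all three strings. With capturing groups the show name moves to group 3; with non-capturing groups it stays in group 1.

```python
import re

# A character class matches exactly ONE character from {s, |, n, o, a, e}.
CLASS = re.compile(r"[s|nosaes]")
print(bool(CLASS.fullmatch("|")))       # True: '|' is literal inside [...]
print(bool(CLASS.fullmatch("nosaes")))  # False: a class never matches a whole word

# Alternation in capturing groups: the show name ends up in group 3.
CAPTURING = re.compile(
    r".*?\d{1,2}(e|edosipe)?[._ x\-]?\d{1,2}(s|nosaes)?[._\- ]?(?=\d+)(.*)")
# Same alternation in non-capturing groups: the show name stays in group 1.
NON_CAPTURING = re.compile(
    r".*?\d{1,2}(?:e|edosipe)?[._ x\-]?\d{1,2}(?:s|nosaes)?[._\- ]?(?=\d+)(.*)")

TITLES = [
    "Warehouse.13.s01e01.hdtv.xor.avi",
    "Warehouse.13.season01episode01.hdtv.xor.avi",
    "Warehouse.13.01x01.hdtv.xor.avi",
]


def show_name(title, pattern, group):
    """Match the reversed title and reverse the requested group back."""
    m = pattern.match(title[::-1])
    return m.group(group)[::-1] if m else None


for title in TITLES:
    print(show_name(title, CAPTURING, 3), show_name(title, NON_CAPTURING, 1))

# The optional groups record which spelling matched (None when skipped).
print(CAPTURING.match(TITLES[1][::-1]).groups())  # ('edosipe', 'nosaes', '31.esuoheraW')
```

When a group alternative such as e fails further along, the engine backtracks and tries edosipe, so the order of the alternatives does not prevent a match here.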