I am looking for a regex to parse filenames so that I can count how many times each filename prefix occurs. Here are some sample strings:
gloves.tga
10jeans.jpg
shirt1.png
shirt2.png
coat_00.png
coat_12.gif
top1_01.png
top2_04.png
The basic pattern is just a string of letters or numbers followed by an extension. The prefix is everything before the extension (excluding the period).
A single piece of clothing may be spread across multiple files, indicated by the clothing name, followed by an underscore, followed by some index numbers and then the extension. The prefix is everything up to but not including the underscore. Everything else can be ignored.
This covers all of the cases I’m working with, but I’m having trouble working with the fact that one case has an underscore while the other case doesn’t.
Can someone help me come up with a regex for this?
EDIT: There seems to be an extra condition: shirt1 and shirt2 should be treated as the same prefix.
So if a string is followed by some numbers, and immediately followed by an extension, then the numbers should be ignored, whereas if the numbers were followed by an underscore, then they would be kept in the prefix.
Won’t a single negated character class, anchored at the start of the string, work? (Perl/PCRE syntax)
That will capture the longest prefix of the string that contains no periods or underscores.
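A minimal sketch of such a pattern, written in Ruby since a later edit mentions Ruby. The constant `SIMPLE_PREFIX` and the helper `simple_prefix` are names made up for illustration, and `\A` is used as the start-of-string anchor because `^` matches at every line start in Ruby:

```ruby
# Hypothetical pattern: from the start of the string, take the longest
# run of characters that are neither a period nor an underscore.
SIMPLE_PREFIX = /\A[^._]+/

# Returns the matched prefix, or nil if the name starts with "." or "_".
def simple_prefix(name)
  name[SIMPLE_PREFIX]
end

%w[gloves.tga 10jeans.jpg coat_00.png shirt1.png].each do |name|
  puts "#{name} -> #{simple_prefix(name)}"
end
```

Note that this version keeps trailing digits, so `shirt1.png` and `shirt2.png` still produce different prefixes.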
EDIT: OK, if `shirt` is the prefix in `shirt1`, then you can try adding a negative lookbehind, which disallows prefixes that end in a digit. That won’t work in Ruby 1.8, though, since 1.8 doesn’t have lookbehind assertions.
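One way the lookbehind variant might be written, assuming Ruby 1.9 or later. The names `NO_TRAILING_DIGIT` and `prefix_without_trailing_digits` are hypothetical:

```ruby
# Same negated character class as before, followed by a negative
# lookbehind: the character just before the end of the match must not
# be a digit, so the engine backtracks past any trailing digits.
NO_TRAILING_DIGIT = /\A[^._]+(?<!\d)/

# Returns the prefix with trailing digits dropped, or nil when the
# name part consists only of digits (no valid match remains).
def prefix_without_trailing_digits(name)
  name[NO_TRAILING_DIGIT]
end

%w[shirt1.png shirt2.png 10jeans.jpg top1_01.png].each do |name|
  puts "#{name} -> #{prefix_without_trailing_digits(name).inspect}"
end
```

Leading digits, as in `10jeans`, are unaffected because the lookbehind only inspects the last character of the match.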
EDIT 2:
The above means that the prefix of `top1_01` is `top`, but we want that one to include the digits before the underscore. So our last attempt is to add an alternative: the prefix has to either not end in a digit or be followed by an underscore.
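The two-way condition could be sketched as an alternation between a lookbehind and a lookahead, then used to count prefixes as the question asks. This assumes Ruby 1.9 or later; `PREFIX`, `clothing_prefix`, `count_prefixes`, and `FILES` are illustrative names, not part of the original answer:

```ruby
# The match must end either after a non-digit (lookbehind) or right
# before an underscore (lookahead). Greedy matching tries the longest
# candidate first, so "top1" wins over "top" in "top1_01.png".
PREFIX = /\A[^._]+(?:(?<!\d)|(?=_))/

def clothing_prefix(name)
  name[PREFIX]
end

# Tallies how many filenames share each prefix; names with no valid
# prefix (for example, digits only) are skipped.
def count_prefixes(names)
  counts = Hash.new(0)
  names.each do |name|
    prefix = clothing_prefix(name)
    counts[prefix] += 1 if prefix
  end
  counts
end

FILES = %w[gloves.tga 10jeans.jpg shirt1.png shirt2.png
           coat_00.png coat_12.gif top1_01.png top2_04.png]

count_prefixes(FILES).each { |prefix, n| puts "#{prefix}: #{n}" }
```

With the sample filenames from the question, `shirt` and `coat` are each counted twice, while `top1` and `top2` stay separate because their digits are followed by an underscore.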
Demo:
Output: