Hey guys, I have a regular expression that is pretty long and hard to look at.
I was wondering if you could help shorten it up, so it’s more manageable.
I admit, I’m not a regexp guru, and I just hack away to get by. If you come up with something better (it doesn’t even have to be shorter), please explain your reasoning, so I might have a better understanding of the techniques you use.
Regex:
^([a-zA-Z0-9# ]+)-([a-zA-Z ]*)([a-zA-Z0-9_ ]+)-([a-zA-Z0-9_ ]+)-([a-zA-Z0-9_ ]+)-([a-zA-Z0-9_ ]+)-([a-zA-Z0-9_ ]+)-([a-zA-Z0-9_ ]+)-([a-zA-Z0-9_ ]+)-([a-zA-Z0-9_ ]+)-([a-zA-Z0-9_ ]+)-([a-zA-Z ~]+)([a-zA-Z0-9_ ]+)\.rpt$
Tests:
TESTFIX - ABCD 10118 - E008 - E009 - IXX - IXX - IXX - IXX - IXX - IXX - SX ~ 91.rpt
TESTFIX - EFGD 10118 - E008 - E009 - IXX - IXX - IXX - IXX - IXX - IXX - SX ~ 92.rpt
TESTFIX - 10118_14041 M - E008 - E009 - IXX - IXX - IXX - IXX - IXX - IXX - SX ~ 93.rpt
TESTFIX - ABCD 10118 - E008 - E009 - IXX - IXX - IXX - IXX - IXX - IXX - SX ~ 93.rpt
TESTFIX - EFGD 10118 - E008 - E009 - IXX - IXX - IXX - IXX - IXX - IXX - SX ~ 93.rpt
TESTFIX - EFGD 10118 - E008 - E009 - IXX - IXX - IXX - IXX - IXX - IXX - SX ~ 93.rpt
TESTFIX - ABCD 10118 - E008 - E009 - IXX - IXX - IXX - IXX - IXX - IXX - SX ~ 93.rpt
#1REALLYLONGNAME - 10244 - E011 - E009 - IXX - IXX - IXX - IXX - IXX - IXX - DX ~ ALPHALTR.rpt
#1 LIVEREP - 10045 - E011 - E009 - IXX - IXX - IXX - IXX - IXX - IXX - SX ~ SING.rpt
#2 LIVEREP - 10045 M - E011 - E009 - IXX - IXX - IXX - IXX - IXX - IXX - SX ~ MUL.rpt
WELLREP - WELL10000 - E011 - E009 - IXX - IXX - IXX - IXX - IXX - IXX - SX ~ CLT.rpt
Each section is separated by the ' - ' sequence of characters (space, hyphen, space).
All sections can contain spaces and any valid file name character.
Each section needs its own capture group.
If it matters, I’ll be using this regexp in C#.
First of all, get a good regular expression development tool. My favorite is Expresso.
Here is a cleaned up version:
Changes include:
I’ll assume that you’re only validating the text, since you didn’t mention any capturing. If you need the capture groups, they’re easy enough to add back.
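For illustration only, and not necessarily the pattern originally posted here, a validation-only version might look like the sketch below. It assumes every file name has exactly 11 sections, as in the test names from the question; `ReportNameValidator` and `IsValid` are made-up names. Two caveats: `[\w# ]` and `[\w ~]` are slightly looser than the original classes (an underscore is allowed in the first section, digits are allowed before the `~`), and in .NET `\w` also matches non-ASCII letters and digits unless `RegexOptions.ECMAScript` is passed.

```csharp
using System;
using System.Text.RegularExpressions;

static class ReportNameValidator
{
    // Illustrative pattern (an assumption, not the answer's original):
    //   [\w# ]+         first section, may contain '#'
    //   (?:-[\w ]+){9}  nine middle sections, non-capturing, repeated
    //   -[\w ~]+        last section, may contain '~'
    //   \.rpt           literal extension
    const string Pattern = @"^[\w# ]+(?:-[\w ]+){9}-[\w ~]+\.rpt$";

    static readonly Regex Validator = new Regex(Pattern);

    // True when the whole file name fits the 11-section layout.
    public static bool IsValid(string fileName)
    {
        return Validator.IsMatch(fileName);
    }

    static void Main()
    {
        // A name taken from the question's test list.
        Console.WriteLine(IsValid(
            "#1 LIVEREP - 10045 - E011 - E009 - IXX - IXX - IXX - IXX - IXX - IXX - SX ~ SING.rpt"));

        // Too few sections, so this one should be rejected.
        Console.WriteLine(IsValid("TESTFIX - ABCD 10118 - SX ~ 91.rpt"));
    }
}
```

The main saving comes from the `{9}` quantifier, which replaces nine copies of the same `-([a-zA-Z0-9_ ]+)` group, and from `\w` standing in for `a-zA-Z0-9_`.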
EDIT:
Here it is with the capture groups back:
Note that when you go through the numbered capture groups, the third one will have 9 captures in it.
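To show how a repeated group ends up holding several captures in .NET, here is a small sketch. It is an assumption-laden illustration rather than the exact pattern from this answer: it uses named groups (`first`, `middle`, `last`, all hypothetical names) instead of numbered ones, so the group with 9 captures is `middle` rather than the third numbered group. `ReportNameParser` and `Sections` are made-up names as well.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class ReportNameParser
{
    // Illustrative pattern: the "middle" group sits inside a {9} repetition,
    // so one group collects nine separate captures.
    const string Pattern =
        @"^(?<first>[\w# ]+)(?:-(?<middle>[\w ]+)){9}-(?<last>[\w ~]+)\.rpt$";

    // Returns the trimmed sections, or an empty list if the name does not match.
    public static List<string> Sections(string fileName)
    {
        var sections = new List<string>();
        Match m = Regex.Match(fileName, Pattern);
        if (!m.Success) return sections;

        sections.Add(m.Groups["first"].Value.Trim());

        // Group.Value would only give the last repetition;
        // Group.Captures keeps every repetition in order.
        foreach (Capture c in m.Groups["middle"].Captures)
            sections.Add(c.Value.Trim());

        sections.Add(m.Groups["last"].Value.Trim());
        return sections;
    }

    static void Main()
    {
        List<string> parts = Sections(
            "#2 LIVEREP - 10045 M - E011 - E009 - IXX - IXX - IXX - IXX - IXX - IXX - SX ~ MUL.rpt");
        Console.WriteLine(parts.Count);             // 11
        Console.WriteLine(string.Join("|", parts));
    }
}
```

Reading `Captures` relies on .NET keeping every repetition of a group; many other regex engines retain only the last one, so this approach does not port directly.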