patterns = {} patterns[1] = re.compile([A-Z]\d-[A-Z]\d) patterns[2] = re.compile([A-Z]\d-[A-Z]\d\d) patterns[3] = re.compile([A-Z]\d\d-[A-Z]\d\d) patterns[4] =

Question


Asked: May 24, 2026 (2026-05-24T12:14:17+00:00)


    import re

    patterns = {}
    patterns[1] = re.compile(r"[A-Z]\d-[A-Z]\d")
    patterns[2] = re.compile(r"[A-Z]\d-[A-Z]\d\d")
    patterns[3] = re.compile(r"[A-Z]\d\d-[A-Z]\d\d")
    patterns[4] = re.compile(r"[A-Z]\d\d-[A-Z]\d\d\d")
    patterns[5] = re.compile(r"[A-Z]\d\d\d-[A-Z]\d\d\d")
    patterns[6] = re.compile(r"[A-Z][A-Z]\d-[A-Z][A-Z]\d")
    patterns[7] = re.compile(r"[A-Z][A-Z]\d-[A-Z][A-Z]\d\d")
    patterns[8] = re.compile(r"[A-Z][A-Z]\d\d-[A-Z][A-Z]\d\d")
    patterns[9] = re.compile(r"[A-Z][A-Z]\d\d-[A-Z][A-Z]\d\d\d")
    patterns[10] = re.compile(r"[A-Z][A-Z]\d\d\d-[A-Z][A-Z]\d\d\d")

    def matchFound(toSearch):
        for items in sorted(patterns.keys(), reverse=True):
            matchObject = patterns[items].search(toSearch)
            if matchObject:
                return items
        return 0

Then I use the following code to look for matches:

        while matchFound(toSearch) > 0:

I have 10 different regular expressions, but I feel like they could be replaced by one well-written, more elegant regular expression. Do you guys think it’s possible?

EDIT: I forgot two more expressions:

    patterns[11] = re.compile(r"[A-Z]\d-[A-Z]\d\d\d")
    patterns[12] = re.compile(r"[A-Z][A-Z]\d-[A-Z][A-Z]\d\d\d")

EDIT2: I ended up with the following. I realize I COULD get extra results but I don’t think they’re possible in the data I’m parsing.

    patterns = {}
    patterns[1] = re.compile(r"[A-Z]{1,2}\d-[A-Z]{1,2}\d{1,3}")
    patterns[2] = re.compile(r"[A-Z]{1,2}\d\d-[A-Z]{1,2}\d{2,3}")
    patterns[3] = re.compile(r"[A-Z]{1,2}\d\d\d-[A-Z]{1,2}\d\d\d")


1 Answer

Editorial Team · Answer 1 · 2026-05-24T12:14:18+00:00

Josh Caswell noted that Sean Bright’s answer will match more inputs than your original group. Sorry I didn’t figure this out. (In the future it might be good to spell out your problem a little bit more.)

So your basic problem is that regular expressions can’t count: a single pattern has no direct way to compare the lengths of two of its groups. But we can still solve this in Python in a very slick way. First we make a pattern that matches any of your legal inputs, but would also match some you want to reject.

Next, we define a function that uses the pattern and then examines the match object, and counts to make sure that the matched string meets the length requirements.

    import re

    # One loose pattern: letters, digits, hyphen, letters, digits.
    _s_pat = r'([A-Z]{1,2})(\d{1,3})-([A-Z]{1,2})(\d{1,3})'
    _pat = re.compile(_s_pat)

    # Legal (first, second) digit-count pairs.
    _valid_n_len = {(1, 1), (1, 2), (1, 3), (2, 2), (2, 3), (3, 3)}

    def check_match(s):
        m = _pat.search(s)
        try:
            # m is None when nothing matched; m.groups() then raises AttributeError.
            a0, n0, a1, n1 = m.groups()
            if len(a0) != len(a1):
                return False
            if (len(n0), len(n1)) not in _valid_n_len:
                return False
            return True
        except (AttributeError, TypeError, ValueError):
            return False

Here is some explanation of the above code.

First we use a raw string to define the pattern, and then we pre-compile the pattern. We could just stuff the literal string into the call to re.compile() but I like to have a separate string. Our pattern has four distinct sections enclosed in parentheses; these will become “match groups”. There are two match groups to match the alphabet characters, and two match groups to match numbers. This one pattern will match everything you want, but won’t exclude some stuff you don’t want.

Next we declare a set that has all the valid lengths for numbers. For example, the first group of numbers can be 1 digit long and the second group can be 2 digits; this is (1,2) (a tuple value). A set is a nice way to specify all the possible combinations that we want to be legal, while still being able to check quickly whether a given pair of lengths is legal.

The function check_match() first uses the pattern to search the string, binding the result to the name m. If the search succeeds, m is a “match object”; if it fails, m is None, and calling m.groups() raises AttributeError. Instead of explicitly testing for None, I used a try/except block; in retrospect it might have been better to just test for None. Sorry, I didn’t mean to be confusing. But the try/except block is a pretty simple way to wrap something and make it very reliable, so I often use it for things like this.

Finally, check_match() unpacks the match groups into four variables. The two alpha groups are a0 and a1, and the two number groups are n0 and n1. Then it checks that the lengths are legal. As far as I can tell, the rule is that alpha groups need to be the same length; and then we build a tuple of number group lengths and check to see if the tuple is in our set of valid tuples.
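To make the behavior described above concrete, here is a minimal sketch that runs check_match() on a few strings. The sample inputs are made up for illustration and are not taken from the asker's data, and this copy of the function tests for None explicitly instead of using try/except.

```python
import re

_pat = re.compile(r'([A-Z]{1,2})(\d{1,3})-([A-Z]{1,2})(\d{1,3})')
_valid_n_len = {(1, 1), (1, 2), (1, 3), (2, 2), (2, 3), (3, 3)}

def check_match(s):
    m = _pat.search(s)
    if m is None:
        return False
    a0, n0, a1, n1 = m.groups()
    # Letter groups must be the same length; digit lengths must be a legal pair.
    return len(a0) == len(a1) and (len(n0), len(n1)) in _valid_n_len

# Hypothetical sample inputs (assumptions, not real data).
samples = ["A1-B2", "AB12-CD345", "A12-B3", "A1-BC2", "no code here"]
results = {s: check_match(s) for s in samples}

for s, ok in results.items():
    print(f"{s!r}: {ok}")
# 'A1-B2': True         (1 letter each side, digit lengths 1 and 1)
# 'AB12-CD345': True    (2 letters each side, digit lengths 2 and 3)
# 'A12-B3': False       (digit lengths 2 and 1 are not a legal pair)
# 'A1-BC2': False       (letter groups differ in length)
# 'no code here': False (the pattern does not match at all)
```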

Here’s a slightly different version of the above. Maybe you will like it better.

    import re
    # match alpha: 1 or 2 capital letters
    _s_pat_a = r'[A-Z]{1,2}'
    # match number: 1-3 digits
    _s_pat_n = r'\d{1,3}'

    # pattern: four match groups: alpha, number, alpha, number
    _s_pat = '(%s)(%s)-(%s)(%s)' % (_s_pat_a, _s_pat_n, _s_pat_a, _s_pat_n)
    _pat = re.compile(_s_pat)

    # set of valid lengths of number groups
    _valid_n_len = {(1, 1), (1, 2), (1, 3), (2, 2), (2, 3), (3, 3)}

    def check_match(s):
        m = _pat.search(s)
        if not m:
            return False
        a0, n0, a1, n1 = m.groups()
        if len(a0) != len(a1):
            return False
        tup = (len(n0), len(n1))  # make tuple of actual lengths
        if tup not in _valid_n_len:
            return False
        return True
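One way to gain confidence in this version is to compare it against the twelve patterns from the question. The sketch below is only a sanity check under stated assumptions: it builds one synthetic string per shape (the helper make_sample is hypothetical), and it treats each original pattern as describing the whole string by using fullmatch(), which is not how the question's matchFound() applies them.

```python
import re
from itertools import product

# The twelve patterns from the question, written as raw strings.
originals = [re.compile(p) for p in (
    r"[A-Z]\d-[A-Z]\d", r"[A-Z]\d-[A-Z]\d\d", r"[A-Z]\d-[A-Z]\d\d\d",
    r"[A-Z]\d\d-[A-Z]\d\d", r"[A-Z]\d\d-[A-Z]\d\d\d",
    r"[A-Z]\d\d\d-[A-Z]\d\d\d",
    r"[A-Z][A-Z]\d-[A-Z][A-Z]\d", r"[A-Z][A-Z]\d-[A-Z][A-Z]\d\d",
    r"[A-Z][A-Z]\d-[A-Z][A-Z]\d\d\d",
    r"[A-Z][A-Z]\d\d-[A-Z][A-Z]\d\d", r"[A-Z][A-Z]\d\d-[A-Z][A-Z]\d\d\d",
    r"[A-Z][A-Z]\d\d\d-[A-Z][A-Z]\d\d\d",
)]

_pat = re.compile(r'([A-Z]{1,2})(\d{1,3})-([A-Z]{1,2})(\d{1,3})')
_valid_n_len = {(1, 1), (1, 2), (1, 3), (2, 2), (2, 3), (3, 3)}

def check_match(s):
    m = _pat.search(s)
    if not m:
        return False
    a0, n0, a1, n1 = m.groups()
    return len(a0) == len(a1) and (len(n0), len(n1)) in _valid_n_len

def make_sample(a0, n0, a1, n1):
    # Hypothetical helper: one synthetic string per shape, e.g. "AA11-B222".
    return "A" * a0 + "1" * n0 + "-" + "B" * a1 + "2" * n1

# Every combination of 1-2 letters and 1-3 digits on each side (36 shapes).
samples = [make_sample(*shape)
           for shape in product((1, 2), (1, 2, 3), (1, 2), (1, 2, 3))]

mismatches = [s for s in samples
              if check_match(s) != any(p.fullmatch(s) for p in originals)]
print(mismatches)  # [] means both approaches agree on all 36 shapes
```

Note the whole-string assumption matters. With search(), the original patterns can match inside a longer run, so "AA1-B2" is found by the first pattern (as "A1-B2"), while check_match() looks only at the first match starting at "AA" and returns False. Whether that difference matters depends on the data being parsed.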

Note: It looks like the rule for valid lengths is actually simple:

    if len(n0) > len(n1):
        return False

If that rule works for you, you could get rid of the set and the tuple stuff. Hmm, and I’ll make the variable names a bit shorter.

    import re
    # match alpha: 1 or 2 capital letters
    pa = r'[A-Z]{1,2}'
    # match number: 1-3 digits
    pn = r'\d{1,3}'

    # pattern: four match groups: alpha, number, alpha, number
    p = '(%s)(%s)-(%s)(%s)' % (pa, pn, pa, pn)
    _pat = re.compile(p)

    def check_match(s):
        m = _pat.search(s)
        if not m:
            return False
        a0, n0, a1, n1 = m.groups()
        if len(a0) != len(a1):
            return False
        if len(n0) > len(n1):
            return False
        return True
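As a quick check that the simpler rule really replaces the set, this short sketch enumerates every pair of digit-group lengths the pattern can produce (1 to 3 each) and confirms that "first is not longer than second" yields exactly the six tuples listed earlier. The names by_rule and valid_n_len are illustrative only.

```python
from itertools import product

# The explicit set of legal (first, second) digit-count pairs.
valid_n_len = {(1, 1), (1, 2), (1, 3), (2, 2), (2, 3), (3, 3)}

# Pairs allowed by the rule: reject only when len(n0) > len(n1).
by_rule = {(a, b) for a, b in product(range(1, 4), repeat=2) if a <= b}

assert by_rule == valid_n_len  # the two formulations agree
```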
