
The Archive Base

Editorial Team
Asked: May 25, 2026


Is it possible to match 2 regular expressions in Python?

For instance, I have a use-case wherein I need to compare 2 expressions like this:

re.match('google\.com\/maps', 'google\.com\/maps2', re.IGNORECASE)

I would expect to be returned a RE object.

But obviously, Python expects a string as the second parameter.
Is there a way to achieve this, or is it a limitation of the way regex matching works?


Background: I have a list of regular expressions [r1, r2, r3, …] that match a string and I need to find out which expression is the most specific match of the given string. The way I assumed I could make it work was by:
(1) matching r1 with r2.
(2) then match r2 with r1.
If both match, we have a ‘tie’. If only (1) worked, r1 is a ‘better’ match than r2 and vice-versa.
I’d loop (1) and (2) over the entire list.

I admit it’s a bit to wrap one’s head around (mostly because my description is probably incoherent), but I’d really appreciate it if somebody could give me some insight into how I can achieve this. Thanks!

1 Answer

  1. Editorial Team
    Added an answer on May 25, 2026 at 8:58 pm

    Setting aside the syntax clarification on re.match, I understand that you are trying to take two or more unknown (user-supplied) regular expressions and decide which one is the more ‘specific’ match against a string.

    Recall for a moment that a Python regex really is a type of computer program. Most modern forms, including Python’s regex, are based on Perl. Perl’s regexes have recursion, backtracking, and other features that defy trivial inspection. Indeed, a rogue regex can be used as a form of denial-of-service attack.

    To see this on your own computer, try:

    >>> re.match(r'^(a+)+$','a'*24+'!')
    

    That takes about 1 second on my computer. Now increase the 24 in 'a'*24 to a slightly larger number, say 28. That takes a lot longer. Try 48… You will probably need to press CTRL+C now. The running time grows exponentially as the number of a’s increases.
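    The growth can be measured without waiting for the large cases. The following is a minimal sketch, assuming deliberately small sizes (12 to 18) so that it finishes quickly; the helper name time_failed_match is made up for this illustration, and the absolute timings will differ from machine to machine.

```python
import re
import time

# The same pathological pattern as above: nested quantifiers.
PATTERN = re.compile(r'^(a+)+$')

def time_failed_match(n):
    """Return (match_result, seconds) for 'a' * n + '!' against PATTERN.

    The trailing '!' forces the match to fail, so the engine has to try
    every way of splitting the run of a's before giving up.
    """
    text = 'a' * n + '!'
    start = time.perf_counter()
    result = PATTERN.match(text)
    return result, time.perf_counter() - start

timings = {}
for n in (12, 14, 16, 18):   # small sizes chosen so the sketch stays fast
    result, seconds = time_failed_match(n)
    timings[n] = seconds
    print(f"n={n:2d}  matched={result is not None}  {seconds:.4f}s")
```

    Each step of 2 in n should multiply the time by roughly 4, which is what exponential growth in n looks like.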

    You can read more about this issue in Russ Cox’s wonderful paper ‘Regular Expression Matching Can Be Simple And Fast’. Russ Cox is the Google engineer who built Google Code Search in 2006. As Cox observes, consider matching the regex 'a?'*n + 'a'*n against the string 'a'*n. With n = 100, a Thompson NFA matcher of the kind used in awk and grep finishes in under 200 microseconds, but Perl (or Python or PCRE or Java or PHP or …) would require over 10^15 years because of exponential backtracking.
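    To make that pattern concrete, here is a small sketch that builds it for a few sizes and times Python’s re on each. It assumes the formulation where the pattern is 'a?' repeated n times followed by 'a' repeated n times, matched against a string of n a’s; cox_pattern is a hypothetical helper name, and the sizes are kept small on purpose.

```python
import re
import time

def cox_pattern(n):
    """Build 'a?' * n + 'a' * n, e.g. n=3 gives 'a?a?a?aaa'."""
    return 'a?' * n + 'a' * n

matches = {}
for n in (4, 8, 12, 16):   # keep n small: the work roughly doubles per step
    start = time.perf_counter()
    # The match only succeeds once every optional 'a?' has given up its 'a'.
    matches[n] = re.fullmatch(cox_pattern(n), 'a' * n) is not None
    elapsed = time.perf_counter() - start
    print(f"n={n:2d}  matched={matches[n]}  {elapsed:.4f}s")
```

    The match always succeeds, but a backtracking engine first tries the greedy choice for every 'a?' and has to unwind those choices one combination at a time.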

    So the conclusion is: it depends! What do you mean by a more specific match? Look at some of Cox’s regex simplification techniques in RE2. If your project is big enough to write your own libraries (or use RE2) and you are willing to restrict the regex grammar used (i.e., no backtracking or recursive forms), I think the answer is that you would classify ‘a better match’ in a variety of ways.

    If you are looking for a simple way to state that (regex_3 < regex_1 < regex_2) when matched against some string using Python or Perl’s regex language, I think that the answer is that it is very, very hard (i.e., this problem is NP-complete).

    Edit

    Everything I said above is true! However, here is a stab at sorting matching regular expressions based on one form of ‘specific’: how many edits it takes to get from the regex to the string. The greater the number of edits (that is, the higher the Levenshtein distance), the less ‘specific’ the regex is.

    You be the judge if this works (I don’t know what ‘specific’ means to you for your application):

    # Python 3
    import re
    
    def ld(a, b):
        "Calculates the Levenshtein distance between a and b."
        n, m = len(a), len(b)
        if n > m:
            # Make sure n <= m, to use O(min(n,m)) space
            a, b = b, a
            n, m = m, n
    
        current = list(range(n + 1))
        for i in range(1, m + 1):
            previous, current = current, [i] + [0] * n
            for j in range(1, n + 1):
                add, delete = previous[j] + 1, current[j - 1] + 1
                change = previous[j - 1]
                if a[j - 1] != b[i - 1]:
                    change = change + 1
                current[j] = min(add, delete, change)
        return current[n]
    
    s = 'Mary had a little lamb'
    d = {}   # regex -> score (lower is more 'specific')
    regs = [r'.*', r'Mary', r'lamb', r'little lamb', r'.*little lamb', r'\b\w+mb',
            r'Mary.*little lamb', r'.*[lL]ittle [Ll]amb', r'\blittle\b', s, r'little']
    
    for reg in regs:
        m = re.search(reg, s)
        if m:
            print("'%s' matches '%s' with sub group '%s'" % (reg, s, m.group(0)))
            ld1 = ld(reg, m.group(0))   # edits from the regex text to the matched substring
            ld2 = ld(m.group(0), s)     # edits from the matched substring to the whole string
            score = max(ld1, ld2)
            print("  %i edits regex->match(0), %i edits match(0)->s" % (ld1, ld2))
            print("  score: ", score)
            d[reg] = score
            print()
        else:
            print("'%s' does not match '%s'" % (reg, s))
    
    print("   ===== %s =====    === %s ===" % ('RegEx'.center(10), 'Score'.center(10)))
    
    # Sort by score first, then by the regex text to break ties
    for key, value in sorted(d.items(), key=lambda kv: (kv[1], kv[0])):
        print("   %22s        %5s" % (key, value))
    

    The program takes a list of regexes and matches each one against the string Mary had a little lamb.

    Here is the sorted ranking from “most specific” to “least specific”:

       =====   RegEx    =====    ===   Score    ===
       Mary had a little lamb            0
            Mary.*little lamb            7
                .*little lamb           11
                  little lamb           11
          .*[lL]ittle [Ll]amb           15
                   \blittle\b           16
                       little           16
                         Mary           18
                      \b\w+mb           18
                         lamb           18
                           .*           22
    

    This is based on the (perhaps simplistic) assumption that two edit counts matter: a) the number of edits (the Levenshtein distance) to get from the regex itself to the matching substring, which is the result of wildcard expansions or replacements; and b) the number of edits to get from the matching substring to the initial string. The score is the larger of the two.

    As two simple examples:

    1. .* (or .*.* or .*?.* etc.) against any string takes a large number of edits to get to the string, in fact equal to the string length. This is the maximum possible number of edits, the highest score, and the least ‘specific’ regex.
    2. The regex of the string itself against the string is as specific as possible. No edits to change one to the other resulting in a 0 or lowest score.

    As stated, this is simplistic. Anchors should increase specificity but they do not in this case. Very short strings don’t work because the wildcard may be longer than the string.
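    The two examples and the short-string caveat can be checked directly. This is a compact sketch of the same scoring idea, not the listing above; levenshtein and specificity_score are names chosen for this illustration.

```python
import re

def levenshtein(a, b):
    """Edit distance between a and b, computed one row at a time."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1,                # delete ca
                               current[j - 1] + 1,             # insert cb
                               previous[j - 1] + (ca != cb)))  # substitute
        previous = current
    return previous[-1]

def specificity_score(pattern, text):
    """Larger of (regex -> match) and (match -> text) edits; None if no match."""
    m = re.search(pattern, text)
    if m is None:
        return None
    matched = m.group(0)
    return max(levenshtein(pattern, matched), levenshtein(matched, text))

s = 'Mary had a little lamb'
print(specificity_score(r'.*', s))    # 22, the length of s
print(specificity_score(s, s))        # 0, the string matched against itself
print(specificity_score(r'.*', 'a'))  # 2, more than the length of 'a'
```

    The last line shows the short-string problem: the two-character wildcard costs more edits than the one-character string is long.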

    Edit 2

    I got anchor parsing to work pretty darn well using the undocumented sre_parse module in Python. Type >>> help(sre_parse) if you want to read more…

    This is the go-to worker module underlying the re module. It has been in every Python distribution since 2001, including all the Python 3 versions. It has been deprecated since Python 3.11, where the same code lives in the private re._parser module, so it may go away eventually.
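    As a rough sketch of what the parser exposes, the snippet below lists the anchors found at the top level of a pattern. It assumes Python 3, where the parse tree uses named constants such as AT and AT_BOUNDARY; re._parser is a private module whose layout may change, and top_level_anchors is a hypothetical helper name.

```python
try:
    import re._parser as sre_parse   # Python 3.11+ (private module)
except ImportError:
    import sre_parse                 # older Python 3 versions

def top_level_anchors(pattern):
    """Return the anchor constants found at the top level of a pattern.

    Anchors nested inside groups are not visited in this sketch.
    """
    return [value for op, value in sre_parse.parse(pattern)
            if op == sre_parse.AT]

anchors = top_level_anchors(r'^\blamb$')
print(anchors)   # one entry each for ^, \b and $
```

    Each item of the parsed pattern is an (opcode, argument) pair, so an anchor shows up as the AT opcode paired with a constant that says which anchor it is.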

    Here is the revised listing:

    # Python 3
    import re
    try:
        import re._parser as sre_parse   # Python 3.11+ (private module)
    except ImportError:
        import sre_parse                 # older Python 3 versions
    
    def ld(a, b):
        "Calculates the Levenshtein distance between a and b."
        n, m = len(a), len(b)
        if n > m:
            # Make sure n <= m, to use O(min(n,m)) space
            a, b = b, a
            n, m = m, n
    
        current = list(range(n + 1))
        for i in range(1, m + 1):
            previous, current = current, [i] + [0] * n
            for j in range(1, n + 1):
                add, delete = previous[j] + 1, current[j - 1] + 1
                change = previous[j - 1]
                if a[j - 1] != b[i - 1]:
                    change = change + 1
                current[j] = min(add, delete, change)
        return current[n]
    
    s = 'Mary had a little lamb'
    d = {}   # regex -> score (lower is more 'specific')
    regs = [r'.*', r'Mary', r'lamb', r'little lamb', r'.*little lamb', r'\b\w+mb',
            r'Mary.*little lamb', r'.*[lL]ittle [Ll]amb', r'\blittle\b', s, r'little',
            r'^.*lamb', r'.*.*.*b', r'.*?.*', r'.*\b[lL]ittle\b \b[Ll]amb',
            r'.*\blittle\b \blamb$', '^' + s + '$']
    
    for reg in regs:
        m = re.search(reg, s)
        if m:
            ld1 = ld(reg, m.group(0))
            ld2 = ld(m.group(0), s)
            score = max(ld1, ld2)
            # Anchors match no characters, so discount the edits they cost
            for t, v in sre_parse.parse(reg):
                if t == sre_parse.AT:      # anchor...
                    if v in (sre_parse.AT_BEGINNING, sre_parse.AT_END):
                        score -= 1   # ^ or $, adj 1 edit
                    elif v == sre_parse.AT_BOUNDARY:   # \b is 2 chars
                        score -= 2
    
            d[reg] = score
        else:
            print("'%s' does not match '%s'" % (reg, s))
    
    print()
    print("   ===== %s =====    === %s ===" % ('RegEx'.center(15), 'Score'.center(10)))
    
    # Sort by score first, then by the regex text to break ties
    for key, value in sorted(d.items(), key=lambda kv: (kv[1], kv[0])):
        print("   %27s        %5s" % (key, value))
    

    And the sorted regexes:

       =====      RegEx      =====    ===   Score    ===
            Mary had a little lamb            0
          ^Mary had a little lamb$            0
                 Mary.*little lamb            7
              .*\blittle\b \blamb$            9
                     .*little lamb           11
                       little lamb           11
                        \blittle\b           12
         .*\b[lL]ittle\b \b[Ll]amb           13
               .*[lL]ittle [Ll]amb           15
                           \b\w+mb           16
                            little           16
                           ^.*lamb           17
                              Mary           18
                              lamb           18
                           .*.*.*b           21
                                .*           22
                             .*?.*           22
    
© 2021 The Archive Base. All Rights Reserved