The Archive Base

Editorial Team
Asked: May 11, 2026 (2026-05-11T16:44:33+00:00)

I have a list of possible substrings, e.g. ['cat', 'fish', 'dog']. In practice, the list contains hundreds of entries.

I’m processing a string, and I want to find the index of the first occurrence of any of these substrings.

To clarify, for '012cat' the result is 3, and for '0123dog789cat' the result is 4.

I also need to know which substring was found (e.g. its index in the substring list or the text itself), or at least the length of the substring matched.

There are obvious brute-force ways to achieve this, but I wondered whether there is an elegant Python/regex solution.

1 Answer

  1. Editorial Team
    Added an answer on May 11, 2026 at 4:44 pm

    I would assume a regex is better than checking for each substring individually: conceptually, a regular expression is modeled as a DFA, so all alternatives are tested at the same time as the input is consumed, which results in a single scan of the input string.

    So, here is an example:

    import re
    
    def work():
        to_find = re.compile("cat|fish|dog")
        search_str = "blah fish cat dog haha"
        match_obj = to_find.search(search_str)
        if match_obj is None:  # search() returns None when nothing matches
            return None
        the_index = match_obj.start()  # produces 5, the index of "fish"
        which_word_matched = match_obj.group()  # "fish"
        return the_index, which_word_matched
    

    UPDATE:
    Some care should be taken when combining words into a single pattern of alternatives. The following code builds a regex, but it escapes any regex special characters and sorts the words so that longer words get a chance to match before any shorter prefixes of the same word:

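    def wordlist_to_regex(words):
        # Escape regex metacharacters, then list longer words first so that
        # a word is tried before any shorter prefix of itself.
        escaped = map(re.escape, words)
        combined = '|'.join(sorted(escaped, key=len, reverse=True))
        return re.compile(combined)
    
    >>> # The definition of r is missing from the post; this word list is an
    >>> # assumption chosen to be consistent with the spans shown below.
    >>> r = wordlist_to_regex(['cat', 'fish', 'dog', 'at', 'atom', 'comp.lang.python', r'north\south'])
    >>> r.search('smash atomic particles').span()
    (6, 10)
    >>> r.search('visit usenet:comp.lang.python today').span()
    (13, 29)
    >>> r.search(r'a north\south division').span()
    (2, 13)
    >>> r.search('012cat').span()
    (3, 6)
    >>> r.search('0123dog789cat').span()
    (4, 7)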
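
The question also asks which substring was found. Here is a minimal sketch of one way to report that, assuming a non-empty word list; `build_finder` and `find_first` are hypothetical names used only for illustration and are not part of the original answer. It maps the matched text back to its position in the caller's list.

```python
import re

def build_finder(words):
    """Compile one alternation for `words` and return a search function."""
    # Longest first, so a word is tried before any of its own prefixes.
    ordered = sorted(words, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(w) for w in ordered))
    # Position of each word in the caller's list (first occurrence wins).
    position = {}
    for i, w in enumerate(words):
        position.setdefault(w, i)

    def find_first(text):
        """Return (start, matched_text, index_in_words), or None if no match."""
        m = pattern.search(text)
        if m is None:
            return None
        return m.start(), m.group(), position[m.group()]

    return find_first

find_first = build_finder(["cat", "fish", "dog"])
print(find_first("012cat"))         # (3, 'cat', 0)
print(find_first("0123dog789cat"))  # (4, 'dog', 2)
print(find_first("no pets here"))   # None
```

The length of the match is simply `len(matched_text)`, so the tuple covers all three things the question asks for.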
    

    END UPDATE

    Note that you will want to build the regex (i.e., call re.compile()) as rarely as possible. The best case is that you know your search words ahead of time (or compute them once or infrequently) and save the result of re.compile() somewhere. My example is just a simple nonsense function to show how the regex is used. There are more regex docs here:

    http://docs.python.org/library/re.html
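
As a rough illustration of "compile once, reuse many times", the sketch below builds the pattern at module import and has every call share it. The names `WORDS`, `_FINDER`, and `first_match` are assumptions made up for this example, and the word list is a stand-in for whatever your application loads.

```python
import re

# Hypothetical word list; in practice this would be loaded once at startup.
WORDS = ["cat", "fish", "dog"]

# Compiled a single time at import, then reused by every call below.
_FINDER = re.compile(
    "|".join(re.escape(w) for w in sorted(WORDS, key=len, reverse=True))
)

def first_match(text):
    """Return (start, matched_text) for the earliest match, or None."""
    m = _FINDER.search(text)
    return None if m is None else (m.start(), m.group())

for line in ["blah fish cat dog haha", "0123dog789cat", "nothing"]:
    print(line, "->", first_match(line))
```

The `re` module keeps its own small cache of recently compiled patterns, but holding on to the compiled object yourself makes the reuse explicit and does not depend on that cache.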

    Hope this helps.

    UPDATE: I am unsure how Python implements regular expressions, but to answer Rax’s question about whether re.compile() has limitations (for example, how many words you can "|" together in one pattern) and how long compilation takes: neither seems to be an issue. I tried out the code below, which is good enough to convince me. (I could have improved it by adding timing and reporting the results, and by putting the words into a set to rule out duplicates, but both improvements seem like overkill.) The code ran essentially instantaneously and convinced me that I can search for 2000 words of 10 characters each, and that any of them will match appropriately. Here is the code:

    import random
    import re
    import string
    import sys
    
    def main(args):
        # Build 2000 random 10-character words from letters and digits.
        letters_and_digits = string.ascii_letters + string.digits
        words = []
        for i in range(2000):
            chars = [random.choice(letters_and_digits) for j in range(10)]
            words.append("".join(chars))
        search_for = re.compile("|".join(words))
        first, middle, last = words[0], words[len(words) // 2], words[-1]
        # The last word in the list is placed first in the search string.
        search_string = "%s, %s, %s" % (last, middle, first)
        match_obj = search_for.search(search_string)
        if match_obj is None:
            print("Ahhhg")
            return
        index = match_obj.start()
        which = match_obj.group()
        if index != 0:
            print("ahhhg")
            return
        if words[-1] != which:
            print("ahhg")
            return
    
        print("success!!! Generated 2000 random words, compiled re, and was able to perform matches.")
    
    if __name__ == "__main__":
        main(sys.argv)
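
The timing that the update mentions could be added roughly as follows. This is only a sketch under assumptions of my own: the fixed seed, the 10,000-character filler string, and the `brute_force` helper (one `str.find()` per word) are all invented for the comparison and are not from the original test. No particular timings are claimed; the numbers depend on your machine and Python version.

```python
import random
import re
import string
import time

random.seed(0)  # fixed seed so runs are repeatable
alphabet = string.ascii_letters + string.digits
words = ["".join(random.choice(alphabet) for _ in range(10)) for _ in range(2000)]

start = time.perf_counter()
pattern = re.compile("|".join(map(re.escape, words)))
compile_seconds = time.perf_counter() - start

# Put a known word at the end of a haystack of filler characters.
haystack = "-" * 10_000 + words[-1]

start = time.perf_counter()
m = pattern.search(haystack)
regex_seconds = time.perf_counter() - start

def brute_force(text, candidates):
    """Earliest (index, word) using one str.find() call per candidate."""
    hits = [(text.find(w), w) for w in candidates]
    hits = [(i, w) for i, w in hits if i != -1]
    return min(hits) if hits else None

start = time.perf_counter()
bf = brute_force(haystack, words)
brute_seconds = time.perf_counter() - start

# Both approaches should agree on where the first match is and what it is.
assert m is not None and (m.start(), m.group()) == bf

print(f"compile:     {compile_seconds:.4f}s")
print(f"regex search: {regex_seconds:.4f}s")
print(f"brute force:  {brute_seconds:.4f}s")
```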
    

    UPDATE: Note that the order of the alternatives ORed together in the regex matters. Have a look at the following test, inspired by TZOTZIOY:

    >>> search_str = "01catdog"
    >>> test1 = re.compile("cat|catdog")
    >>> match1 = test1.search(search_str)
    >>> match1.group()
    'cat'
    >>> match1.start()
    2
    >>> test2 = re.compile("catdog|cat")  # reverse order
    >>> match2 = test2.search(search_str)
    >>> match2.group()
    'catdog'
    >>> match2.start()
    2
    

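    This suggests the order matters :-/. When two alternatives can match at the same position, the one listed first wins, not the longest one. I am not sure what this means for Rax’s application, but at least the behavior is known.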
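
One way to make the result independent of the order in which the words were supplied is the same length sort used by wordlist_to_regex above. The sketch below (the helper name `compile_longest_first` is hypothetical) applies it to the cat/catdog case and also shows its limit: sorting only decides between words that start at the same position, so an earlier start still beats a longer word.

```python
import re

def compile_longest_first(words):
    """Alternation with longer words listed before shorter ones."""
    ordered = sorted(words, key=len, reverse=True)
    return re.compile("|".join(re.escape(w) for w in ordered))

# Same words as the test above, given in the "wrong" order.
pattern = compile_longest_first(["cat", "catdog"])
m = pattern.search("01catdog")
print(m.start(), m.group())  # 2 catdog

# Leftmost still beats longest: "cat" starts earlier than "atdog" here.
pattern = compile_longest_first(["cat", "atdog"])
m = pattern.search("01catdog")
print(m.start(), m.group())  # 2 cat
```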

    UPDATE: I posted this question about the implementation of regular expressions in Python, which will hopefully give us some insight into the issues found with this question.

© 2021 The Archive Base. All Rights Reserved