The Archive Base Latest Questions

Editorial Team
Asked: June 2, 2026


How do I test whether a phrase is in a large (650k) list of phrases when that list includes special categories?

For instance, I want to test if the phrase ["he", "had", "the", "nerve"] is in the list. It is, but under ["he", "had", "!DETERMINER", "nerve"] where "!DETERMINER" is the name of a wordclass that contains several choices (a, an, the). I have about 350 wordclasses and some of them are quite lengthy, so I don’t think it would be feasible to enumerate each item in the list that has one (or more) wordclasses.

I would like to use a set of these phrases instead of slowly working my way through a list, but I don’t know how to deal with the variability of the wordclasses. Speed is pretty important, since I need to make this comparison hundreds of thousands of times per go.

1 Answer

  1. Editorial Team
    Added an answer on June 2, 2026 at 5:18 am

    Similar to pjwerneck’s suggestion, you could use a tree (more specifically, a trie) that stores the phrases word by word, extended to treat category names specially.

    # phrase_trie.py
    
    from collections import defaultdict
    
    CATEGORIES = {"!DETERMINER": {"a", "an", "the"},
                  "!VERB": {"walked", "talked", "had"}}
    
    def get_category(word):
        for name,words in CATEGORIES.items():
            if word in words:
                return name
        return None
    
    class PhraseTrie(object):
        def __init__(self):
            self.children = defaultdict(PhraseTrie)
            self.categories = defaultdict(PhraseTrie)
            self.is_end = False # True if a stored phrase ends at this node
    
        def insert(self, phrase):
            if not phrase: # whole phrase consumed: mark the end of a phrase
                self.is_end = True
                return
    
            this=phrase[0]
            rest=phrase[1:]
    
            if this in CATEGORIES: # it's a category name
                self.categories[this].insert(rest)
            else:
                self.children[this].insert(rest)
    
        def contains(self, phrase):
            if not phrase:
                # match only complete phrases, not prefixes such as "he had"
                return self.is_end
    
            this=phrase[0]
            rest=phrase[1:]
    
            test = False
    
            # the `if not test` checks are there because once the phrase
            # satisfies one of the previous tests we don't need to search more
    
            # allow search for ["!DETERMINER", "cat"]
            if this in self.categories:
                test = self.categories[this].contains(rest)
    
            # the word is literally contained
            if not test and this in self.children:
                test = self.children[this].contains(rest)
    
            if not test:
                # check for the word being in a category class like "a" in
                # "!DETERMINER"
                cat = get_category(this)
                if cat in self.categories:
                    test = self.categories[cat].contains(rest)
            return test
    
        def __str__(self):
            return '(%s,%s)' % (dict(self.children), dict(self.categories))
        def __repr__(self):
            return str(self)
    
    if __name__ == '__main__':
        words = PhraseTrie()
        words.insert(["he", "had", "!DETERMINER", "nerve"])
        words.insert(["he", "had", "the", "evren"])
        words.insert(["she", "!VERB", "the", "nerve"])
        words.insert(["no","categories","here"])
    
        for phrase in ("he had the nerve",
                       "he had the evren",
                       "she had the nerve",
                       "no categories here",
                       "he didn't have the nerve",
                       "she had the nerve more"):
            print('%25s =>' % phrase, words.contains(phrase.split()))
    

    Running python phrase_trie.py:

             he had the nerve => True
             he had the evren => True
            she had the nerve => True
           no categories here => True
     he didn't have the nerve => False
       she had the nerve more => False
    

    Some points about the code:

    • The use of defaultdict is to avoid having to check if that sub-trie exists before calling insert; it is automatically created and initialised when needed.
    • If there are going to be a lot of calls to get_category, it might be worth constructing a reverse look-up dictionary for speed. (Or, even better, memoise the calls to get_category so that common words have fast look-ups but you don’t waste the memory storing words you never look up.)
    • The code assumes that each word is in only one category. (If not, the only changes are get_category returning a list and the relevant section of PhraseTrie looping through this list.)
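The last two points can be sketched together. This is only an illustration, not part of the original answer: `build_reverse_index`, `get_categories`, `get_categories_memoised` and the `"!AUXILIARY"` class are made-up names, and the category contents are toy data. It builds the reverse look-up once, returns a set of category names so a word may belong to several classes, and shows a memoised variant as the lower-memory alternative.

```python
from collections import defaultdict
from functools import lru_cache

# Same shape as CATEGORIES in the answer. "!AUXILIARY" is a hypothetical
# class, added only so that "had" belongs to two categories.
CATEGORIES = {
    "!DETERMINER": {"a", "an", "the"},
    "!VERB": {"walked", "talked", "had"},
    "!AUXILIARY": {"had", "has"},
}

def build_reverse_index(categories):
    """Map each word to the set of category names that contain it."""
    index = defaultdict(set)
    for name, words in categories.items():
        for word in words:
            index[word].add(name)
    return dict(index)  # plain dict: look-ups never create new entries

WORD_TO_CATEGORIES = build_reverse_index(CATEGORIES)

def get_categories(word):
    """Return every category containing `word` (empty if there is none)."""
    return WORD_TO_CATEGORIES.get(word, frozenset())

@lru_cache(maxsize=None)
def get_categories_memoised(word):
    """Alternative: scan CATEGORIES on first use, then serve from the cache."""
    return frozenset(name for name, words in CATEGORIES.items()
                     if word in words)
```

With either function, the last branch of `contains` would loop over the returned names and recurse into each one that is present in `self.categories`, stopping at the first match. The reverse index costs one entry per word in every class; the memoised version only stores words that are actually looked up.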