The Archive Base

Editorial Team
Asked: June 2, 2026 (2026-06-02T02:30:38+00:00)

I’m building an email filter and I need a way to efficiently match a single email to a large number of filters/rules. The email can be matched on any of the following fields:

  • From name
  • From address
  • Sender name
  • Sender address
  • Subject
  • Message body

Presently there are over 5000 filters (and growing), all defined in a single table in our PostgreSQL (9.1) database. Each filter may have one or more of the above fields populated with a Python regular expression.

Filtering currently works by selecting all filters and loading them into memory. For each email we then iterate over the filters until we find one where every non-blank field matches. Unfortunately this means that any one email can require as many as 30,000 (5000 × 6) re.match operations. Clearly this won’t scale as more filters are added (in fact, it already doesn’t).
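For reference, the loop described above can be sketched roughly as follows. This is only an illustration: the field names, the row layout, and the sample patterns are hypothetical stand-ins for whatever the real filter table contains.

```python
import re

# Hypothetical column names for the six matchable fields.
FIELDS = ("from_name", "from_address", "sender_name",
          "sender_address", "subject", "body")

def compile_filter(row):
    """Compile the non-blank patterns of one filter row once, up front."""
    return {f: re.compile(row[f]) for f in FIELDS if row.get(f)}

def first_matching_filter(email, filters):
    """Return the id of the first filter whose non-blank fields all match."""
    for filter_id, patterns in filters:
        # A filter with no patterns at all is skipped rather than matching everything.
        if patterns and all(rx.match(email.get(field, ""))
                            for field, rx in patterns.items()):
            return filter_id
    return None

# Made-up sample rows standing in for the contents of the filter table.
rows = [
    {"id": 1, "subject": r"Invoice \d+", "from_address": r".*@billing\.example\.com$"},
    {"id": 2, "subject": r"Weekly report"},
]
filters = [(r["id"], compile_filter(r)) for r in rows]

email = {"from_address": "noreply@billing.example.com", "subject": "Invoice 1042"}
print(first_matching_filter(email, filters))
```

In the worst case every pattern of every filter is tried against the email, which is where the 5000 × 6 figure comes from.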

Is there a better way to do this?

Options I’ve considered so far:

  1. Converting the saved Python regular expressions to POSIX-style ones to make use of PostgreSQL’s pattern matching (SIMILAR TO, or the ~ operator for POSIX regular expressions). Will this really be any quicker? It seems to me that it simply shifts the load somewhere else.

  2. Defining filters on a per-user basis. This isn’t really practical, though, because users of our system benefit from a wealth of predefined filters.

  3. Switching to a document-based search engine like Elasticsearch, where the first email to be filtered is saved as the canonical representation. By finding similar emails we can then narrow down to a specific feature set to test against and get a positive match.

  4. Switching to a Bayesian filter, which would also give us some machine learning capability: it could detect similar emails, or changes to existing emails, that still match with a high enough probability to be treated as the same thing. This sounds cool, but I’m not sure it would scale particularly well either.

Are there other options or approaches to consider?


1 Answer

  1. Editorial Team
    Added an answer on June 2, 2026 at 2:30 am (2026-06-02T02:30:40+00:00)

    The trigram support in PostgreSQL version 9.1 might give you what you want.

    http://www.postgresql.org/docs/9.1/interactive/pgtrgm.html

    It should become a viable solution once a trigram index can be used for fast matching against regular expressions. That was expected in 9.2 (scheduled for release in summer of 2012); the feature shipped in PostgreSQL 9.3. At our shop we have found the speed of trigram indexes to be very good.
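To give a feel for why trigrams can speed up regular-expression matching, here is a minimal pure-Python sketch of the underlying idea. It is not how pg_trgm is implemented: it applies the idea on the Python side, to one subject line checked against many rules. The `required_literal` argument is a hypothetical, hand-supplied substring that any match of the pattern must contain, whereas pg_trgm derives its trigrams from the pattern itself.

```python
import re

def trigrams(text):
    """Set of overlapping 3-character substrings (case-sensitive, no padding)."""
    return {text[i:i + 3] for i in range(len(text) - 2)}

class PrefilteredRule:
    """A regex paired with the trigrams of a literal every match must contain."""
    def __init__(self, rule_id, pattern, required_literal):
        self.rule_id = rule_id
        self.regex = re.compile(pattern)
        self.required = trigrams(required_literal)

def matching_rules(subject, rules):
    """Run the regex only for rules whose required trigrams all occur in subject."""
    present = trigrams(subject)  # computed once per email field
    hits = []
    for rule in rules:
        # Cheap set test first; the regex runs only if the text could match.
        if rule.required <= present and rule.regex.search(subject):
            hits.append(rule.rule_id)
    return hits

# Made-up rules for illustration.
rules = [
    PrefilteredRule(1, r"Invoice \d+", "Invoice "),
    PrefilteredRule(2, r"Weekly report", "Weekly report"),
    PrefilteredRule(3, r"Inv[a-z]+ overdue", " overdue"),
]
print(matching_rules("Invoice 1042 overdue", rules))
```

The set test can only rule candidates out, never in, so the regex still has the final say: rule 3 above passes the trigram check but fails the regex.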

    Also, if you ever want to do a “nearest neighbor” search, where you find the K best matches based on similarity to a search argument, a trigram index is wonderful: it actually returns rows from the index scan in order of “distance”. Search for KNN-GiST for write-ups.
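As a rough illustration of what “distance” means here, the sketch below computes a trigram similarity in plain Python and picks the K most similar strings. It follows the scheme described in the pg_trgm documentation (lowercase, split into alphanumeric words, pad each word with two leading spaces and one trailing space, then divide shared trigrams by total distinct trigrams), but it is a simplified stand-in that only handles ASCII letters and digits, and it scans every candidate instead of using an index. The sample subjects are made up.

```python
import heapq
import re

def word_trigrams(text):
    """Trigrams of each word, padded with two leading and one trailing space."""
    grams = set()
    # Simplification: only ASCII letters and digits count as word characters.
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams

def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams; 0.0 if either is empty."""
    ta, tb = word_trigrams(a), word_trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

def k_nearest(query, candidates, k):
    """The k candidates most similar to query, best first (linear scan)."""
    return heapq.nlargest(k, candidates, key=lambda c: similarity(query, c))

subjects = [
    "Your invoice 1042 is overdue",
    "Weekly sales report",
    "Invoice 1043 overdue notice",
    "Lunch on Friday?",
]
print(k_nearest("invoice overdue", subjects, 2))
```

In pg_trgm the distance is one minus this similarity, and the index returns rows in that order without scoring every row the way this sketch does.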


© 2021 The Archive Base. All Rights Reserved