The Archive Base

Editorial Team
Asked: May 17, 2026

This is a design question.

Background: We get a web request into our system from many different websites (for a widget that we give out), from which we grab the referrer string (if it exists). We use the referrer to decide on some things within the application. The problem arises in that I need to look at a list of “sites” (urls, partial urls, urls containing wildcards) in order to determine what to do. This list could be on the order of many thousands of sites. I need to be able to ask something like a “Site Service” (or whatever) if the referrer is a match with anything in the site list. I need to do this fast, say 5-10ms, give or take a few ms, and get a positive or negative result back.

Here is a basic example:

Request – Referrer = http://www.stackoverflow.com/users/120262?tab=accounts

Site List Could Contain urls like:

  • users.stackoverflow.com — (not a match)
  • www.stackoverflow.com/users — (match)
  • www.stackoverflow.com/users/120262 — (match)
  • www.stackoverflow.com/users/* — (match)
  • */users/* — (match)
  • www.stackoverflow.com/users/239289 — (not a match)
  • *.stackoverflow.com/questions/ask — (not a match)
  • */questions/* — (not a match)
  • www.stackoverflow.com — (match)
  • www.msdn.com — (not a match)
  • *.msdn.com — (not a match)
  • developer.*.com — (not a match)

You get the idea…
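
To pin down the matching rules implied by the list above, here is a minimal reference sketch in Python. The semantics are an assumption inferred from the examples, not something stated outright: the scheme is stripped, each pattern is anchored at the start of host + path, `*` matches any run of characters, and anything may follow the end of the pattern. The function names (`normalize`, `compile_pattern`, `matches`) are hypothetical. This is a slow baseline for checking a faster implementation against, not a proposal for the 5-10ms budget.

```python
import re
from urllib.parse import urlsplit


def normalize(referrer: str) -> str:
    """Drop the scheme and keep host + path (+ query), lower-cased."""
    parts = urlsplit(referrer)
    target = parts.netloc + parts.path
    if parts.query:
        target += "?" + parts.query
    return target.lower()


def compile_pattern(pattern: str) -> re.Pattern:
    """Turn a site pattern into a regex: '*' becomes '.*', the rest is literal."""
    literals = pattern.lower().split("*")
    return re.compile(".*".join(re.escape(part) for part in literals))


def matches(pattern: str, referrer: str) -> bool:
    # re.match anchors at the start only, so the pattern acts as a prefix
    # (assumed behaviour, inferred from the examples in the question).
    return compile_pattern(pattern).match(normalize(referrer)) is not None


REFERRER = "http://www.stackoverflow.com/users/120262?tab=accounts"

for site in ["www.stackoverflow.com/users", "*/users/*", "developer.*.com"]:
    print(site, matches(site, REFERRER))
```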

The issue I am dealing with is how to handle this in a performant and scalable way.

Performant in that I need to make a decision fast so that I can move on to the real processing that needs to happen.

Scalable in that the list of thousands of “sites” is set up for each affiliate that we have, and each affiliate may have many site lists, making for thousands of site lists containing thousands of sites.

I’m willing to consider pretty much anything here as I am just in the initial (re)design phase of this. Any and all thoughts are welcome including solution suggestions, general patterns to look into, existing tools even.

Thanks.

1 Answer

  1. Editorial Team
    Added an answer on May 17, 2026 at 8:47 pm

    This is a partial answer. It assumes that the patterns you are trying to match against are all either constant strings with no wildcards in them, or sequences of constant strings separated by wildcards (“*”), where each wildcard can match any string.

    This problem has been studied quite a bit in the context of implementing network-based and host-based intrusion detection systems, where you have a bunch of patterns you are looking for in network traffic, where each pattern might be a sign of an intruder sending attack traffic at you.

    In the special case where there are no wildcards at all in the patterns, and your set of patterns is changing infrequently, so you can afford to spend some time doing some precomputation of data structures when they change, a well-known way to do this is the Aho-Corasick algorithm:

    http://en.wikipedia.org/wiki/Aho%E2%80%93Corasick_string_matching_algorithm
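
For the no-wildcard case, a compact Aho-Corasick sketch in Python might look like the following. It is only an illustration under some assumptions: the class name `AhoCorasick` and its `search` method are made up for this example, patterns are assumed to be non-empty, and no effort is made to reduce memory use. The automaton is built once when the pattern set changes; each incoming URL is then scanned in a single pass, and every pattern that occurs anywhere in it is reported with its start offset.

```python
from collections import deque


class AhoCorasick:
    """Multi-pattern substring search over a fixed set of non-empty patterns."""

    def __init__(self, patterns):
        self.patterns = list(patterns)
        self.goto = [{}]   # per-node transitions: char -> node id
        self.fail = [0]    # per-node failure link
        self.out = [[]]    # per-node ids of patterns ending at this node
        for idx, pat in enumerate(self.patterns):
            self._insert(pat, idx)
        self._build_failure_links()

    def _insert(self, pat, idx):
        node = 0
        for ch in pat:
            nxt = self.goto[node].get(ch)
            if nxt is None:
                nxt = len(self.goto)
                self.goto.append({})
                self.fail.append(0)
                self.out.append([])
                self.goto[node][ch] = nxt
            node = nxt
        self.out[node].append(idx)

    def _build_failure_links(self):
        # Breadth-first, so a node's failure target is finished before the node.
        queue = deque(self.goto[0].values())  # depth-1 nodes fail to the root
        while queue:
            node = queue.popleft()
            for ch, child in self.goto[node].items():
                f = self.fail[node]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[child] = self.goto[f].get(ch, 0)
                # Inherit patterns that end at the failure target (proper suffixes).
                self.out[child] = self.out[child] + self.out[self.fail[child]]
                queue.append(child)

    def search(self, text):
        """Return (start_offset, pattern) for every occurrence in text."""
        node = 0
        hits = []
        for pos, ch in enumerate(text):
            while node and ch not in self.goto[node]:
                node = self.fail[node]
            node = self.goto[node].get(ch, 0)
            for idx in self.out[node]:
                pat = self.patterns[idx]
                hits.append((pos - len(pat) + 1, pat))
        return hits


sites = AhoCorasick(["www.stackoverflow.com", "/users/", "msdn.com"])
print(sites.search("www.stackoverflow.com/users/120262"))
```

Note that this reports matches anywhere in the URL; if a site entry must match from the start of the host, keep only hits whose start offset is 0.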

    If you then want to generalize to allow wildcards, the following idea might not have good worst-case performance, but would likely perform well in practice. Break up patterns that have wildcards in them into their “constant string” parts, e.g. break up the pattern “developer.*.com” into “developer.” and “.com”. Put those two strings separately into the list of strings you are searching for. Only if an incoming URL matches both “developer.” and “.com” would you then do some post-processing to make sure it contains them in the desired order (as opposed to the opposite order, as in “a.com.developer.foo”, which should not match the pattern “developer.*.com”).
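
The split-then-verify idea can be sketched roughly as below. The helper names (`split_literals`, `in_order`, `build_index`, `match_patterns`) are hypothetical, and to keep the example self-contained the "which constant strings occur in this URL" step uses a plain `in` test as a stand-in for the Aho-Corasick pass; in a real system that single pass is where the speed would come from. Like the description above, this treats patterns as unanchored, so a pattern can match anywhere in the URL.

```python
def split_literals(pattern):
    """Split a wildcard pattern into its non-empty constant-string parts."""
    return [part for part in pattern.split("*") if part]


def in_order(literals, text):
    """Check that the literals occur in text left to right without overlapping."""
    pos = 0
    for lit in literals:
        found = text.find(lit, pos)
        if found < 0:
            return False
        pos = found + len(lit)
    return True


def build_index(patterns):
    """Precompute the literal parts once, when the pattern set changes."""
    return {pattern: split_literals(pattern) for pattern in patterns}


def match_patterns(index, text):
    all_literals = {lit for lits in index.values() for lit in lits}
    # Stand-in for one Aho-Corasick scan over text (assumption for brevity).
    present = {lit for lit in all_literals if lit in text}
    matched = []
    for pattern, lits in index.items():
        # Cheap filter first; the order check runs only for candidates.
        if all(lit in present for lit in lits) and in_order(lits, text):
            matched.append(pattern)
    return matched


index = build_index(["developer.*.com", "*/users/*", "www.msdn.com"])
print(match_patterns(index, "a.com.developer.foo"))
print(match_patterns(index, "developer.example.com/users/1"))
```

The first call shows the case from the paragraph above: both “developer.” and “.com” are present in “a.com.developer.foo”, but in the wrong order, so the pattern is rejected by the order check.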

    For large sets of patterns, Aho-Corasick can require a lot of memory to store the state machine that it represents. Other, similar methods have since been designed to improve on it. For example, Google for the paper title “Advanced Algorithms for Fast and Scalable Deep Packet Inspection” by Kumar, Turner, and Williams.

    I am aware of other methods of solving this, too, which are patented by Cisco Systems. If there is any chance your company would license these methods, or already has some kind of bulk cross-licensing agreement with Cisco, I’d be happy to tell you more about those.
