The Archive Base

Editorial Team
Asked: May 11, 2026

I need to match input strings (URLs) against a large set (anywhere from 1k-250k)

I need to match input strings (URLs) against a large set (anywhere from 1k-250k) of string rules with simple wildcard support.

Requirements for wildcard support are as follows:

A wildcard (*) can only substitute for a “part” of a URL, that is, a fragment of the domain, the path, or the parameters. For example, “*.part.part/*/part?part=part&part=*”. The only exception to this rule is in the path, where “/*” should match anything after the slash.

Examples:

  • *.site.com/* — should match sub.site.com/home.html, sub2.site.com/path/home.html
  • sub.site.*/path/* — should match sub.site.com/path/home.html, sub.site.net/path/home.html, but not sub.site.com/home.html

Additional requirements:

  • Fast lookup (I realize “fast” is a relative term. Even with the maximum of 250k rules, a lookup should finish in under 1.5 s if possible.)
  • Work within the scope of a modern desktop (i.e. not a server implementation)
  • Ability to return 0 to n matches for a given input string
  • Matches will have rule data attached to them

What is the best system/algorithm for such a task? I will be developing the solution in C++, with the rules themselves stored in a SQLite database.

1 Answer

  1. Editorial Team
    Added an answer on May 11, 2026 at 10:41 pm

    If I’m not mistaken, you can take each rule string and break it up into domain, path, and query pieces, just as if it were a URL. You can then apply a standard wildcard matching algorithm to each piece, comparing it against the corresponding piece of the URL you want to test. If all three pieces match, the rule is a match.
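The splitting step might look something like the sketch below. It assumes the strings have no scheme prefix (as in the question's examples) and that the first '?' starts the query and the first '/' before it starts the path. The names UrlParts and splitUrl are made up for illustration; they are not from the question.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper type: the three pieces described above.
struct UrlParts {
    std::string domain;  // e.g. "*.site.com"
    std::string path;    // keeps its leading '/', e.g. "/*"
    std::string query;   // without the leading '?', may be empty
};

// Splits "domain/path?query" into its pieces. Works the same way for a
// rule and for a URL, since '*' is treated as an ordinary character here.
UrlParts splitUrl(const std::string& s) {
    UrlParts parts;

    // Everything after the first '?' is the query.
    const std::size_t q = s.find('?');
    const std::string beforeQuery = s.substr(0, q);  // npos means "whole string"
    if (q != std::string::npos) {
        parts.query = s.substr(q + 1);
    }

    // The first '/' separates the domain from the path.
    const std::size_t slash = beforeQuery.find('/');
    parts.domain = beforeQuery.substr(0, slash);
    if (slash != std::string::npos) {
        parts.path = beforeQuery.substr(slash);
    }
    return parts;
}
```

Calling splitUrl("*.site.com/*") gives the domain "*.site.com", the path "/*" and an empty query. Doing this once when a rule is loaded, rather than on every lookup, keeps the per-URL work down to splitting the URL itself.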

    Example

    Rule: *.site.com/*
        domain => *.site.com
        path   => /*
        query  => [empty]
    
    URL: sub.site.com/path/home.html
        domain => sub.site.com
        path   => /path/home.html
        query  => [empty]
    
    Matching process:
        domain => *.site.com matches sub.site.com?     YES
        path   => /*         matches /path/home.html?  YES
        query  => [empty]    matches [empty]           YES
    
    Result: MATCH
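
The matching process above could be sketched in C++ roughly as follows. This is only one reading of the question's rules, not a definitive implementation: it assumes a '*' may not cross the separator of its piece ('.' in the domain, '/' in the path, '&' in the query), and that only a trailing "/*" in the path may match everything that follows. UrlParts, wildcardMatch and ruleMatches are illustrative names.

```cpp
#include <cstddef>
#include <string>

// Hypothetical container for an already-split rule or URL.
struct UrlParts {
    std::string domain, path, query;
};

// Iterative wildcard match with backtracking to the most recent '*'.
// '*' matches any run of characters (possibly empty). If `sep` is non-zero,
// a '*' may not consume `sep`. If `openTail` is set and the pattern ends in
// "/*", that final '*' is unrestricted (assumption taken from the question).
bool wildcardMatch(const std::string& pattern, const std::string& text,
                   char sep = '\0', bool openTail = false) {
    const std::size_t npos = std::string::npos;
    const std::size_t n = pattern.size();
    const bool tailIsOpen =
        openTail && n >= 2 && pattern[n - 2] == '/' && pattern[n - 1] == '*';

    std::size_t p = 0, t = 0;
    std::size_t starP = npos;  // index of the most recent '*' in pattern
    std::size_t starT = 0;     // next text index that '*' would absorb

    while (t < text.size()) {
        if (p < n && pattern[p] == '*') {
            starP = p++;       // remember the star, try matching it to nothing
            starT = t;
        } else if (p < n && pattern[p] == text[t]) {
            ++p;               // literal characters agree
            ++t;
        } else if (starP != npos) {
            // Mismatch: let the last '*' absorb one more character, if allowed.
            const bool unrestricted =
                sep == '\0' || (tailIsOpen && starP == n - 1);
            if (!unrestricted && text[starT] == sep) return false;
            p = starP + 1;
            t = ++starT;
        } else {
            return false;      // mismatch and no '*' to fall back on
        }
    }
    while (p < n && pattern[p] == '*') ++p;  // leftover stars match nothing
    return p == n;
}

// A rule matches only if all three pieces match.
bool ruleMatches(const UrlParts& rule, const UrlParts& url) {
    return wildcardMatch(rule.domain, url.domain, '.')
        && wildcardMatch(rule.path, url.path, '/', true)
        && wildcardMatch(rule.query, url.query, '&');
}
```

With the rule pieces from the example (domain "*.site.com", path "/*", empty query), ruleMatches accepts sub.site.com with the path /path/home.html. Under the separator assumption it would reject a domain such as a.b.site.com, because the single '*' is not allowed to span two labels; pass '\0' as the separator if the looser behaviour is wanted.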
    

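    As you are storing the rules in a database, I would store them already broken into those three pieces. And if you want to push the matching into the database, you could convert each * to % and use the database’s native LIKE operator to do the matching for you. Be aware that % matches any run of characters, including . and /, so this is looser than the part-by-part wildcard described in the question. Then you’d just have a query like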

    SELECT *
    FROM   ruleTable
    WHERE  @urlDomain LIKE ruleDomain
       AND @urlPath   LIKE rulePath
       AND @urlQuery  LIKE ruleQuery
    

    where @urlDomain, @urlPath, and @urlQuery are variables in a prepared statement. The query would return the rules that match a URL, or an empty result set if nothing matches.
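
Before the rule pieces go into the table, the * characters need converting. A minimal sketch of that conversion is below. toLikePattern is a made-up name, and the backslash escape character is an assumption. The escaping matters because _ and % are LIKE wildcards themselves and both turn up in real URLs.

```cpp
#include <string>

// Converts a '*' wildcard rule piece into a SQL LIKE pattern.
// '*' becomes '%'. Literal '%', '_' and the escape character itself are
// prefixed with `esc` so that LIKE treats them as ordinary characters.
std::string toLikePattern(const std::string& rule, char esc = '\\') {
    std::string out;
    out.reserve(rule.size());
    for (char c : rule) {
        if (c == '*') {
            out += '%';
        } else {
            if (c == '%' || c == '_' || c == esc) {
                out += esc;  // escape LIKE metacharacters found in the rule
            }
            out += c;
        }
    }
    return out;
}
```

For the escapes to take effect, each comparison in the query needs an ESCAPE clause, for example @urlDomain LIKE ruleDomain ESCAPE '\'. Also note that SQLite's LIKE is case-insensitive for ASCII characters by default, which may or may not be what you want for paths and query strings.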
