The Archive Base


Editorial Team
Asked: June 16, 2026


A friend of mine was asked this problem in an interview, and I would like to discuss it here.

What would be an efficient implementation for this problem?

A simple idea that comes to mind is an ordinary memqueue: use Memcache machines to scale across many requests, with a consumer job that writes entries from Memcache to a DB.
For the second part, we could then just run an SQL query to find the list of matching subscribers.

PROBLEM:-

Events get published to this system. Each event can be thought of as containing a fixed number (N) of string columns called C1, C2, … CN. Each event can thus be passed around as an array of Strings (C1 being the 0th element in the array, C2 the 1st and so on).

There are M subscribers – S1, … SM

Each subscriber registers a predicate that specifies what subset of the events it’s interested in. Each predicate can contain:

Equality clause on columns, for example: (C1 == “US”)
Conjunctions of such clauses, example: 
    (C1 == “IN”) && (C2 == “home.php”) 
    (C1 == “IN”) && (C2 == “search.php”) && (C3 == “nytimes.com”)

(In the above examples, C1 stands for the country code of an event and C2 stands for the web page of the site and C3 the referrer code.)

That is, each predicate is a conjunction of some number of equality conditions. Note that a predicate does not necessarily have an equality clause for ALL columns (a predicate may not care about the value of some or all columns). In the examples above, the predicate (C1 == “IN”) && (C2 == “home.php”) does not care about the columns C3, … CN.
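
To make the predicate format concrete, here is a minimal sketch of turning a predicate string into a map from column index to required value. The class name PredicateParser and its methods are hypothetical, and the sketch assumes straight ASCII double quotes and values that contain no quote characters; the problem statement does not pin down the exact grammar.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the interface given in the question.
class PredicateParser {
    // Matches one equality clause such as: C2 == "home.php"
    private static final Pattern CLAUSE =
            Pattern.compile("C(\\d+)\\s*==\\s*\"([^\"]*)\"");

    /** Returns a map from zero-based column index to the required value. */
    static Map<Integer, String> parse(String predicate) {
        Map<Integer, String> required = new LinkedHashMap<>();
        Matcher m = CLAUSE.matcher(predicate);
        while (m.find()) {
            int column = Integer.parseInt(m.group(1)) - 1; // C1 is element 0
            required.put(column, m.group(2));
        }
        return required;
    }

    /** True if the event satisfies every equality clause; other columns are ignored. */
    static boolean matches(Map<Integer, String> required, String[] event) {
        for (Map.Entry<Integer, String> e : required.entrySet()) {
            if (!e.getValue().equals(event[e.getKey()])) {
                return false;
            }
        }
        return true;
    }
}
```

Columns that never appear in the predicate are simply absent from the map, which is how "don't care" is represented in this sketch.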

We have to design and code a Dispatcher that can match incoming events to registered subscribers. The incoming event rate is in millions per second. The number of subscribers is in thousands. So this dispatcher has to be very efficient. In plain words:

When the system boots, all the subscribers register their predicates to the dispatcher
After this events start coming to the dispatcher
For each event, the dispatcher has to emit the id of the matching subscribers.

In terms of an interface specification, the following can be roughly spelt out (in Java):

import java.util.List;

public abstract class Dispatcher {

    public Dispatcher(int N /* number of columns in each event, fixed up front */) { }

    public abstract void registerSubscriber(String subscriberId /* assume no conflicts */,
                                            String predicate /* predicate for this subscriberId */);

    public abstract List<String> findMatchingIds(String[] event /* assume each event has N Strings */);

}

That is, the dispatcher is constructed, then a series of registerSubscriber calls is made. After that, findMatchingIds() is invoked continuously, and the goal of this exercise is to make that method as efficient as possible.
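
As a reference point for what "efficient" has to beat, a brute-force version of this interface might look like the sketch below. NaiveDispatcher is a hypothetical name, and the predicate syntax (straight quotes, clauses of the form C2 == "value") is an assumption. It checks every subscriber against every event, so each call costs on the order of M × N string comparisons.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical baseline: correct but linear in the number of subscribers.
class NaiveDispatcher {
    private static final Pattern CLAUSE =
            Pattern.compile("C(\\d+)\\s*==\\s*\"([^\"]*)\"");

    private final int n;
    // subscriberId -> required value per column; null means "don't care"
    private final Map<String, String[]> predicates = new LinkedHashMap<>();

    NaiveDispatcher(int n) {
        this.n = n;
    }

    void registerSubscriber(String subscriberId, String predicate) {
        String[] required = new String[n];
        Matcher m = CLAUSE.matcher(predicate);
        while (m.find()) {
            required[Integer.parseInt(m.group(1)) - 1] = m.group(2);
        }
        predicates.put(subscriberId, required);
    }

    List<String> findMatchingIds(String[] event) {
        List<String> ids = new ArrayList<>();
        for (Map.Entry<String, String[]> e : predicates.entrySet()) {
            if (matches(e.getValue(), event)) {
                ids.add(e.getKey());
            }
        }
        return ids;
    }

    private static boolean matches(String[] required, String[] event) {
        for (int i = 0; i < required.length; i++) {
            if (required[i] != null && !required[i].equals(event[i])) {
                return false;
            }
        }
        return true;
    }
}
```

With thousands of subscribers and millions of events per second, this linear scan is the cost that any preprocessing of the predicates is meant to avoid.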


1 Answer

  1. Editorial Team
    Added an answer on June 16, 2026 at 7:37 pm

    As Hanno Binder implied, the problem is clearly set up to allow pre-processing the subscriptions to obtain an efficient lookup structure. Hanno says the lookup should be a map

    (N, K) -> set of subscribers who specified K in field N     
    (N, "") -> set of subscribers who omitted a predicate for field N
    

    When an event arrives, look up the applicable sets for each field (the set for the event’s value together with the set of subscribers who omitted that field), then intersect the per-field results. A failed lookup returns the empty set. I’m only recapping Hanno’s fine answer to point out that a hash table lookup is O(1) and perhaps faster in this application than a tree. On the other hand, intersecting trees can be faster, O(S + log N) where S is the intersection size. So it depends on the nature of the sets.
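
A minimal sketch of this lookup structure, assuming subscribers are numbered in registration order and each set is kept as a java.util.BitSet. The class name IndexedDispatcher is hypothetical, and predicates are passed in already parsed (zero-based column index to required value) to keep the sketch short. For each field, the set for the event's value is unioned with the "omitted" set, and the per-field results are intersected.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the (field, value) -> subscriber set map.
class IndexedDispatcher {
    private final int n;
    private final List<String> ids = new ArrayList<>();
    // byValue.get(c): value -> subscribers that require that value in column c
    private final List<Map<String, BitSet>> byValue = new ArrayList<>();
    // dontCare.get(c): subscribers with no clause on column c
    private final List<BitSet> dontCare = new ArrayList<>();

    IndexedDispatcher(int n) {
        this.n = n;
        for (int c = 0; c < n; c++) {
            byValue.add(new HashMap<>());
            dontCare.add(new BitSet());
        }
    }

    void registerSubscriber(String subscriberId, Map<Integer, String> required) {
        int s = ids.size();
        ids.add(subscriberId);
        for (int c = 0; c < n; c++) {
            String value = required.get(c);
            if (value == null) {
                dontCare.get(c).set(s);
            } else {
                byValue.get(c).computeIfAbsent(value, k -> new BitSet()).set(s);
            }
        }
    }

    List<String> findMatchingIds(String[] event) {
        BitSet result = new BitSet();
        result.set(0, ids.size()); // start with every subscriber
        for (int c = 0; c < n && !result.isEmpty(); c++) {
            BitSet ok = (BitSet) dontCare.get(c).clone();
            BitSet exact = byValue.get(c).get(event[c]);
            if (exact != null) {
                ok.or(exact); // union within one field
            }
            result.and(ok);   // intersection across fields
        }
        List<String> out = new ArrayList<>();
        for (int s = result.nextSetBit(0); s >= 0; s = result.nextSetBit(s + 1)) {
            out.add(ids.get(s));
        }
        return out;
    }
}
```

Bit sets are only one possible representation; with thousands of subscribers each AND touches a few hundred machine words, so sorted lists or hash sets may do better when the sets are sparse.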

    Alternative

    Here is my alternative lookup structure, again created only once during preprocessing. Begin by compiling a map

    (N, K) -> unique token T (small integer)
    

    There is also a distinguished token 0 that stands for “don’t care.”

    Now every predicate can be thought of as a regular expression-like pattern with N tokens, either representing a specific event string key or “don’t care.”
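
The token compilation step could be sketched as follows. TokenTable and encodePredicate are hypothetical names, and the predicate is assumed to be already parsed into a map from zero-based column index to required value. Tokens are handed out in the order new (column, value) pairs are first seen, scanning columns left to right.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the (column, value) -> small integer map.
class TokenTable {
    static final int DONT_CARE = 0; // the distinguished token

    // One value -> token map per column, so (column, value) is the lookup key.
    private final List<Map<String, Integer>> perColumn = new ArrayList<>();
    private int nextToken = 1;

    TokenTable(int n) {
        for (int c = 0; c < n; c++) {
            perColumn.add(new HashMap<>());
        }
    }

    /** Preprocessing: turn one predicate into a pattern of N tokens. */
    int[] encodePredicate(Map<Integer, String> required) {
        int[] pattern = new int[perColumn.size()]; // zero-filled, i.e. all DONT_CARE
        for (int c = 0; c < pattern.length; c++) {
            String value = required.get(c);
            if (value == null) {
                continue; // no clause on this column
            }
            Map<String, Integer> column = perColumn.get(c);
            Integer token = column.get(value);
            if (token == null) {
                token = nextToken++;
                column.put(value, token);
            }
            pattern[c] = token;
        }
        return pattern;
    }
}
```

Because tokens are small integers, the later matching step can compare ints or index arrays instead of hashing and comparing strings at every edge.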

    We can now build a decision tree in advance. You can also think of this tree as a Deterministic Finite Automaton (DFA) for recognizing the patterns. Edges are labeled with tokens, including “don’t care”. A don’t care edge is taken if no other edge matches. Accepting states contain the respective subscriber set.

    Processing an event starts with converting the keys to a token pattern. A key with no map entry cannot satisfy any equality clause, so it becomes the don’t care token 0 and can only follow don’t care edges. Then feed the pattern to the DFA. If the DFA consumes the pattern without crashing, the final state contains the subscriber set. Return this; if the DFA crashes, there are no subscribers.

    For the example, we would have the map:

    (1, "IN") -> 1
    (2, "home.php") -> 2
    (2, "search.php") -> 3
    (3, "nytimes.com") -> 4
    

    For N=4, the DFA would look like this:

    o --1--> o --2--> o --0--> o --0--> o
              \
                -3--> o --4--> o --0--> o
    

    Note that since there are no subscribers who don’t care about e.g. C1, the starting state doesn’t have a don’t care transition. Any event without “IN” in C1 will cause a crash, and the null set will be properly returned.
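
The tree in the diagram can be sketched as a trie over token patterns. PatternTrie is a hypothetical name, and the sketch takes already-tokenized patterns and events. Two choices here are assumptions rather than part of the description above: an event value with no token is expected to arrive as token 0, and at each node the walk follows both the exact edge and the don't care edge instead of relying on a fully merged DFA. That keeps the result correct when predicates overlap, at the cost of possibly visiting more than one path.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a trie of token patterns, walked like an NFA.
class PatternTrie {
    static final int DONT_CARE = 0;

    private static final class Node {
        final Map<Integer, Node> edges = new HashMap<>();
        final List<String> subscribers = new ArrayList<>(); // used at depth N only
    }

    private final Node root = new Node();

    /** Preprocessing: add one subscriber's token pattern (length N). */
    void add(String subscriberId, int[] pattern) {
        Node node = root;
        for (int token : pattern) {
            node = node.edges.computeIfAbsent(token, t -> new Node());
        }
        node.subscribers.add(subscriberId);
    }

    /** Event time: collect subscribers on every path the event tokens can follow. */
    List<String> match(int[] event) {
        List<String> out = new ArrayList<>();
        walk(root, event, 0, out);
        return out;
    }

    private void walk(Node node, int[] event, int depth, List<String> out) {
        if (depth == event.length) {
            out.addAll(node.subscribers);
            return;
        }
        Node exact = node.edges.get(event[depth]);
        if (exact != null) {
            walk(exact, event, depth + 1, out);
        }
        // An event token of 0 has already taken the don't care edge above.
        if (event[depth] != DONT_CARE) {
            Node any = node.edges.get(DONT_CARE);
            if (any != null) {
                walk(any, event, depth + 1, out);
            }
        }
    }
}
```

Adding the two example patterns 1 2 0 0 and 1 3 4 0 reproduces the shape of the diagram: both share the first edge and diverge at the second column.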

    With only thousands of subscribers, the size of this DFA ought to be reasonable.

    Processing time here is of course O(N) and could be very fast in practice. For real speed, the preprocessing could generate and compile a nest of C switch statements. In this fashion you might actually get millions of events per second with a small number of processors.
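
To illustrate the idea of generated switch statements, this is roughly what such output might look like for the two example predicates, written in Java rather than C to stay with the question's language. Everything here is hypothetical: the class name GeneratedMatcher, the subscriber ids "a" and "b", and the assumption that event strings are non-null. A real generator would emit this code from the registered predicates rather than have it written by hand.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical hand-written stand-in for generated code.
class GeneratedMatcher {
    static List<String> findMatchingIds(String[] event) {
        switch (event[0]) {                    // C1
            case "IN":
                switch (event[1]) {            // C2
                    case "home.php":
                        // Remaining columns are don't care for subscriber "a".
                        return Collections.singletonList("a");
                    case "search.php":
                        switch (event[2]) {    // C3
                            case "nytimes.com":
                                return Collections.singletonList("b");
                            default:
                                return Collections.emptyList();
                        }
                    default:
                        return Collections.emptyList();
                }
            default:
                return Collections.emptyList();
        }
    }
}
```

Switching on pre-assigned integer tokens instead of strings would avoid the string comparisons at each level, at the price of the extra tokenizing step per event.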

    You might even be able to coax a standard tool like the flex scanner generator to do most of the work for you.
