I need to create an match-finder system for some data set, as follows:
There is a set of objects, each identified by a string ObjectID.
Each object has exactly N properties Pi. Each property value is a string.
Database example for N = 3 (in real life N = 8).
ObjectID: P1 P2 P3 -------------------------------- APPLE: RED ROUND FRUIT ORANGE: ORANGE ROUND FRUIT CARROT: RED LONG VEGETABLE
The system must return sets of ObjectIDs, matching given query on object properties. In query user must specify all property values. Alternatively, for some or all properties in query user may specify a “wildcard” *, meaning that any property value would match criteria.
Example queries:
P1 P2 P3 => Result ------------------------------------ * ROUND FRUIT => APPLE, ORANGE RED LONG VEGETABLE => CARROT RED * * => CARROT, APPLE
All of this is trivially done with SQL.
The question is: is there a neat way to do that with Redis?
Note that I’m interested in Redis-based solutions specifically, for self-education purposes; other DBs are offtopic for this particular question.
Update: Trivial solution with explicit ObjectID lists for each Pi and application-side filtering does not look neat enough to me 🙂
What you are trying to do here is an inverted index.
For each column, have it map to a “set”. Then, you can intersect the sets to get the result.
So,
APPLE: RED ROUND FRUITwould map to the following inserts:Then, let’s say I want to query for
* ROUND FRUIT, I would do:This command is taking the intersection of the items in the
p2:ROUNDset and thep3:FRUITset. This will return all the items that areROUNDandFRUIT, not caring whatp1is.Some other examples:
My above answer is going to use some computation power because the intersection operation is
O(N*M). Here is a way of doing it that is more memory intensive, but will have faster retrieval because it effectively precomputes the indexes.For each combination of properties, make a key that stores a set:
So,
APPLE: RED ROUND FRUITwould map to the following inserts:Then, to query, you simply access the respective key. For example,
* ROUND FRUITwould simply beObviously, this doesn’t scale well at all in terms of memory when you have many dimensions, but it will be extremely snappy to retrieve results.