To set this up: my company has users, plus a set of tags that describe those users.
Each user can have up to 5000 tags attached.
We have an engine that lets clients pick tags to build a tag group. The engine supports AND/OR logic as well as include/exclude. Clients create a tag group, and our engine finds the total number of users who meet the logical requirements specified in that group. Basically this is just intersections, unions, and exclusions (set differences), so Redis sets have been perfect.
To handle this, I store the data like this:
Tag1: [user1, user2, user3]
Tag2: [user1, user5, user6]
etc.
From there, all of the Boolean logic is done in scripts.
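A minimal sketch of what a tag-group evaluation boils down to, using plain Python sets as an in-memory stand-in for the Redis sets above. The TagGroup shape (an include list, an AND/OR mode, an exclude list) and the evaluate function are hypothetical, and integer user ids are assumed; the real engine does this work in Redis scripts.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for the Redis layout: tag name -> set of user ids.
TagIndex = dict[str, set[int]]


@dataclass
class TagGroup:
    """Hypothetical tag-group shape: include tags combined with AND or OR,
    minus every user who carries any of the exclude tags."""
    include: list[str]
    mode: str = "AND"  # "AND" -> intersection, "OR" -> union
    exclude: list[str] = field(default_factory=list)


def evaluate(index: TagIndex, group: TagGroup) -> set[int]:
    """Return the user ids matching the group (intersect/union, then diff)."""
    include_sets = [index.get(tag, set()) for tag in group.include]
    if not include_sets:
        return set()
    if group.mode == "AND":
        # Start from the smallest set so the working result stays small;
        # Redis's SINTER also orders its inputs by cardinality.
        include_sets.sort(key=len)
        result = set(include_sets[0])
        for s in include_sets[1:]:
            result &= s
            if not result:
                break  # empty intersection cannot grow again
    elif group.mode == "OR":
        result = set().union(*include_sets)
    else:
        raise ValueError(f"unknown mode: {group.mode!r}")
    # Exclude tags are a set difference applied after the include step.
    for tag in group.exclude:
        result -= index.get(tag, set())
    return result
```

For example, with Tag1 = {1, 2, 3}, Tag2 = {1, 5, 6} and Tag3 = {2, 6}, an AND of Tag1 and Tag2 yields {1}, while an OR of Tag1 and Tag2 excluding Tag3 yields {1, 3, 5}; the user count the engine reports is just the size of that result.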
However, our customer base is expanding rapidly. Within a couple of years, we will need either several 64GB Redis servers or an alternative.
Here is my question: are there any lightning-fast, disk-based databases for doing intersections and unions? I have tried Postgres, but the performance is unacceptable. For example, a set comparison on a 500k-user set takes 1 second in Redis; in Postgres I was seeing around 30 seconds, and more when the tag group contains many tags.
DynamoDB and a few others have been recommended to me, but I wanted some educated opinions before I dig too deep.
Thanks,
Dan
Redis is the best way to get fast intersections and unions. You can do a few things to limit the memory Redis uses:
Use IntSets
Internally, Redis can store a set using a data structure called an IntSet. This is a sorted array of integers, so finding an integer in it is a binary search with O(log N) complexity. An IntSet comes in three flavours: 16-bit, 32-bit and 64-bit, and Redis uses the smallest encoding that fits every member. From a memory perspective, IntSets are very compact. If you are using sets and care about memory, you should make sure your sets are IntSet-encoded (OBJECT ENCODING <key> returns intset when they are).
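To make the "sorted array plus binary search" idea concrete, here is a simplified Python model of an IntSet. It is an illustration only, not Redis's C implementation: ToyIntSet is a hypothetical name, it ignores header overhead and removal, and it only tracks the element width (2, 4 or 8 bytes) that the real structure would use.

```python
import bisect

# Value ranges for the three encodings: signed 16-, 32- and 64-bit.
_ENCODINGS = [
    (2, -(2**15), 2**15 - 1),
    (4, -(2**31), 2**31 - 1),
    (8, -(2**63), 2**63 - 1),
]


def _width_for(value: int) -> int:
    """Smallest encoding width, in bytes, that can hold value."""
    for width, lo, hi in _ENCODINGS:
        if lo <= value <= hi:
            return width
    raise ValueError("an intset only holds signed 64-bit integers")


class ToyIntSet:
    """Simplified model of an IntSet: a sorted array of integers."""

    def __init__(self) -> None:
        self._items: list[int] = []  # always kept sorted
        self.width = 2  # bytes per element; starts at the 16-bit encoding

    def add(self, value: int) -> bool:
        # If the new value needs a wider encoding, the whole array is
        # upgraded; the width never shrinks afterwards.
        self.width = max(self.width, _width_for(value))
        i = bisect.bisect_left(self._items, value)
        if i < len(self._items) and self._items[i] == value:
            return False  # already present
        self._items.insert(i, value)  # O(N) shift to keep the array sorted
        return True

    def __contains__(self, value: int) -> bool:
        # Binary search over the sorted array: O(log N).
        i = bisect.bisect_left(self._items, value)
        return i < len(self._items) and self._items[i] == value

    def __len__(self) -> int:
        return len(self._items)

    def payload_bytes(self) -> int:
        """Bytes used by the element array alone (no header overhead)."""
        return self.width * len(self._items)
```

Adding ids 1, 2 and 3 keeps the 16-bit encoding (6 bytes of payload); adding 100000 forces the 32-bit encoding, so all four elements now take 4 bytes each. The O(N) insert is also why a very large limit on IntSet size can hurt write performance.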
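To take advantage of IntSets, you need to do two things:
1. Make sure the set members (your user ids) are integers. Strings such as user1 are not eligible for the IntSet encoding.
2. Set set-max-intset-entries to a reasonable number (the default is 512). It needs to be at least the number of users in your largest tag; a set that grows past the limit is converted to the regular hash-table encoding, which uses far more memory. Note that increasing it beyond a point can actually degrade performance, because every insert into an IntSet shifts the array (O(N)) and lookups are O(log N) instead of a hash table's O(1).
Move User objects to another store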
The sets only need user ids; they don't need the entire user object. So if memory becomes a constraint, you can also move User objects to another data store, perhaps another Redis server or even a relational database. This approach gives you the best of both worlds.
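A minimal sketch of that split, under stated assumptions: UserDirectory and is_intset_candidate are hypothetical names, the dictionaries stand in for "another Redis server or a relational database", and the candidate check only approximates Redis's rule that a member must be the canonical decimal form of a signed 64-bit integer. External keys such as user1 get a small integer id, and only that id goes into the tag sets.

```python
MIN_INT64 = -(2**63)
MAX_INT64 = 2**63 - 1


def is_intset_candidate(member: str) -> bool:
    """Approximation of the IntSet rule: the member must be the canonical
    decimal form of a signed 64-bit integer ("42" yes; "user42", "042" no)."""
    try:
        value = int(member)
    except ValueError:
        return False
    return str(value) == member and MIN_INT64 <= value <= MAX_INT64


class UserDirectory:
    """Hypothetical id registry plus user-object store. The dicts stand in
    for a separate Redis instance or a relational table."""

    def __init__(self) -> None:
        self._next_id = 1
        self._id_by_key: dict[str, int] = {}
        self._objects: dict[int, dict] = {}

    def register(self, external_key: str, profile: dict) -> int:
        """Assign (or reuse) a small integer id and store the full object."""
        uid = self._id_by_key.get(external_key)
        if uid is None:
            uid = self._next_id
            self._next_id += 1
            self._id_by_key[external_key] = uid
        self._objects[uid] = profile  # full object lives outside the sets
        return uid

    def lookup(self, uid: int) -> dict:
        """Resolve an id from a set result back to the full user object."""
        return self._objects[uid]
```

The tag sets then hold only the returned integers (Tag1 becomes {1, 2, 3} instead of [user1, user2, user3]), which keeps them eligible for the IntSet encoding, and full user objects are fetched by id only when a result actually needs them.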