I’m using redis to store hashes with ~100k records per hash. I want to implement filtering (faceting) the records within a given hash. Note a hash entry can belong to n filters.
After reading this and this it looks like I should:
- Implement a sorted SET per filter. The values within the SET correspond to the keys within a HASH.
- Retrieve the HASH keys from the given filter SET.
- Once I have the HASH keys from the SET fetch the corresponding entries from the HASH. This should give me all entries that belong to the filter.
Firstly is the above approach correct at a high level?
Assuming the approach is OK the bit I’m missing is what’s the most efficient implementation to retrieve the HASH entries? Am I right in thinking once I have the HASH keys I should then use a PIPELINE to queue multiple HGETALL commands passing through each HASH key? Is there a better approach?
My concern about using a PIPELINE is that I believe it will block all other clients while servicing the command. I’ll be paging the filtered results with 500 results per page. With multiple browser based clients performing filtering, not to mention the back end processes that populate the SETs and HASHes it sounds like there’s potential for a lot of contention if PIPELINE does block. Could anyone provide a view on this?
If it helps I’m using 2.2.4 redis, predis for the web clients and servicestack for the back end.
Thanks,
Paul
Redis is a lock-free non-blocking async server so there is no added contention when using pipelining. Redis hums along happily processing each operation as soon as it receives them so in practice can process multiple pipelined operations. In essence redis-server really doesn’t care if the operation is pipelined or not it just processes each operation as it receives them.
The benefit of pipelining is to reduce client latency where instead of waiting for a response from redis-server for each operation before sending the next one, the client can just pump all operations at once in a single write then read back all the responses in a single read.
An example of this in action is in my Redis mini StackOverflow clone each click makes a call to
ToQuestionResults()which because the operations are pipelined sends all operations on 1 Socket write call and reads the results in 1 Socket blocking read which is more efficient instead of a blocking read per call:https://github.com/ServiceStack/ServiceStack.Examples/blob/master/src/RedisStackOverflow/RedisStackOverflow.ServiceInterface/IRepository.cs#L180
This is not a valid concern and I wouldn’t over think how Redis works here, assume it’s doing it the most efficiently where Pipelining doesn’t block processing of other clients commands. Conceptually you can think that redis-server processes each command (pipelined or not) in FIFO order (i.e. no time is wasted in waiting/reading the entire pipeline).
You’re describing something closer to MULTI/EXEC (i.e. Redis Transactions) where all operations are done at once as soon as Redis server reads EXEC (i.e. EOF Transaction). This is not a problem either and redis-server still doesn’t waste any time waiting to receive your entire transaction, it just queues the partial commandset in a temporary queue until it receives the final EXEC which is then processed all at once.
This is how redis achieves atomicity by processing each command, one at a time, as soon as it receives them. Since there are no other threads, there is no thread context switching, no locks and no multi-threading issues. It basically achieves concurrency by processing each command really fast.
So in this case I would use Pipelining as it’s always a win, more so the more commands you pipeline (as you reduce the blocking read count).