I was reading this article, which mentions that storing 1 million keys in Redis uses 17 GB of memory. However, when they switched to hashes and chunked the keys into buckets of 1k each (e.g. HSET "mediabucket:1155" "1155315" "939"), they could store the same 1M keys in 5 GB, which is a pretty large saving.
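The bucketing scheme in that example can be sketched as a small helper. This is a sketch, assuming 1,000 ids per bucket as in the question: integer division by the bucket size picks the hash key, and the full id becomes the field. The names bucket_key and BUCKET_SIZE are hypothetical.

```python
from typing import Tuple

# Assumption: 1k ids per hash, matching the "chunking them at 1k each" example.
BUCKET_SIZE = 1000


def bucket_key(media_id: int, prefix: str = "mediabucket") -> Tuple[str, str]:
    """Map an id to (hash key, field): id // BUCKET_SIZE picks the bucket,
    and the full id is kept as the field inside that hash."""
    bucket = media_id // BUCKET_SIZE
    return f"{prefix}:{bucket}", str(media_id)


if __name__ == "__main__":
    print(bucket_key(1155315))
```

With a client such as redis-py, the returned pair would then be passed along with the value, e.g. r.hset(key, field, "939").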
I’ve read the Redis memory-optimization page, but I don’t quite understand the difference. It says HGETs are not quite O(1) but close enough, and it mentions more CPU usage when using HSETs. I don’t understand why there would be more CPU usage (sure, it’s trading time for space, but how, and where is the time spent?). It mentions ‘encoding’ but not how they encode it.
It also mentions ‘only string’, but I have no idea what ‘only string’ means. Does it mean the hash field? I don’t see anything about it in the HSET documentation. What exactly would be encoded, and why would the encoding be more efficient than using a SET?
How is it possible that
HSET "mediabucket:1155" "1155315" "939"
is more efficient than
SET "mediabucket:1155315" "939"?
The SET version even contains less data: the HSET uses both 1155 and 1155315, whereas the SET uses only 1155315. I would personally try using binary keys, but I don’t think that has anything to do with why HSETs are more efficient.
EDIT:
Cross-posted on the redis-db mailing list as well: https://groups.google.com/d/topic/redis-db/90K3UqciAx0/discussion
Small hash objects are encoded as ziplists, depending on the values of the hash-max-ziplist-entries and hash-max-ziplist-value parameters. This is essentially plain data serialization.
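The role of the two parameters can be modelled in a few lines. This is a sketch, assuming the rule as the Redis memory-optimization docs describe it: a hash keeps the compact encoding while it has at most hash-max-ziplist-entries pairs and no field or value is longer than hash-max-ziplist-value bytes. The function name is hypothetical, and the default thresholds (512 and 64) are assumed placeholders; check the values in your own redis.conf, and use OBJECT ENCODING on a real key to see which encoding Redis actually chose.

```python
from typing import Mapping


def stays_compact(
    pairs: Mapping[str, str],
    max_entries: int = 512,  # assumed stand-in for hash-max-ziplist-entries
    max_value: int = 64,     # assumed stand-in for hash-max-ziplist-value (bytes)
) -> bool:
    """Model of the rule: compact only while the hash is small in both
    number of pairs and size of every field and value."""
    if len(pairs) > max_entries:
        return False
    return all(
        len(f.encode("utf-8")) <= max_value and len(v.encode("utf-8")) <= max_value
        for f, v in pairs.items()
    )


if __name__ == "__main__":
    print(stays_compact({"1155315": "939"}))
```

One consequence of this rule is that the bucket size in the question's scheme has to stay within the configured entry limit, or the hashes fall back to a regular hash table and the savings disappear.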
A ziplist is defined as follows (extracted from Redis source code):
Each item from the hash object is represented as a key/value couple in the ziplist (2 successive entries). Both fields and values can be stored as simple strings or as integers. This format is more compact in memory because it saves a lot of pointers (8 bytes each on a 64-bit system) that are required to implement a dynamic data structure (like a real hashtable).
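A back-of-the-envelope sketch of why saving pointers matters. It assumes a 64-bit build and a hypothetical breakdown of four pointers per pair in a chained hash table (field, value, next-in-chain, and roughly one bucket slot) versus an assumed 2-byte header per entry in a flat layout. These parameters are illustrative, not measured Redis figures, and the sketch ignores everything except pointers and headers.

```python
POINTER_SIZE = 8  # bytes per pointer on a 64-bit build


def pointer_bytes(num_pairs: int, pointers_per_pair: int = 4) -> int:
    """Bytes spent on pointers alone in a chained hash table.

    Hypothetical breakdown: one pointer each for the field, the value and
    the next entry in the chain, plus roughly one bucket slot per pair.
    """
    return num_pairs * pointers_per_pair * POINTER_SIZE


def flat_header_bytes(num_pairs: int, header_bytes_per_entry: int = 2) -> int:
    """Bytes spent on per-entry headers when pairs are stored back to back.

    Assumes a hypothetical fixed header size and two entries per pair.
    """
    return num_pairs * 2 * header_bytes_per_entry


if __name__ == "__main__":
    for n in (1_000, 1_000_000):
        print(n, pointer_bytes(n), flat_header_bytes(n))
```

Under these assumptions, a million pairs spend about 32 MB on pointers alone in the hash table model, against about 4 MB of headers in the flat model.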
The downside is that HSET/HGET operations are actually O(N) when applied to a ziplist. That’s why the ziplist must be kept small. When the ziplist data fits in the L1 CPU cache, the corresponding algorithms are fast enough despite their linear complexity.
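To make the trade-off concrete, here is a toy model, assuming a made-up byte layout rather than the real ziplist format (which, among other things, also stores the previous entry's length so it can be walked in both directions). Fields and values sit as successive entries in one flat buffer, canonical integers are stored in binary in the smallest width that fits, and hget has no index to consult, so it scans linearly. All names (encode_entry, encode_hash, iter_entries, hget) are hypothetical.

```python
import struct
from typing import Iterator, Mapping, Optional

# Toy layout (NOT Redis's real byte format): a 1-byte tag, then either a 1-byte
# length + UTF-8 bytes (tag 0x00) or a little-endian integer whose width the tag
# gives (0x02 = 2 bytes, 0x04 = 4 bytes, 0x08 = 8 bytes).
_INT_FORMATS = {0x02: "<h", 0x04: "<i", 0x08: "<q"}


def _as_int(s: str) -> Optional[int]:
    """Return s as an int only if it is a canonical decimal fitting in 64 bits."""
    try:
        n = int(s)
    except ValueError:
        return None
    if str(n) != s or not -(2**63) <= n < 2**63:
        return None  # e.g. "0939" must stay a string to round-trip unchanged
    return n


def encode_entry(s: str) -> bytes:
    n = _as_int(s)
    if n is not None:
        for tag, fmt in _INT_FORMATS.items():  # smallest width first
            try:
                return bytes([tag]) + struct.pack(fmt, n)
            except struct.error:
                continue  # does not fit in this width; try the next one
    data = s.encode("utf-8")
    if len(data) > 255:
        raise ValueError("entry too long for this toy 1-byte length prefix")
    return bytes([0x00, len(data)]) + data


def encode_hash(pairs: Mapping[str, str]) -> bytes:
    """Lay out each field/value pair as two successive entries, no pointers."""
    return b"".join(encode_entry(f) + encode_entry(v) for f, v in pairs.items())


def iter_entries(buf: bytes) -> Iterator[str]:
    """Walk the buffer from the front, decoding one entry at a time."""
    pos = 0
    while pos < len(buf):
        tag = buf[pos]
        if tag == 0x00:
            length = buf[pos + 1]
            yield buf[pos + 2 : pos + 2 + length].decode("utf-8")
            pos += 2 + length
        else:
            fmt = _INT_FORMATS[tag]
            (n,) = struct.unpack_from(fmt, buf, pos + 1)
            yield str(n)
            pos += 1 + struct.calcsize(fmt)


def hget(buf: bytes, field: str) -> Optional[str]:
    """O(N) lookup: there is no index, so scan field/value pairs in order."""
    entries = iter_entries(buf)
    for f in entries:
        value = next(entries)
        if f == field:
            return value
    return None


if __name__ == "__main__":
    buf = encode_hash({"1155315": "939", "1155316": "hello"})
    print(len(buf), hget(buf, "1155315"))
```

In this model, "939" costs 3 bytes (tag plus a 2-byte integer) and the whole two-pair hash fits in 20 bytes, but every lookup starts at the front of the buffer, which is exactly the CPU cost the question asks about.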
You may want to refer to the following links for more information:
Redis 10x more memory usage than data
Redis Data Structure Space Requirements
These answers refer to other data structures (like sets, lists, or sorted sets), but it is exactly the same concept.