I have many distinct keys generated in the following format:
“71 1 2”, “69 2 3”, “68 5 6”, etc.
However, I find that most of these keys go to the same reducers.
Even if I implement a custom partitioner, the getPartition method, in which we compute hash_val % numReducers, mostly returns values that map to only a few reducers, overloading them while the other reducers sit idle.
As I understand it, we can use a WritableComparator to control how the keys are sorted, but not which reducer each key goes to.
Is there a way to improve load balancing? Please help.
I am attaching some code below to make my explanation clear:
String a = "71 1 2";
String b = "72 1 1";
String c = "70 1 3";
int hash_a = a.hashCode();
int hash_b = b.hashCode();
int hash_c = c.hashCode();
int part_a = hash_a % 10;
int part_b = hash_b % 10;
int part_c = hash_c % 10;
System.out.println("hash a: "+hash_a+" part_a: "+part_a);
System.out.println("hash b: "+hash_b+" part_b: "+part_b);
System.out.println("hash c: "+hash_c+" part_c: "+part_c);
Output:
hash a: 1620857277 part_a: 7
hash b: 1621780797 part_b: 7
hash c: 1619933757 part_c: 7
As we can see, different keys tend to map to the same reducer.
Please help! Thanks!
First of all, you cannot simply use Java's % operator on the hash, because hashCode() can be negative and there is no such thing as a negative partition. So you will probably want to take an absolute value.
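To make the point about negative values concrete, here is a minimal sketch. The class and the partitionFor helper are hypothetical names used only for illustration. It clears the sign bit instead of calling Math.abs, because Math.abs(Integer.MIN_VALUE) is still negative; as far as I know this is also the approach Hadoop's default HashPartitioner takes.

```java
public class NonNegativePartition {
    // Hypothetical helper: maps any int hash to a partition in [0, numReducers).
    static int partitionFor(int hash, int numReducers) {
        // Clearing the sign bit sidesteps the Math.abs(Integer.MIN_VALUE) edge case,
        // where the "absolute value" is still negative.
        return (hash & Integer.MAX_VALUE) % numReducers;
    }

    public static void main(String[] args) {
        int negativeHash = -17;
        // The plain remainder keeps the sign of the dividend in Java.
        System.out.println("plain %: " + (negativeHash % 10));
        System.out.println("masked:  " + partitionFor(negativeHash, 10));
        System.out.println("MIN_VALUE: " + partitionFor(Integer.MIN_VALUE, 10));
    }
}
```

In a real job the same expression would go inside your Partitioner's getPartition method, with numPartitions in place of the hard-coded 10.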
Second, here is a strong hash function that I found on the internet. Instead of the usual 32-bit int, it generates a 64-bit long. It suffers from the same negative-partition problem, but you can correct for that yourself.
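As a rough illustration of the 64-bit idea, here is a sketch that uses FNV-1a, a well-known 64-bit hash. It is a stand-in chosen for this example and not necessarily the function referred to above. The class name and the partitionFor helper are hypothetical, and whether it spreads your real keys more evenly is something you would have to measure on your own data.

```java
import java.nio.charset.StandardCharsets;

public class Fnv64Partition {
    // Standard FNV-1a 64-bit parameters.
    private static final long FNV_OFFSET_BASIS = 0xcbf29ce484222325L;
    private static final long FNV_PRIME = 0x100000001b3L;

    // FNV-1a: XOR each byte into the hash, then multiply by the FNV prime.
    static long fnv1a64(String key) {
        long hash = FNV_OFFSET_BASIS;
        for (byte b : key.getBytes(StandardCharsets.UTF_8)) {
            hash ^= (b & 0xff);
            hash *= FNV_PRIME;
        }
        return hash;
    }

    // Hypothetical helper: floorMod keeps the result in [0, numReducers)
    // even when the 64-bit hash is negative.
    static int partitionFor(String key, int numReducers) {
        return (int) Math.floorMod(fnv1a64(key), (long) numReducers);
    }

    public static void main(String[] args) {
        String[] keys = {"71 1 2", "72 1 1", "70 1 3"};
        for (String key : keys) {
            System.out.println(key + " -> partition " + partitionFor(key, 10));
        }
    }
}
```

Math.floorMod takes care of the negative-partition problem here, so no separate absolute-value step is needed. In a Hadoop job, the body of partitionFor would move into getPartition, hashing the key's string form.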