I am writing a Hadoop MapReduce program in which I want to use part of the input file as the key, so that I can do some analytics on it. However, I found that this has hurt performance. Can anyone tell me whether there is an alternative to using a large chunk of text as the key? Can it be encoded in some other format? I have also tried converting the strings to bytes or a binary format, but I am still not able to store them in an integer data type. I tried converting them to BigInteger, without success: texts that are not similar end up colliding (being grouped under the same key) in the reducer. How can I represent a large chunk of text as a key in the mapper other than by using the Text data type?
How long can the part of your file be? How similar are the keys to each other? Have you considered using the MD5 hash (or similar) of the text as the key in your mapper?
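A minimal sketch of the MD5 suggestion in plain Java, assuming the mapper can compute a digest of the text span it currently emits as `Text`: hash the span's UTF-8 bytes to a fixed 16-byte MD5 digest and use that (or its 32-character hex form) as the key, so the key size no longer grows with the text. The class and method names (`Md5KeySketch`, `md5`, `md5Hex`) are hypothetical and only use the JDK's `java.security.MessageDigest`. In an actual job the raw bytes could be wrapped in Hadoop's `BytesWritable`, or Hadoop's own `MD5Hash` writable could be used instead; check the details against your Hadoop version.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: turns an arbitrarily long text span into a fixed-size MD5 key.
public class Md5KeySketch {

    // Returns the 16-byte MD5 digest of the text's UTF-8 bytes.
    public static byte[] md5(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            return md.digest(text.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // Every standard JDK ships MD5, so this should not happen in practice.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Hex-encodes the digest into a 32-character string key.
    public static String md5Hex(String text) {
        StringBuilder sb = new StringBuilder(32);
        for (byte b : md5(text)) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String span = "a large chunk of text taken from the input file";
        // The key is always 16 bytes (32 hex chars), however long the span is.
        System.out.println(md5(span).length);   // prints 16
        System.out.println(md5Hex("abc"));       // prints 900150983cd24fb0d6963f7d28e17f72
    }
}
```

A 128-bit digest makes accidental collisions between distinct, non-adversarial texts extremely unlikely, unlike squeezing the text into an `int` or `long`, but not impossible. If exact grouping matters, the mapper could also emit the original text in the value so the reducer can compare it.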