I came across an algorithm in which the same file is loaded into main memory for each mapper.
I assume we must use the distributed cache to get the file, then read it and load it into memory in each mapper. When I implemented this, I found that the map phase took a long time to complete. I suspect this is because the file is read from local disk again for every value passed to the mapper.
Am I implementing this correctly?
Are there any other suggestions?
Please help! Thanks in advance!
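You want to read the file from local disk in the Mapper's setup() method, which runs once per map task rather than once per record. Use an instance variable to hold on to the loaded data so that every map() call reuses it.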
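To make the pattern concrete, here is a minimal sketch in plain Java. It does not use any Hadoop classes: CachedLookupMapper, its setup(Path) and map(String) signatures, and the tab-separated file format are all assumptions made for illustration. In a real job the class would extend org.apache.hadoop.mapreduce.Mapper and override setup(Context) and map(...), and the path would come from the distributed cache.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Plain-Java stand-in for a Hadoop Mapper (hypothetical class, no Hadoop dependency).
public class CachedLookupMapper {
    // Instance variable: filled once in setup(), reused by every map() call.
    private final Map<String, String> lookup = new HashMap<>();
    // Counts how many times the file was actually read from disk.
    private int loadCount = 0;

    // Mirrors Mapper.setup(): called once per task, before any map() call.
    // Assumes each line of the file has the form "key<TAB>value".
    public void setup(Path cachedFile) throws IOException {
        loadCount++;
        try (BufferedReader reader =
                 Files.newBufferedReader(cachedFile, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] parts = line.split("\t", 2);
                if (parts.length == 2) {
                    lookup.put(parts[0], parts[1]);
                }
            }
        }
    }

    // Mirrors Mapper.map(): called once per input record; touches memory only.
    public String map(String key) {
        return lookup.getOrDefault(key, "UNKNOWN");
    }

    public int getLoadCount() {
        return loadCount;
    }

    public static void main(String[] args) throws IOException {
        // Create a small temporary file that plays the role of the cached file.
        Path file = Files.createTempFile("lookup", ".tsv");
        Files.write(file, Arrays.asList("a\tapple", "b\tbanana"),
                    StandardCharsets.UTF_8);

        CachedLookupMapper mapper = new CachedLookupMapper();
        mapper.setup(file); // one disk read for the whole task
        for (String key : new String[] {"a", "b", "c"}) {
            System.out.println(key + " -> " + mapper.map(key));
        }
        Files.delete(file);
    }
}
```

The slow version usually has the file-reading code inside map(), so the disk is hit once per record. Moving it into setup() means the cost is paid once per task, and map() only does an in-memory lookup.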