I am computing an average using Hadoop MapReduce.
The input has the following structure:
guid banid countview
g1 b1 1
g1 b2 1
g1 b1 2
g1 b1 1
g2 b1 1
g2 b2 1
g2 b1 1
g2 b3 1
g3 b1 1
How do I compute the average countview per banid for each guid?
(As I see it, the average for guid g1 is 5/2: 5 is the sum of its countview values, and 2 is the number of distinct banids, b1 and b2.)
If I understand what you’re asking, an approach like the following should work.
First, break the problem down into Map and Reduce stages. The objective is to group all the counts and banids for each “guid” in the reducer.
Mapper:
Output Key/Value types: Text / Text
The output key is a Text Writable containing the guid. The value is a Text containing the banid and the count (e.g. b1:1). Because the framework groups map output by key, the reducer will receive all the banids and counts for each guid.
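As a rough illustration of the map step, here is a minimal plain-Java sketch of the per-line logic. It deliberately leaves out the Hadoop classes so it can run on its own; in a real job this logic would sit inside the map method and emit Text objects. The names GuidMapSketch and mapLine are made up for this example, and the whitespace-separated layout and the skipped header row are assumptions based on the sample data in the question.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

// Hypothetical helper: plain-Java stand-in for the body of a Hadoop map method.
class GuidMapSketch {
    // Turns one input line "guid banid countview" into the pair (guid, "banid:countview").
    // Returns null for the header row and for malformed lines (assumed behaviour).
    static Map.Entry<String, String> mapLine(String line) {
        String[] fields = line.trim().split("\\s+");
        if (fields.length != 3 || fields[0].equals("guid")) {
            return null;
        }
        // Key = guid, value = banid and count joined with ':' as described above.
        return new SimpleEntry<>(fields[0], fields[1] + ":" + fields[2]);
    }

    public static void main(String[] args) {
        Map.Entry<String, String> pair = mapLine("g1 b1 2");
        System.out.println(pair.getKey() + " -> " + pair.getValue());
    }
}
```

In the actual mapper you would write the key and value to the context as Text instead of returning them.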
Reducer:
Output Key/Value types: Text / FloatWritable
For each guid key, the reducer receives the list of Text values emitted for it. Iterate through the values, splitting each one into its banid and count. Add the banids to a set and sum the counts as you iterate. Once the iteration is done, the average is the sum of the counts divided by the size of the set (for g1, 5 / 2 = 2.5). Write out the average as a FloatWritable (or Text; it’s up to you). The output key is the same as the input key to the reduce.
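The reduce step for a single guid could look roughly like the sketch below. Again this is plain Java rather than a real Reducer subclass, so it runs without Hadoop on the classpath; GuidReduceSketch and average are hypothetical names, and the "banid:count" encoding is the one assumed for the mapper output.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: plain-Java stand-in for the body of a Hadoop reduce method.
class GuidReduceSketch {
    // values holds every "banid:count" string grouped under one guid.
    static float average(Iterable<String> values) {
        Set<String> banids = new HashSet<>();   // distinct banids seen for this guid
        long total = 0;                         // running sum of countview
        for (String value : values) {
            int sep = value.lastIndexOf(':');
            banids.add(value.substring(0, sep));
            total += Long.parseLong(value.substring(sep + 1));
        }
        // Sum of counts divided by the number of distinct banids.
        return banids.isEmpty() ? 0f : (float) total / banids.size();
    }

    public static void main(String[] args) {
        // The four rows for g1 from the question: sum 5, distinct banids {b1, b2}.
        List<String> g1 = Arrays.asList("b1:1", "b2:1", "b1:2", "b1:1");
        System.out.println("g1 -> " + average(g1));
    }
}
```

In the actual reducer the result would be wrapped in a FloatWritable and written out with the guid as the key.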
This is a simple way to handle a value that needs to carry multiple pieces of information. A more advanced approach would be to create your own Writable class that wraps a Text and a VIntWritable.
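To give an idea of the shape of such a custom value type, here is a hedged sketch. Hadoop's Writable interface consists of write(DataOutput) and readFields(DataInput), so the class below provides those two methods; to stay runnable without Hadoop it does not actually declare the interface, and it uses a plain String and int as stand-ins for the wrapped Text and VIntWritable. The class name BanidCount and the roundTrip helper are invented for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical value type. In a real job it would declare
// "implements org.apache.hadoop.io.Writable" and wrap Text / VIntWritable.
class BanidCount {
    String banid = "";
    int count;

    BanidCount() {}                      // Hadoop needs a no-arg constructor to deserialize

    BanidCount(String banid, int count) {
        this.banid = banid;
        this.count = count;
    }

    // Serialize the fields in a fixed order.
    public void write(DataOutput out) throws IOException {
        out.writeUTF(banid);
        out.writeInt(count);
    }

    // Read the fields back in the same order they were written.
    public void readFields(DataInput in) throws IOException {
        banid = in.readUTF();
        count = in.readInt();
    }

    // Demo helper: serialize to bytes and back, the way the framework would between map and reduce.
    static BanidCount roundTrip(BanidCount original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            original.write(new DataOutputStream(bytes));
            BanidCount copy = new BanidCount();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        BanidCount copy = roundTrip(new BanidCount("b1", 2));
        System.out.println(copy.banid + ":" + copy.count);
    }
}
```

With a type like this the reducer no longer has to parse strings; it reads the banid and count directly from each value.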