I’ve written some code that does something similar to a SQL GROUP BY.
Here is a sample record from the dataset I used:
250788681419,20090906,200937,200909,619,SUNDAY,WEEKEND,ON-NET,MORNING,OUTGOING,VOICE,25078,PAY_AS_YOU_GO_PER_SECOND_PSB,SUCCESSFUL-RELEASEDBYSERVICE,17,0,1,21.25,635-10-112-30455
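import java.io.IOException;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MyMap extends Mapper<LongWritable, Text, Text, DoubleWritable> {
    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] attribute = value.toString().split(",");
        double rs = Double.parseDouble(attribute[17]);
        // Group key built from three fields (SUNDAY + MORNING + VOICE for the sample record)
        String comb = attribute[5] + attribute[8] + attribute[10];
        context.write(new Text(comb), new DoubleWritable(rs));
    }
}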
import java.io.IOException;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MyReduce extends Reducer<Text, DoubleWritable, Text, DoubleWritable> {
    @Override
    protected void reduce(Text key, Iterable<DoubleWritable> values, Context context)
            throws IOException, InterruptedException {
        // Hadoop passes the grouped values as an Iterable, not an Iterator
        double sum = 0;
        for (DoubleWritable val : values) {
            sum += val.get();
        }
        context.write(key, new DoubleWritable(sum));
    }
}
The Mapper sends attribute[17] to the reducer as its value, and the reducer sums it. Now I also want to sum attribute[14]. How do I send it to the reducer as well?
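If your data types are the same, then creating an ArrayWritable subclass should work for this. The subclass needs a no-argument constructor that passes the element class (here DoubleWritable) to ArrayWritable, because Hadoop has to instantiate it when deserializing the reducer input.
Your mapper then declares that subclass, call it DblArrayWritable, as its output value type and writes both values in a single array.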
In your reducer you should now be able to iterate over the values of the DblArrayWritable.
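To make the ArrayWritable approach concrete, here is a minimal sketch that puts the class, the mapper, and the reducer in one file as nested classes. It assumes both fields are parsed as doubles, so attribute[14] (17 in the sample record) becomes 17.0. The wrapper class GroupBySums, the varargs convenience constructor, and the choice to emit the two sums as comma-separated Text are assumptions made for illustration, not part of the original code.

```java
import java.io.IOException;
import org.apache.hadoop.io.ArrayWritable;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Hypothetical wrapper class so the sketch fits in a single file.
public class GroupBySums {

    /** ArrayWritable fixed to DoubleWritable elements. */
    public static class DblArrayWritable extends ArrayWritable {
        // No-arg constructor: required so Hadoop can deserialize reducer input.
        public DblArrayWritable() {
            super(DoubleWritable.class);
        }

        // Convenience constructor (an assumption, not required by Hadoop).
        public DblArrayWritable(double... numbers) {
            super(DoubleWritable.class);
            DoubleWritable[] wrapped = new DoubleWritable[numbers.length];
            for (int i = 0; i < numbers.length; i++) {
                wrapped[i] = new DoubleWritable(numbers[i]);
            }
            set(wrapped);
        }
    }

    public static class MyMap extends Mapper<LongWritable, Text, Text, DblArrayWritable> {
        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] attribute = value.toString().split(",");
            String comb = attribute[5] + attribute[8] + attribute[10];
            // Assumption: both fields are read as doubles so they share one element type.
            double v17 = Double.parseDouble(attribute[17]);
            double v14 = Double.parseDouble(attribute[14]);
            context.write(new Text(comb), new DblArrayWritable(v17, v14));
        }
    }

    public static class MyReduce extends Reducer<Text, DblArrayWritable, Text, Text> {
        @Override
        protected void reduce(Text key, Iterable<DblArrayWritable> values, Context context)
                throws IOException, InterruptedException {
            double sum17 = 0;
            double sum14 = 0;
            for (DblArrayWritable pair : values) {
                Writable[] parts = pair.get();
                sum17 += ((DoubleWritable) parts[0]).get();
                sum14 += ((DoubleWritable) parts[1]).get();
            }
            // Emit plain Text so the output file is readable regardless of
            // how ArrayWritable renders itself as a string.
            context.write(key, new Text(sum17 + "," + sum14));
        }
    }
}
```

With this layout the driver would also need job.setMapOutputValueClass(GroupBySums.DblArrayWritable.class), because the map output value type differs from the job's final output value type.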
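Based on your sample data, however, it looks like the two fields may be different types: attribute[17] is 21.25, while attribute[14] is 17. You may be able to implement an ObjectArrayWritable class that would do the trick, presumably the same shape as DblArrayWritable with a general-purpose element class such as ObjectWritable, but I’m not certain of this and I can’t find much to support it.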
You could also handle this by simply concatenating the values and passing them to the reducer as Text; the reducer would then split them again.
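The packing and unpacking logic for that approach might look like the sketch below. It is plain Java so it can be tried outside a cluster; in the job, the mapper would wrap the packed string in new Text(...) and the reducer would call toString() on each value before splitting. The class name ConcatValues and both method names are hypothetical. A comma is used as the delimiter on the assumption that the fields, having come from a comma split, cannot contain one.

```java
public class ConcatValues {

    // Mapper side: join the two raw fields into one delimited string.
    static String pack(String field17, String field14) {
        return field17 + "," + field14;
    }

    // Reducer side: split every packed value and total each column.
    // Returns {sum of field 17, sum of field 14}.
    static double[] sumPacked(Iterable<String> packedValues) {
        double sum17 = 0;
        double sum14 = 0;
        for (String packed : packedValues) {
            String[] parts = packed.split(",");
            sum17 += Double.parseDouble(parts[0]);
            sum14 += Double.parseDouble(parts[1]);
        }
        return new double[] {sum17, sum14};
    }
}
```

For example, summing pack("21.25", "17") and pack("10.0", "3") gives 31.25 for the first column and 20.0 for the second. The cost of this approach is that every value is formatted and re-parsed as text.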
Another option is to implement your own Writable class. Here’s a sample of how that could work:
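A minimal sketch of such a class, assuming attribute[17] is kept as a double and attribute[14] as an int. The class name FieldPair, its field names, and the add helper are hypothetical. Hadoop only requires the no-argument constructor plus write and readFields, which must handle the fields in the same order.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;

// Hypothetical value type carrying both fields with their own types.
public class FieldPair implements Writable {
    private double value17; // from attribute[17]
    private int value14;    // from attribute[14]

    // Required: Hadoop creates instances reflectively before calling readFields.
    public FieldPair() {
    }

    public FieldPair(double value17, int value14) {
        this.value17 = value17;
        this.value14 = value14;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeDouble(value17);
        out.writeInt(value14);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        // Must read in exactly the order used by write().
        value17 = in.readDouble();
        value14 = in.readInt();
    }

    // Helper for the reducer: accumulate another pair into this one.
    public void add(FieldPair other) {
        value17 += other.value17;
        value14 += other.value14;
    }

    public double getValue17() {
        return value17;
    }

    public int getValue14() {
        return value14;
    }

    // TextOutputFormat writes values using toString().
    @Override
    public String toString() {
        return value17 + "\t" + value14;
    }
}
```

The mapper would then emit new FieldPair(Double.parseDouble(attribute[17]), Integer.parseInt(attribute[14])), and the reducer would start from an empty FieldPair and call add for each incoming value.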