I have 6 fields in a csv file:
- the first is the student name (String)
- the others are the student's marks, like subject 1, subject 2, etc.
I am writing a MapReduce job in Java: the mapper splits each line on commas and emits the student name as the key and the marks as the value.
In the reducer I process them and output the student name as the key and their marks plus the total, average, etc. as the value.
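For reference, the logic described above could be sketched roughly like this in plain Java. This is only an illustration, not the actual job: it uses no Hadoop classes, the class and method names (`StudentMarks`, `splitNameAndMarks`, `summarize`) are made up for the example, and it assumes the marks are whole numbers.

```java
public class StudentMarks {

    // "Map" step: split one CSV line into the student name (key)
    // and the remaining comma-separated marks (value).
    public static String[] splitNameAndMarks(String line) {
        String[] parts = line.split(",", 2);
        return new String[] { parts[0].trim(), parts[1].trim() };
    }

    // "Reduce" step: for one student, append the total and the average
    // to the original marks. Assumes every mark parses as an int.
    public static String summarize(String name, String marksCsv) {
        int total = 0;
        int count = 0;
        for (String mark : marksCsv.split(",")) {
            total += Integer.parseInt(mark.trim());
            count++;
        }
        double average = (double) total / count;
        return name + "\t" + marksCsv + ",total=" + total + ",average=" + average;
    }

    public static void main(String[] args) {
        // Hypothetical input line: a name followed by five marks.
        String[] keyValue = splitNameAndMarks("alice,80,70,90,60,100");
        System.out.println(summarize(keyValue[0], keyValue[1]));
    }
}
```

In the real job the first method would sit in the `map` function and the second in the `reduce` function, with the key and value wrapped in Hadoop's `Text` type.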
I think there may be an alternative, more efficient way to do this.
Does anyone have an idea of a better way to do these operations?
Are there any built-in functions in Hadoop that can group by student name and calculate the total marks and the average for that student?
You might want to have a look at Pig (http://pig.apache.org/), which provides a simple language on top of Hadoop that lets you perform many standard tasks, such as grouping and aggregating, with much shorter code.