Input:
a,b,c,d,e
q,w,34,r,e
1,2,3,4,e
In the mapper, I want to use the value of the last field as the key and emit (e,(a,b,c,d)), i.e. (key, (rest of the fields from the line)).
Any help is appreciated.
Current code:
public static class Map extends Mapper<LongWritable, Text, Text, Text> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        String line = value.toString(); // reads the input line by line
        String[] attr = line.split(","); // extract each attribute value from the CSV record
        context.write(attr[argno-1], line); // gives an error, seems to accept only integers? How do I override this?
    }
}
public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // further processing: loads the chunk into a 2D ArrayList object
    }
}
public static void main(String[] args) throws Exception {
    String line;
    String arguements[];
    Configuration conf = new Configuration();
    // compute the total number of attributes in the file
    FileReader infile = new FileReader(args[0]);
    BufferedReader bufread = new BufferedReader(infile);
    line = bufread.readLine();
    arguements = line.split(","); // split the fields separated by commas
    conf.setInt("argno", arguements.length); // save the attribute count
    Job job = new Job(conf, "nb");
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    job.setMapperClass(Map.class); /* The method setMapperClass(Class<? extends Mapper>) in the type Job is not applicable for the arguments (Class<Map>) */
    job.setReducerClass(Reduce.class);
    job.setInputFormatClass(TextInputFormat.class);
    job.setOutputFormatClass(TextOutputFormat.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    job.waitForCompletion(true);
}
Please note the errors I am getting (see the comments in the code).
This is simple. In the mapper, parse each line to extract the key and pass the rest of the line as the value. Then use the identity reducer: the shuffle groups all values that share a key, so the reducer sees them together as a list, and each value keeps the format the mapper gave it.
So your map function will output (here with the whole line as the value, as in your current code):
e, (a,b,c,d,e)
e, (q,w,34,r,e)
e, (1,2,3,4,e)
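If the value should hold only the remaining fields, as the question asks, the split can be done on the last comma instead of indexing with `argno`. The following is a minimal plain-Java sketch of just that parsing step, not of the Hadoop job itself; the class name `LastFieldSplit` and the method `split` are made up for illustration, and it assumes the fields never contain quoted or escaped commas.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

// Hypothetical helper, not part of Hadoop: splits one CSV line into
// (last field, all fields before it).
final class LastFieldSplit {

    static Map.Entry<String, String> split(String line) {
        int cut = line.lastIndexOf(',');
        if (cut < 0) {
            // A line with a single field: that field is the key, nothing is left over.
            return new SimpleEntry<>(line, "");
        }
        String key = line.substring(cut + 1);   // text after the last comma
        String rest = line.substring(0, cut);   // text before the last comma
        return new SimpleEntry<>(key, rest);
    }

    public static void main(String[] args) {
        for (String line : new String[] {"a,b,c,d,e", "q,w,34,r,e", "1,2,3,4,e"}) {
            Map.Entry<String, String> kv = split(line);
            System.out.println(kv.getKey() + ", (" + kv.getValue() + ")");
        }
    }
}
```

Inside the mapper, both strings would then have to be wrapped before they are written, for example `context.write(new Text(kv.getKey()), new Text(kv.getValue()))`, because the mapper is declared with `Text` as its output key and value types and `context.write` does not accept a plain `String`.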
The reducer then receives all values for that key as one group:
e, {a,b,c,d,e; q,w,34,r,e; 1,2,3,4,e}
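The grouping described above can be imitated in plain Java to check what the reducer should expect for a given input. This is only a local simulation under stated assumptions, not Hadoop code: the class `GroupByLastField` and its `group` method are hypothetical names, the whole line is kept as the value, and values are listed in input order, which a real job does not guarantee.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical local simulation of "map, then group by key".
final class GroupByLastField {

    static Map<String, List<String>> group(List<String> lines) {
        // TreeMap keeps keys sorted, similar to how a reducer sees keys in sorted order.
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String line : lines) {
            // Key is the text after the last comma (the whole line if there is no comma).
            String key = line.substring(line.lastIndexOf(',') + 1);
            grouped.computeIfAbsent(key, k -> new ArrayList<>()).add(line);
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("a,b,c,d,e", "q,w,34,r,e", "1,2,3,4,e");
        group(input).forEach((key, values) ->
                System.out.println(key + ", {" + String.join("; ", values) + "}"));
    }
}
```

In the actual job, the reducer's input types have to match the mapper's output types, so with a mapper that emits `Text` values the reducer would be declared as `Reducer<Text, Text, ...>` rather than `Reducer<Text, IntWritable, ...>`.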