Hadoop (and Java) neophyte here. I need some help using MultipleTextOutputFormat to control the output filename in MapReduce.
Currently I am using it this way, and it seems to work fine. What I would like to change is how the fields that determine the filename get picked.
Instead of hardcoding them to field[0] or field[3] (as in the sample), I would like to pick them up dynamically from, say, the JobConf, as field[jobConf.get("id.offset")] or field[jobConf.get("date.offset")]. Does anyone here know how I could go about doing this, or something to this effect (it doesn't have to be JobConf per se)?
Any pointers/suggestions/tips et al. would be most appreciated. Thanks.
It depends on whether your custom parameters vary per job or per key/value pair.
If they vary per job, you can get hold of the JobConf object by overriding the getRecordWriter() method, which receives it as a parameter. The RecordWriter returned by this method is what calls generateFileNameForKeyValue() (check out the implementation in the class MultipleOutputFormat in the Hadoop source). In your override, read the parameters you need from the JobConf, keep them in fields of your class, and then just call super.getRecordWriter().
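Here is a rough sketch of that first approach using the old org.apache.hadoop.mapred API. The class name, the Text key/value types, the tab delimiter, and the idea that the fields come from the value are all my assumptions, since I can't see your sample; the defaults of 0 and 3 just mirror the hardcoded offsets you mentioned. It needs the Hadoop jars on the classpath.

```java
import java.io.IOException;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RecordWriter;
import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat;
import org.apache.hadoop.util.Progressable;

// Hypothetical class name; Text/Text is an assumed key/value type.
public class ConfiguredOutputFormat extends MultipleTextOutputFormat<Text, Text> {

    // Offsets read from the JobConf before any record is written.
    private int idOffset;
    private int dateOffset;

    @Override
    public RecordWriter<Text, Text> getRecordWriter(FileSystem fs, JobConf job,
            String name, Progressable progress) throws IOException {
        // "id.offset" and "date.offset" are the property names from the question;
        // the defaults match the previously hardcoded field[0] and field[3].
        idOffset = job.getInt("id.offset", 0);
        dateOffset = job.getInt("date.offset", 3);
        // The writer returned by super calls generateFileNameForKeyValue() per record.
        return super.getRecordWriter(fs, job, name, progress);
    }

    @Override
    protected String generateFileNameForKeyValue(Text key, Text value, String name) {
        // Assumption: the value is a tab-separated record holding the fields.
        String[] fields = value.toString().split("\t");
        // "name" is the default leaf name (e.g. part-00000); keeping it avoids
        // different tasks writing to the same file.
        return fields[idOffset] + "-" + fields[dateOffset] + "/" + name;
    }
}
```

On the driver side you would set the offsets with something like conf.setInt("id.offset", 2) before submitting the job. Note there is no bounds checking here, so a record with too few fields will throw.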
If your parameters differ for different key/value pairs, you can send the parameter along as part of the key or the value. Then override generateActualKey() or generateActualValue() in your MultipleTextOutputFormat subclass to strip it off again, so that only the actual key or value you want gets written.
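The string handling for that second approach can be sketched without any Hadoop classes. TaggedKey, its method names, and the separator character are all made up for illustration; the idea is that your mapper or reducer emits pack(tag, key), your generateFileNameForKeyValue() override uses tag(...) to choose the file, and your override of the hook that MultipleOutputFormat declares as generateActualKey() returns actualKey(...) wrapped back into your key type.

```java
/** Hypothetical helper: carries a per-record routing tag in front of the real key. */
final class TaggedKey {

    // Assumed separator; pick a character that cannot occur in the tag itself.
    private static final char SEP = '\u0001';

    private TaggedKey() {}

    /** Emitting side: combine the routing tag and the real key into one string. */
    static String pack(String tag, String key) {
        return tag + SEP + key;
    }

    /** For the filename hook: the part that selects the output file. */
    static String tag(String packed) {
        int i = packed.indexOf(SEP);
        return i < 0 ? "" : packed.substring(0, i);
    }

    /** For the actual-key hook: the key that should really be written. */
    static String actualKey(String packed) {
        int i = packed.indexOf(SEP);
        return i < 0 ? packed : packed.substring(i + 1);
    }
}
```

A key that was never packed comes back unchanged from actualKey() and yields an empty tag, so untagged records can still be handled with a fallback filename.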
Hope this helps.