I have 3 simple classes:
public abstract class Container implements WritableComparable<Container> {} //empty
public class WeightedEdge extends Container { ... }
public class NodeWeightContainer extends Container { ... }
The Map phase was configured as follows:
JobConf createGraphPConf = new JobConf(new Configuration());
Job job = new Job(createGraphPConf);
...
createGraphPConf.setMapOutputValueClass(Container.class);
However, I am receiving this error:
java.io.IOException: Type mismatch in value from map: expected org.hadoop.test.data.util.Container, recieved org.hadoop.test.data.WeightedEdge
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1018)
at org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:591)
at org.hadoop.test.map.CreateGPMap.map(CreateGPMap.java:33)
at org.hadoop.test.map.CreateGPMap.map(CreateGPMap.java:19)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:435)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:371)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:210)
Why can't I return a subclass of the class that was defined in the configuration? Is there a way around it? The problem is that my Map phase has to emit two distinct object types.
You cannot return a subclass of the class defined in the configuration because Hadoop explicitly checks that the class of every value your mapper emits is exactly the class specified in setMapOutputValueClass. It does so because it needs to serialize and deserialize the objects you emit from mappers. When it deserializes a value, it creates a new object of the type specified in the setMapOutputValueClass call and then uses the methods of the WritableComparable interface (readFields, inherited from Writable) to fill the newly created object with data. Since only the configured class is known at that point, a subclass could not be reconstructed from the bytes, and an abstract class such as Container could not be instantiated at all.
To be able to emit different object types, you can define a non-abstract container class and place the actual object and its type identifier inside it.
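To make that reasoning concrete, here is a minimal, stdlib-only model of the map-output round trip. It is not Hadoop's code: `SimpleWritable` is a stand-in for `org.apache.hadoop.io.Writable` (same two method signatures), `MapOutputModel` and `roundTrip` are hypothetical names, and the fields of `WeightedEdge` are invented for illustration. It shows both constraints: the exact-class check, and the reflective creation of the declared class before `readFields` is called.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Stand-in for org.apache.hadoop.io.Writable (same two method signatures),
// so the sketch compiles without Hadoop on the classpath.
interface SimpleWritable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Simplified versions of the question's classes (fields are hypothetical).
abstract class Container implements SimpleWritable {}

class WeightedEdge extends Container {
    long from, to;
    double weight;

    public WeightedEdge() {}  // no-arg constructor needed for reflective creation

    WeightedEdge(long from, long to, double weight) {
        this.from = from; this.to = to; this.weight = weight;
    }

    public void write(DataOutput out) throws IOException {
        out.writeLong(from); out.writeLong(to); out.writeDouble(weight);
    }

    public void readFields(DataInput in) throws IOException {
        from = in.readLong(); to = in.readLong(); weight = in.readDouble();
    }
}

// Hypothetical model of the map-output path; NOT Hadoop's actual implementation.
class MapOutputModel {
    private final Class<? extends SimpleWritable> valClass;  // the declared map output value class

    MapOutputModel(Class<? extends SimpleWritable> valClass) { this.valClass = valClass; }

    SimpleWritable roundTrip(SimpleWritable value) throws Exception {
        // Exact-class check, analogous to the "Type mismatch in value from map" error.
        if (value.getClass() != valClass) {
            throw new IOException("Type mismatch: expected " + valClass.getName()
                + ", received " + value.getClass().getName());
        }
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        value.write(new DataOutputStream(bytes));  // only the fields are written, no class name
        // The reader knows nothing but valClass, so it must instantiate that class reflectively.
        // This only works for a concrete class with a no-arg constructor; an abstract class
        // such as Container cannot be instantiated at all.
        SimpleWritable copy = valClass.getDeclaredConstructor().newInstance();
        copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        return copy;
    }
}
```

With `new MapOutputModel(WeightedEdge.class)` an edge survives the round trip; with `new MapOutputModel(Container.class)` the same edge is rejected by the class check, which is the situation shown in the stack trace in the question.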
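A minimal sketch of such a non-abstract container follows. It uses a stand-in `SimpleWritable` interface (same methods as `org.apache.hadoop.io.Writable`) so it runs without Hadoop; `TaggedContainer`, the one-byte tag values, and the fields of `WeightedEdge` and `NodeWeightContainer` are hypothetical. The wrapper writes a type tag before the payload and reads that tag back in `readFields` to decide which class to instantiate.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

// Stand-in for org.apache.hadoop.io.Writable (same two method signatures).
interface SimpleWritable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// The two payload types from the question; their fields are hypothetical.
class WeightedEdge implements SimpleWritable {
    long from, to;
    double weight;
    public WeightedEdge() {}
    WeightedEdge(long from, long to, double weight) { this.from = from; this.to = to; this.weight = weight; }
    public void write(DataOutput out) throws IOException { out.writeLong(from); out.writeLong(to); out.writeDouble(weight); }
    public void readFields(DataInput in) throws IOException { from = in.readLong(); to = in.readLong(); weight = in.readDouble(); }
}

class NodeWeightContainer implements SimpleWritable {
    long node;
    double weight;
    public NodeWeightContainer() {}
    NodeWeightContainer(long node, double weight) { this.node = node; this.weight = weight; }
    public void write(DataOutput out) throws IOException { out.writeLong(node); out.writeDouble(weight); }
    public void readFields(DataInput in) throws IOException { node = in.readLong(); weight = in.readDouble(); }
}

// Concrete wrapper: the single class that would be declared as the map output value class.
class TaggedContainer implements SimpleWritable {
    private static final byte EDGE = 0;
    private static final byte NODE_WEIGHT = 1;
    private SimpleWritable payload;

    public TaggedContainer() {}  // no-arg constructor so the framework can create an empty instance
    TaggedContainer(SimpleWritable payload) { this.payload = payload; }

    SimpleWritable get() { return payload; }

    // Writes a one-byte type tag, then the payload's own fields.
    public void write(DataOutput out) throws IOException {
        if (payload instanceof WeightedEdge) {
            out.writeByte(EDGE);
        } else if (payload instanceof NodeWeightContainer) {
            out.writeByte(NODE_WEIGHT);
        } else {
            throw new IOException("Unsupported payload: " + payload);
        }
        payload.write(out);
    }

    // Reads the tag first, creates the matching payload object, then lets it read its fields.
    public void readFields(DataInput in) throws IOException {
        byte tag = in.readByte();
        switch (tag) {
            case EDGE: payload = new WeightedEdge(); break;
            case NODE_WEIGHT: payload = new NodeWeightContainer(); break;
            default: throw new IOException("Unknown type tag: " + tag);
        }
        payload.readFields(in);
    }
}
```

In a real job the wrapper would implement `org.apache.hadoop.io.Writable` instead of the stand-in (map output values need only `Writable`, not `WritableComparable`), the mapper would emit `new TaggedContainer(edge)` or `new TaggedContainer(nodeWeight)`, and the job would declare `setMapOutputValueClass(TaggedContainer.class)`; the reducer then calls `get()` and checks the payload type. Hadoop also ships `org.apache.hadoop.io.GenericWritable`, an abstract class built on the same idea (you subclass it and list the wrapped classes in `getTypes()`), which may be worth considering instead of a hand-written tag.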