I have a table called User with two columns, one called visitorId and the other called friend which is a list of strings. I want to check whether the VisitorId is in the friendlist. Can anyone direct me as to how to access the table columns in a map function?
I’m not able to picture how data is output from a map function in hbase.
My code is as follows:
ublic class MapReduce {
static class Mapper1 extends TableMapper<ImmutableBytesWritable, Text> {
private int numRecords = 0;
private static final IntWritable one = new IntWritable(1);
private final IntWritable ONE = new IntWritable(1);
private Text text = new Text();
@Override
public void map(ImmutableBytesWritable row, Result values, Context context) throws IOException {
//What should i do here??
ImmutableBytesWritable userKey = new ImmutableBytesWritable(row.get(), 0, Bytes.SIZEOF_INT);
context.write(userkey,One);
}
//context.write(text, ONE);
} catch (InterruptedException e) {
throw new IOException(e);
}
}
}
public static void main(String[] args) throws Exception {
Configuration conf = HBaseConfiguration.create();
Job job = new Job(conf, "CheckVisitor");
job.setJarByClass(MapReduce.class);
Scan scan = new Scan();
Filter f = new RowFilter(CompareOp.EQUAL,new SubstringComparator("mId2"));
scan.setFilter(f);
scan.addFamily(Bytes.toBytes("visitor"));
scan.addFamily(Bytes.toBytes("friend"));
TableMapReduceUtil.initTableMapperJob("User", scan, Mapper1.class, ImmutableBytesWritable.class,Text.class, job);
}
}
So Result values instance would contain the full row from the scanner.
To get the appropriate columns from the Result I would do something like :-
VisitorIdVal = value.getColumnLatest(Bytes.toBytes(columnFamily1), Bytes.toBytes(“VisitorId”))
friendlistVal = value.getColumnLatest(Bytes.toBytes(columnFamily2), Bytes.toBytes(“friendlist”))
Here VisitorIdVal and friendlistVal are of the type keyValue http://archive.cloudera.com/cdh/3/hbase/apidocs/org/apache/hadoop/hbase/KeyValue.html, to get their values out you can do a Bytes.toString(VisitorIdVal.getValue())
Once you have extracted the values from columns you can check for “VisitorId” in “friendlist”