The Archive Base


The Archive Base Latest Questions

Editorial Team
Asked: June 15, 2026

I’m working since short time with Hadoop and trying to implement a join in


I have only been working with Hadoop for a short time and am trying to implement a join in Java. It doesn’t matter whether it is a map-side or a reduce-side join; I chose a reduce-side join because it was supposed to be easier to implement. I know that Java is not the best choice for joins, aggregations and so on, and that Hive or Pig would be a better fit; I have already implemented the join in both. However, I’m working on a research project and have to use all three languages in order to deliver a comparison.

Anyway, I have two input files with different structures. One is key|value and the other is key|value1;value2;value3;value4. One record from each input file looks like the following:

  • Input1: 1;2010-01-10T00:00:01
  • Input2: 1;23;Blue;2010-01-11T00:00:01;9999-12-31T23:59:59
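Both record types carry timestamps with a literal T between the date and the time. The following is a minimal sketch, not part of the original job, of splitting the two sample records and comparing their timestamps with SimpleDateFormat. The class name RecordParsingSketch and the helpers toMillis and inValidityWindow are hypothetical, and the strict start < transaction < end check is assumed from the condition used in the reducer further down.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;

public class RecordParsingSketch {
    // Quote the T so SimpleDateFormat treats it as a literal separator.
    static final String TIMESTAMP_PATTERN = "yyyy-MM-dd'T'HH:mm:ss";

    // Convert a timestamp such as 2010-01-10T00:00:01 to epoch milliseconds.
    static long toMillis(String timestamp) {
        // SimpleDateFormat is not thread-safe, so this sketch creates one per call.
        SimpleDateFormat format = new SimpleDateFormat(TIMESTAMP_PATTERN);
        format.setLenient(false);
        try {
            return format.parse(timestamp).getTime();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable timestamp: " + timestamp, e);
        }
    }

    // True when the transaction time lies strictly between start and end.
    static boolean inValidityWindow(String transaction, String start, String end) {
        long t = toMillis(transaction);
        return toMillis(start) < t && t < toMillis(end);
    }

    public static void main(String[] args) {
        // Sample records from the two input files.
        String[] cdr = "1;2010-01-10T00:00:01".split(";");
        String[] attribute = "1;23;Blue;2010-01-11T00:00:01;9999-12-31T23:59:59".split(";");

        System.out.println("join key matches: " + cdr[0].equals(attribute[0]));
        System.out.println("in window: " + inValidityWindow(cdr[1], attribute[3], attribute[4]));
    }
}
```

A pattern that uses a space instead of the quoted T will not parse these timestamps. Note also that the sample CDR record is dated one day before the sample attribute record starts, so this particular pair falls outside the window.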

I followed the example in the book Hadoop: The Definitive Guide, but it didn’t work for me. I’m posting my Java files here so you can see what’s wrong.

public class LookupReducer extends Reducer<TextPair,Text,Text,Text> {

private String result = "";
private String msisdn;
private String attribute, product;
private long trans_dt_long, start_dt_long, end_dt_long;
private String trans_dt, start_dt, end_dt;

@Override
public void reduce(TextPair key, Iterable<Text> values, Context context)
        throws IOException, InterruptedException {

    context.progress();
    //value without key to remember

    Iterator<Text> iter = values.iterator();

    for (Text val : values) {

        Text recordNoKey = val;     //new Text(iter.next());

        String valSplitted[] = recordNoKey.toString().split(";");

        //if the input is coming from CDR set corresponding values
        if (key.getSecond().toString().equals(CDR.CDR_TAG))
        {
            trans_dt = recordNoKey.toString();
            trans_dt_long = dateToLong(recordNoKey.toString());
        }
        //if the input is coming from Attributes set corresponding values
        else if (key.getSecond().toString().equals(Attribute.ATT_TAG))
        {
            attribute = valSplitted[0];
            product = valSplitted[1];
            start_dt = valSplitted[2];
            start_dt_long = dateToLong(valSplitted[2]);
            end_dt = valSplitted[3];
            end_dt_long = dateToLong(valSplitted[3]);
        }

        Text record = val;  //iter.next();
        //System.out.println("RECORD: " + record);
        Text outValue = new Text(recordNoKey.toString() + ";" + record.toString());

        if (start_dt_long < trans_dt_long && trans_dt_long < end_dt_long)
        {
            //concat output columns
            result = attribute + ";" + product + ";" + trans_dt;

            context.write(key.getFirst(), new Text(result));
            System.out.println("KEY: " + key);
        }
    }
}

private static long dateToLong(String date){
    DateFormat formatter = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
    Date parsedDate = null;
    try {
        parsedDate = formatter.parse(date);
    } catch (ParseException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
    long dateInLong = parsedDate.getTime();

    return dateInLong;
}

public static class TextPair implements WritableComparable<TextPair> {

    private Text first;
    private Text second;

    public TextPair(){
        set(new Text(), new Text());
    }

    public TextPair(String first, String second){
        set(new Text(first), new Text(second));
    }

    public TextPair(Text first, Text second){
        set(first, second);
    }

    public void set(Text first, Text second){
        this.first = first;
        this.second = second;
    }

    public Text getFirst() {
        return first;
    }

    public void setFirst(Text first) {
        this.first = first;
    }

    public Text getSecond() {
        return second;
    }

    public void setSecond(Text second) {
        this.second = second;
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        // TODO Auto-generated method stub
        first.readFields(in);
        second.readFields(in);
    }

    @Override
    public void write(DataOutput out) throws IOException {
        // TODO Auto-generated method stub
        first.write(out);
        second.write(out);
    }

    @Override
    public int hashCode(){
        return first.hashCode() * 163 + second.hashCode();
    }

    @Override
    public boolean equals(Object o){
        if(o instanceof TextPair)
        {
            TextPair tp = (TextPair) o;
            return first.equals(tp.first) && second.equals(tp.second);
        }
        return false;
    }

    @Override
    public String toString(){
        return first + ";" + second;
    }

    @Override
    public int compareTo(TextPair tp) {
        // TODO Auto-generated method stub
        int cmp = first.compareTo(tp.first);
        if(cmp != 0)
            return cmp;
        return second.compareTo(tp.second);
    }


    public static class FirstComparator extends WritableComparator {

        protected FirstComparator(){
            super(TextPair.class, true);
        }

        @Override
        public int compare(WritableComparable comp1, WritableComparable comp2){
            TextPair pair1 = (TextPair) comp1;
            TextPair pair2 = (TextPair) comp2;
            int cmp = pair1.getFirst().compareTo(pair2.getFirst());

            if(cmp != 0)
                return cmp;

            return -pair1.getSecond().compareTo(pair2.getSecond());
        }
    }

    public static class GroupComparator extends WritableComparator {
        protected GroupComparator() 
        {
            super(TextPair.class, true);
        }

        @Override
        public int compare(WritableComparable comp1, WritableComparable comp2)
        {
            TextPair pair1 =  (TextPair) comp1;
            TextPair pair2 =  (TextPair) comp2;

            return pair1.compareTo(pair2);
        }
    }

}

}

public class Joiner  extends Configured implements Tool {

public static final String DATA_SEPERATOR = ";";                 //symbol for separating the output data
public static final String NUMBER_OF_REDUCER = "1";              //number of reducer tasks to use
public static final String COMPRESS_MAP_OUTPUT = "true";         //"true" to compress the map output, "false" otherwise
public static final String USED_COMPRESSION_CODEC = "org.apache.hadoop.io.compress.SnappyCodec";    //codec used for the data compression
public static final boolean JOB_RUNNING_LOCAL = true;            //set to true when running the Hadoop job on your local machine,
                                                                 //set to false when running it on the Telefonica Cloud
public static final String OUTPUT_PATH = "/home/hduser";         //folder where the output is saved; only needed if JOB_RUNNING_LOCAL = false



public static class KeyPartitioner extends Partitioner<TextPair, Text> {
    @Override
    public int getPartition(TextPair key, Text value, int numPartitions) {
        System.out.println("numPartitions: " + numPartitions);
        return (key.getFirst().hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}

private static Configuration hadoopconfig() {
    Configuration conf = new Configuration();

    conf.set("mapred.textoutputformat.separator", DATA_SEPERATOR);
    conf.set("mapred.compress.map.output", COMPRESS_MAP_OUTPUT);
    //conf.set("mapred.map.output.compression.codec", USED_COMPRESSION_CODEC);
    conf.set("mapred.reduce.tasks", NUMBER_OF_REDUCER);
    return conf;
}

@Override
public int run(String[] args) throws Exception {
    // TODO Auto-generated method stub
    if ((args.length != 3) && (JOB_RUNNING_LOCAL)) {

        System.err.println("Usage: Lookup <CDR-inputPath> <Attribute-inputPath> <outputPath>");
        System.exit(2);
    }

    //starting the Hadoop job
    Configuration conf = hadoopconfig();
    Job job = new Job(conf, "Join cdrs and attributes");
    job.setJarByClass(Joiner.class);

    MultipleInputs.addInputPath(job, new Path(args[0]), TextInputFormat.class, CDRMapper.class);
    MultipleInputs.addInputPath(job, new Path(args[1]), TextInputFormat.class, AttributeMapper.class);
    //FileInputFormat.addInputPath(job, new Path(otherArgs[0]));    //expecting a folder instead of a file

    if(JOB_RUNNING_LOCAL)
        FileOutputFormat.setOutputPath(job, new Path(args[2]));
    else
        FileOutputFormat.setOutputPath(job, new Path(OUTPUT_PATH));


    job.setPartitionerClass(KeyPartitioner.class);
    job.setGroupingComparatorClass(TextPair.FirstComparator.class);
    job.setReducerClass(LookupReducer.class);

    job.setMapOutputKeyClass(TextPair.class);
    job.setMapOutputValueClass(Text.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);

    return job.waitForCompletion(true) ? 0 : 1;
}

 public static void main(String[] args) throws Exception {

     int exitCode = ToolRunner.run(new Joiner(), args);
     System.exit(exitCode);

 }
}

public class Attribute {

public static final String ATT_TAG = "1";

public static class AttributeMapper
        extends Mapper<LongWritable, Text, TextPair, Text> {

    private static Text values = new Text();
    //private Object output = new Text();

    @Override
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        //partition the input line by the separator semicolon
        String[] attributes = value.toString().split(";");
        String valuesInString = "";

        if (attributes.length != 5)
            System.err.println("Input column number not correct. Expected 5, provided " + attributes.length
                    + "\n" + "Check the input file");
        if (attributes.length == 5)
        {
            //setting the values with the input values read above
            valuesInString = attributes[1] + ";" + attributes[2] + ";" + attributes[3] + ";" + attributes[4];
            values.set(valuesInString);
            //writing out the key and value pair
            context.write(new TextPair(new Text(String.valueOf(attributes[0])), new Text(ATT_TAG)), values);
        }
    }
}

}

public class CDR {

public static final String CDR_TAG = "0";

public static class CDRMapper
        extends Mapper<LongWritable, Text, TextPair, Text> {

    private static Text values = new Text();
    private Object output = new Text();

    @Override
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        //partition the input line by the separator semicolon
        String[] cdr = value.toString().split(";");

        //setting the values with the input values read above
        values.set(cdr[1]);
        //output = CDR_TAG + cdr[1];

        //writing out the key and value pair
        context.write(new TextPair(new Text(String.valueOf(cdr[0])), new Text(CDR_TAG)), values);
    }
}

}

I would be glad if anyone could at least post a link for a tutorial or a simple example where such a join functionality is implemented. I searched a lot, but either the code was not complete or there was not enough explanation.

1 Answer

  1. Editorial Team
    Added an answer on June 15, 2026 at 4:46 pm

    To be honest, I have no idea what your code is trying to do, but that’s probably because I’d do it a different way and am not familiar with the APIs you’re using.

    I would start from scratch as follows:

    • Create a mapper to read one of the files. For each line read, write a key/value pair to the context. The key is a Text created from the join key, and the value is another Text created by prepending a “1” to the entire input record.
    • Create another mapper for the other file. This mapper works just like the first one, except that the value is a Text created by prepending a “2” to the entire input record.
    • Write a reducer to do the join. The reduce() method receives all records written for a specific key. You can tell which input file a record came from (and therefore its data format) by checking whether the value starts with a “1” or a “2”. Once you know whether you have one record type, the other, or both, you can write whatever logic you need to merge the data from the one or two records.
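    The three steps above can be sketched in plain Java without any Hadoop classes, purely to illustrate the tagging idea. Everything named here (TaggedJoinSketch, mapFirst, mapSecond, reduce, join) is hypothetical. In a real job the two map functions would be Mapper subclasses, the framework's shuffle would do the grouping that the TreeMap stands in for, and the merge rule (simple concatenation of every matching pair) is only an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TaggedJoinSketch {
    // Mapper for the first file: key is the first field, value is "1" + the whole record.
    static String[] mapFirst(String line) {
        return new String[] { line.split(";", 2)[0], "1" + line };
    }

    // Mapper for the second file: same key, but the value is tagged with "2".
    static String[] mapSecond(String line) {
        return new String[] { line.split(";", 2)[0], "2" + line };
    }

    // Reducer: sees every tagged value for one key and merges the two record types.
    static List<String> reduce(String key, List<String> values) {
        List<String> first = new ArrayList<>();
        List<String> second = new ArrayList<>();
        for (String value : values) {
            String record = value.substring(1); // strip the one-character tag
            if (value.startsWith("1")) {
                first.add(record);
            } else if (value.startsWith("2")) {
                second.add(record);
            }
        }
        List<String> joined = new ArrayList<>();
        for (String a : first) {
            for (String b : second) {
                // Drop the repeated key from the second record before concatenating.
                joined.add(a + ";" + b.substring(b.indexOf(';') + 1));
            }
        }
        return joined;
    }

    // Stand-in for the shuffle: group tagged values by key, then reduce each group.
    static List<String> join(List<String> firstFile, List<String> secondFile) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String line : firstFile) {
            String[] kv = mapFirst(line);
            grouped.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
        }
        for (String line : secondFile) {
            String[] kv = mapSecond(line);
            grouped.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
        }
        List<String> output = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : grouped.entrySet()) {
            output.addAll(reduce(entry.getKey(), entry.getValue()));
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> cdrs = Arrays.asList("1;2010-01-10T00:00:01");
        List<String> attributes = Arrays.asList("1;23;Blue;2010-01-11T00:00:01;9999-12-31T23:59:59");
        for (String row : join(cdrs, attributes)) {
            System.out.println(row);
        }
    }
}
```

    Keys that appear in only one of the files produce no output in this sketch, which makes it an inner join. Any extra condition, such as a date-range check, would go inside the nested loop in reduce().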

    By the way, you use the MultipleInputs class to configure more than one mapper in your job/driver class.
