Suppose I have a tab delimited file containing user activity data formatted like this:

Asked: June 11, 2026

Suppose I have a tab delimited file containing user activity data formatted like this:

timestamp  user_id  page_id  action_id

I want to write a Hadoop job that counts user actions on each page, so the output file should look like this:

user_id  page_id  number_of_actions

I need something like a composite key here, one that contains user_id and page_id. Is there a generic way to do this with Hadoop? I couldn’t find anything helpful. So far I’m emitting the key like this in the mapper:

context.write(new Text(user_id + "\t" + page_id), one);

It works, but I feel that it’s not the best solution.

1 Answer

Editorial Team · Answer 1 · 2026-06-11T09:25:47+00:00

Just compose your own Writable. In your example a solution could look like this:

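import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.WritableComparable;

import com.google.common.collect.ComparisonChain;

public class UserPageWritable implements WritableComparable<UserPageWritable> {

  private String userId;
  private String pageId;

  // Hadoop instantiates keys reflectively, so keep a no-argument constructor.
  public UserPageWritable() {
  }

  public UserPageWritable(String userId, String pageId) {
    this.userId = userId;
    this.pageId = pageId;
  }

  // Read the fields in the same order write() emits them.
  @Override
  public void readFields(DataInput in) throws IOException {
    userId = in.readUTF();
    pageId = in.readUTF();
  }

  @Override
  public void write(DataOutput out) throws IOException {
    out.writeUTF(userId);
    out.writeUTF(pageId);
  }

  // Sort by user ID first, then by page ID.
  @Override
  public int compareTo(UserPageWritable o) {
    return ComparisonChain.start().compare(userId, o.userId)
        .compare(pageId, o.pageId).result();
  }

}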

Although I think your IDs could be longs, this is the String version. It is just ordinary serialization through the Writable interface. Note that Hadoop needs a default (no-argument) constructor to instantiate the key, so always provide one if you add other constructors.
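If the IDs really are numeric, a long-based variant might look like the sketch below. The class name UserPageLongKey and the toBytes/fromBytes helpers are made up for illustration. To keep the sketch compilable without Hadoop on the classpath it implements plain Comparable; in a real job the class would declare WritableComparable instead, with the same readFields and write bodies.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical long-based composite key. In a Hadoop job this would declare
// "implements WritableComparable<UserPageLongKey>" rather than Comparable.
class UserPageLongKey implements Comparable<UserPageLongKey> {

  private long userId;
  private long pageId;

  // No-argument constructor, needed for deserialization.
  UserPageLongKey() {
  }

  UserPageLongKey(long userId, long pageId) {
    this.userId = userId;
    this.pageId = pageId;
  }

  // Same method shape as Writable.readFields: 8 bytes per field.
  public void readFields(DataInput in) throws IOException {
    userId = in.readLong();
    pageId = in.readLong();
  }

  // Same method shape as Writable.write.
  public void write(DataOutput out) throws IOException {
    out.writeLong(userId);
    out.writeLong(pageId);
  }

  // Order by user ID first, then by page ID.
  @Override
  public int compareTo(UserPageLongKey o) {
    int byUser = Long.compare(userId, o.userId);
    return byUser != 0 ? byUser : Long.compare(pageId, o.pageId);
  }

  // Illustration-only helper: serialize a key to a byte array.
  static byte[] toBytes(UserPageLongKey key) {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(buffer)) {
      key.write(out);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return buffer.toByteArray();
  }

  // Illustration-only helper: rebuild a key from its serialized bytes.
  static UserPageLongKey fromBytes(byte[] bytes) {
    UserPageLongKey key = new UserPageLongKey();
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
      key.readFields(in);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return key;
  }
}
```

Two longs serialize to a fixed 16 bytes per key, whereas writeUTF stores a 2-byte length prefix plus the encoded characters of each string.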

The compareTo logic determines how the keys are sorted and also tells the framework which keys are equal, so that their values are grouped into the same reduce call.
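To see what the comparison buys you, here is a small single-JVM simulation, not Hadoop code. It uses a TreeMap, which (like the sort-and-group step) orders keys with compareTo and treats keys that compare as 0 as the same key. The names CompositeKeyGroupingSketch, Key, and countActions are invented for this sketch, and the input is assumed to be well-formed four-column lines.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class CompositeKeyGroupingSketch {

  // Hypothetical stand-in for the composite key; only compareTo matters here.
  static final class Key implements Comparable<Key> {
    final String userId;
    final String pageId;

    Key(String userId, String pageId) {
      this.userId = userId;
      this.pageId = pageId;
    }

    // Order by user ID first, then by page ID.
    @Override
    public int compareTo(Key o) {
      int byUser = userId.compareTo(o.userId);
      return byUser != 0 ? byUser : pageId.compareTo(o.pageId);
    }
  }

  // Input lines:  timestamp \t user_id \t page_id \t action_id
  // Output lines: user_id \t page_id \t number_of_actions, in key order.
  static List<String> countActions(List<String> lines) {
    // TreeMap sorts by compareTo and merges keys that compare as equal,
    // which plays the role of sorting and grouping before the reducer.
    Map<Key, Integer> counts = new TreeMap<>();
    for (String line : lines) {
      String[] fields = line.split("\t");
      counts.merge(new Key(fields[1], fields[2]), 1, Integer::sum);
    }
    List<String> output = new ArrayList<>();
    for (Map.Entry<Key, Integer> entry : counts.entrySet()) {
      Key key = entry.getKey();
      output.add(key.userId + "\t" + key.pageId + "\t" + entry.getValue());
    }
    return output;
  }
}
```

If compareTo only looked at userId, all pages of one user would collapse into a single group, which is why both fields take part in the comparison.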

ComparisonChain is a handy utility from Guava.

Don’t forget to override equals and hashCode! The partitioner picks the reducer from the key’s hashCode, so equal keys must produce equal hash codes.
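A minimal sketch of those two overrides follows. The class UserPageKeySketch is a cut-down, hypothetical stand-in that omits the serialization methods, and partitionFor is a helper written for this example; its formula is the one I understand Hadoop's default HashPartitioner to apply, so treat it as an illustration rather than a copy of the library code.

```java
import java.util.Objects;

// Hypothetical stand-in for the composite key, reduced to equals/hashCode.
class UserPageKeySketch {

  private final String userId;
  private final String pageId;

  UserPageKeySketch(String userId, String pageId) {
    this.userId = userId;
    this.pageId = pageId;
  }

  // Two keys are equal when both fields match, consistent with compareTo == 0.
  @Override
  public boolean equals(Object other) {
    if (this == other) {
      return true;
    }
    if (!(other instanceof UserPageKeySketch)) {
      return false;
    }
    UserPageKeySketch that = (UserPageKeySketch) other;
    return Objects.equals(userId, that.userId) && Objects.equals(pageId, that.pageId);
  }

  // Built only from field values, so it is the same in every JVM.
  // The inherited Object.hashCode would differ between mapper processes.
  @Override
  public int hashCode() {
    return Objects.hash(userId, pageId);
  }

  // Illustration of hash partitioning: mask off the sign bit, then take the
  // remainder by the number of reducers.
  static int partitionFor(Object key, int numReduceTasks) {
    return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
  }
}
```

With these overrides, two key objects built from the same user and page always land in the same partition, whichever mapper emitted them.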
