The Archive Base

Editorial Team
Asked: June 13, 2026

I am new to Hadoop and Java, and I feel there is something obvious I am just missing. I am using Hadoop 1.0.3 if that means anything.

My goal for using Hadoop is to take a bunch of files and parse them one file at a time (as opposed to line by line). Each file will produce multiple key-value pairs, but the context of the other lines is important. The key and value are multi-value/composite, so I have implemented WritableComparable for the key and Writable for the value. Because the processing of each file takes a bit of CPU, I want to save the output of the mapper, then run multiple reducers later on.

For the composite keys, I followed http://stackoverflow.com/questions/12427090/hadoop-composite-key

The problem is, the output is just Java object references as opposed to the composite key and value. Example:
LinkKeyWritable@bd2f9730 LinkValueWritable@8752408c

I am not sure if the problem is related to not reducing the data at all or

Here is my main class:

public static void main(String[] args) throws Exception {
  JobConf conf = new JobConf(Parser.class);
  conf.setJobName("raw_parser");

  conf.setOutputKeyClass(LinkKeyWritable.class);
  conf.setOutputValueClass(LinkValueWritable.class);

  conf.setMapperClass(RawMap.class);
  // Map-only job: with zero reducers the mapper output is written directly.
  conf.setNumReduceTasks(0);

  conf.setInputFormat(PerFileInputFormat.class);
  conf.setOutputFormat(TextOutputFormat.class);

  PerFileInputFormat.setInputPaths(conf, new Path(args[0]));
  FileOutputFormat.setOutputPath(conf, new Path(args[1]));

  JobClient.runJob(conf);
}

And my Mapper class:

public class RawMap extends MapReduceBase implements
        Mapper<NullWritable, Text, LinkKeyWritable, LinkValueWritable> {

    public void map(NullWritable key, Text value,
            OutputCollector<LinkKeyWritable, LinkValueWritable> output,
            Reporter reporter) throws IOException {
        String json = value.toString();
        SerpyReader reader = new SerpyReader(json);
        GoogleParser parser = new GoogleParser(reader);
        for (String page : reader.getPages()) {
            String content = reader.readPageContent(page);
            parser.addPage(content);
        }
        for (Link link : parser.getLinks()) {
            LinkKeyWritable linkKey = new LinkKeyWritable(link);
            LinkValueWritable linkValue = new LinkValueWritable(link);
            output.collect(linkKey, linkValue);
        }
    }
}

Link is basically a struct of various information that gets split between LinkKeyWritable and LinkValueWritable.

LinkKeyWritable:

public class LinkKeyWritable implements WritableComparable<LinkKeyWritable>{
    protected Link link;

    public LinkKeyWritable() {
        super();
        link = new Link();
    }

    public LinkKeyWritable(Link link) {
        super();
        this.link = link;
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        link.batchDay = in.readLong();
        link.source = in.readUTF();
        link.domain = in.readUTF();
        link.path = in.readUTF();
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeLong(link.batchDay);
        out.writeUTF(link.source);
        out.writeUTF(link.domain);
        out.writeUTF(link.path);
    }

    @Override
    public int compareTo(LinkKeyWritable o) {
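        // Compare every field used by equals() and hashCode(), including source.
        return ComparisonChain.start().
                compare(link.batchDay, o.link.batchDay).
                compare(link.source, o.link.source).
                compare(link.domain, o.link.domain).
                compare(link.path, o.link.path).
                result();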
    }

    @Override
    public int hashCode() {
        return Objects.hashCode(link.batchDay, link.source, link.domain, link.path);
    }

    @Override
    public boolean equals(final Object obj){
        if(obj instanceof LinkKeyWritable) {
            final LinkKeyWritable o = (LinkKeyWritable)obj;
            return Objects.equal(link.batchDay, o.link.batchDay)
                    && Objects.equal(link.source, o.link.source)
                    && Objects.equal(link.domain, o.link.domain)
                    && Objects.equal(link.path, o.link.path);
        }
        return false;
    }
}

LinkValueWritable:

public class LinkValueWritable implements Writable{
    protected Link link;

    public LinkValueWritable() {
        link = new Link();
    }

    public LinkValueWritable(Link link) {
        this.link = new Link();
        this.link.type = link.type;
        this.link.description = link.description;
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        link.type = in.readUTF();
        link.description = in.readUTF();
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(link.type);
        out.writeUTF(link.description);
    }

    @Override
    public int hashCode() {
        return Objects.hashCode(link.type, link.description);
    }

    @Override
    public boolean equals(final Object obj){
        if(obj instanceof LinkValueWritable) {
            final LinkValueWritable o = (LinkValueWritable)obj;
            return Objects.equal(link.type, o.link.type)
                    && Objects.equal(link.description, o.link.description);
        }
        return false;
    }
}

1 Answer

  1. Editorial Team
    Added an answer on June 13, 2026 at 1:46 am

    I think the answer is in the implementation of the TextOutputFormat. Specifically, the LineRecordWriter’s writeObject method:

    /**
     * Write the object to the byte stream, handling Text as a special
     * case.
     * @param o the object to print
     * @throws IOException if the write throws, we pass it on
     */
    private void writeObject(Object o) throws IOException {
      if (o instanceof Text) {
        Text to = (Text) o;
        out.write(to.getBytes(), 0, to.getLength());
      } else {
        out.write(o.toString().getBytes(utf8));
      }
    }
    

    As you can see, if your key or value is not a Text object, it calls the toString method on it and writes that out. Since you’ve left toString unimplemented in your key and value, it’s using the Object class’s implementation, which prints the class name followed by @ and the object’s hash code in hexadecimal (hence LinkKeyWritable@bd2f9730) rather than the contents of the fields.
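Here is a minimal, Hadoop-free sketch of that difference. PlainKey and PrintableKey are hypothetical stand-ins for LinkKeyWritable with only two fields, and the render helper merely imitates the non-Text branch of writeObject; none of these names come from Hadoop or from the question.

```java
import java.nio.charset.StandardCharsets;

public class ToStringDemo {

    // Hypothetical key that does NOT override toString(), like the classes in the question.
    static class PlainKey {
        final long batchDay;
        final String domain;

        PlainKey(long batchDay, String domain) {
            this.batchDay = batchDay;
            this.domain = domain;
        }
    }

    // Same fields, but toString() is overridden to emit the field values.
    static class PrintableKey {
        final long batchDay;
        final String domain;

        PrintableKey(long batchDay, String domain) {
            this.batchDay = batchDay;
            this.domain = domain;
        }

        @Override
        public String toString() {
            // Tab-separated fields; the delimiter is an arbitrary choice for this sketch.
            return batchDay + "\t" + domain;
        }
    }

    // Imitates the else-branch of writeObject: toString(), then UTF-8 bytes.
    static String render(Object o) {
        byte[] bytes = o.toString().getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Prints the class name, '@', and a hex hash code (the exact value varies).
        System.out.println(render(new PlainKey(20120914L, "example.com")));
        // Prints the two field values separated by a tab.
        System.out.println(render(new PrintableKey(20120914L, "example.com")));
    }
}
```

One design point to keep in mind: TextOutputFormat separates key and value with a tab by default, so if the fields inside toString are also tab-separated, each output line becomes one flat tab-delimited record.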

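    I’d say that you should try writing an appropriate toString method or using a different OutputFormat, for example SequenceFileOutputFormat, which serializes keys and values through their write methods instead of toString.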
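To illustrate why a binary output format does not need toString, here is a rough sketch that uses only java.io. LinkValue is a hypothetical, simplified copy of LinkValueWritable, and roundTrip is an illustrative helper rather than anything Hadoop provides; it simply pushes an object through write and readFields the way a binary format would.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class RoundTripDemo {

    // Hypothetical simplified value with the same two fields as LinkValueWritable.
    static class LinkValue {
        String type;
        String description;

        LinkValue() {
            this("", "");
        }

        LinkValue(String type, String description) {
            this.type = type;
            this.description = description;
        }

        // Same field order as in the question's write method.
        void write(DataOutput out) throws IOException {
            out.writeUTF(type);
            out.writeUTF(description);
        }

        // Must read the fields back in exactly the order they were written.
        void readFields(DataInput in) throws IOException {
            type = in.readUTF();
            description = in.readUTF();
        }
    }

    // Serializes the value to bytes and rebuilds a fresh copy from those bytes.
    static LinkValue roundTrip(LinkValue original) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            original.write(new DataOutputStream(buffer));

            LinkValue copy = new LinkValue();
            copy.readFields(new DataInputStream(
                    new ByteArrayInputStream(buffer.toByteArray())));
            return copy;
        } catch (IOException e) {
            // In-memory streams should not fail; rethrow unchecked to keep the sketch short.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        LinkValue copy = roundTrip(new LinkValue("organic", "Example result"));
        System.out.println(copy.type + " / " + copy.description);
    }
}
```

In the old mapred API used in the question, the corresponding job setting would be conf.setOutputFormat(SequenceFileOutputFormat.class), and a later job could read the files back with SequenceFileInputFormat. The output is then not human-readable, which is usually acceptable when it only feeds a later reduce step.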

© 2021 The Archive Base. All Rights Reserved