Here is the use case: I have a nutch crawldb (its a hadoop map

Question

Asked: June 4, 2026

Here is the use case:

I have a Nutch crawldb (a Hadoop map file) containing data about URLs, including each URL's status as visited or not visited. I want to split it into two crawldbs (map files) based on the status of the URLs.

So far I have tried MultipleOutputFormat, but I read that it works only for sequence files or text files, NOT map files.

(FYI: I am using Hadoop 0.20.2.)

1 Answer

Editorial Team · Answer 1 · 2026-06-04T08:55:46+00:00

Look at MultipleOutputs instead. You will have to write a custom reducer that calls MultipleOutputs.getCollector() for each output type; the javadocs include example usage.

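In your job configuration:

 // "map" is the named output; each split is written as a multi-name part of it.
 // The key/value classes must match what the reducer emits
 // (for a Nutch crawldb, likely Text and CrawlDatum rather than LongWritable and Text).
 MultipleOutputs.addMultiNamedOutput(conf, "map",
   org.apache.hadoop.mapred.MapFileOutputFormat.class,
   LongWritable.class, Text.class);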
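The reducer then has to choose a multi name for each record before calling getCollector("map", name, reporter).collect(key, value). What follows is only a sketch of that routing decision in plain Java, without the Hadoop classes. StatusRouter, outputNameFor, and isValidName are hypothetical names; the status value is assumed to mirror Nutch's CrawlDatum.STATUS_DB_UNFETCHED, and treating every other status as "visited" is an assumption about the intended split. The name check reflects the understanding that MultipleOutputs accepts only letters and digits in output names, so a name like "not-visited" would be rejected.

```java
/** Hypothetical helper: picks the multi name for each crawldb record. */
final class StatusRouter {
    // Assumed to mirror Nutch's CrawlDatum.STATUS_DB_UNFETCHED;
    // verify against the Nutch version in use.
    static final byte STATUS_DB_UNFETCHED = 0x01;

    /** Returns "unvisited" for not-yet-fetched URLs and "visited" for everything else. */
    static String outputNameFor(byte status) {
        return status == STATUS_DB_UNFETCHED ? "unvisited" : "visited";
    }

    /** Checks that a name is non-empty and contains only ASCII letters and digits. */
    static boolean isValidName(String name) {
        if (name == null || name.isEmpty()) {
            return false;
        }
        for (char c : name.toCharArray()) {
            boolean letter = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
            boolean digit = c >= '0' && c <= '9';
            if (!letter && !digit) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Show the name chosen for two sample status bytes.
        byte[] samples = {0x01, 0x02};
        for (byte status : samples) {
            String name = outputNameFor(status);
            System.out.println(status + " -> " + name + " (valid: " + isValidName(name) + ")");
        }
    }
}
```

In the real reducer, the status byte would come from the value being reduced, and the returned name would be passed as the multiName argument of getCollector.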
