I am running nutch on hadoop multi cluster environment. Hadoop is throwing an error

Question

Editorial Team

Asked: June 14, 2026 (2026-06-14T19:49:08+00:00)

I am running Nutch on a Hadoop multi-cluster environment.

Hadoop throws an error when Nutch is executed with the following command:

$ bin/hadoop jar /home/nutch/nutch/runtime/deploy/nutch-1.5.1.job org.apache.nutch.crawl.Crawl urls -dir urls -depth 1 -topN 5

Error:
Exception in thread "main" java.io.IOException: Not a file: hdfs://master:54310/user/nutch/urls/crawldb
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:170)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:515)
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:753)
    at com.bdc.dod.dashboard.BDCQueryStatsViewer.run(BDCQueryStatsViewer.java:829)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at com.bdc.dod.dashboard.BDCQueryStatsViewer.main(BDCQueryStatsViewer.java:796)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:155)

I tried the fixes I could find and resolved the other issues, such as setting http.agent.name in the /local/conf path. I had installed this setup before and it ran smoothly.

Can anybody suggest a solution?

By the way, I followed link for installing and running.

1 Answer

Editorial Team · Answer 1 · 2026-06-14T19:49:10+00:00

I was able to solve this issue. When copying the files from the local file system to the HDFS destination, I had been using: bin/hadoop dfs -put ~/nutch/urls urls

However, it should be "bin/hadoop dfs -put ~/nutch/urls/* urls". Here urls/* copies the contents of the local urls directory, including any subdirectories, into the HDFS urls directory.
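The difference between the two -put commands comes down to how the shell expands urls/* before hadoop ever sees it. Here is a minimal local sketch, with no Hadoop involved. The temporary directory and the seed file names are hypothetical stand-ins for ~/nutch/urls and are not taken from the question:

```shell
# Local stand-in for ~/nutch/urls (hypothetical file names, for illustration only).
workdir=$(mktemp -d)
mkdir -p "$workdir/urls"
echo "http://nutch.apache.org/" > "$workdir/urls/seed.txt"
echo "http://hadoop.apache.org/" > "$workdir/urls/more-seeds.txt"

# Directory form: the command receives a single argument, the directory itself.
set -- "$workdir/urls"
dir_arg_count=$#

# Glob form: the shell expands urls/* first, so the command receives
# one argument per entry inside the directory.
set -- "$workdir"/urls/*
glob_arg_count=$#

echo "directory form passes $dir_arg_count argument(s)"
echo "glob form passes $glob_arg_count argument(s)"
```

With the two files above, the directory form passes 1 argument and the glob form passes 2. How -put then lays those arguments out in HDFS can depend on whether the destination urls directory already exists, so it may be worth listing the result with bin/hadoop dfs -ls urls before starting the crawl.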
