I am using Nutch-1.4 for crawling websites. The issue I am facing in crawling

Asked: June 2, 2026

I am using Nutch-1.4 for crawling websites. The issue I am facing is that the fetcher always aborts with N hung threads.
The log file entries are:

INFO fetcher.Fetcher - -activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
INFO fetcher.Fetcher - -activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
INFO fetcher.Fetcher - -activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
WARN fetcher.Fetcher - Aborting with 1 hung threads.

How to resolve this issue?

1 Answer

Editorial Team · Answer 1 · 2026-06-02T20:02:36+00:00

Some requests hang despite all safeguards. The Fetcher aborts with this warning when its threads have shown no activity for longer than a timeout. See lines 932-936 here.
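
The hang check described above can be pictured as a watchdog: the fetcher remembers when the most recent request started, and if threads are still active but no request has started within the timeout, it gives up. The following is a simplified Python illustration of that idea, not Nutch's actual Java code; the class name `HangWatchdog`, its methods, and the timeout value used in the demo are hypothetical.

```python
import time


class HangWatchdog:
    """Hypothetical, simplified model of a fetcher's hung-thread check."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock  # injectable so the logic can be tested
        # Time at which the most recent request started.
        self.last_request_start = clock()

    def request_started(self):
        # A worker thread would call this each time it begins fetching a URL.
        self.last_request_start = self.clock()

    def check(self, active_threads):
        """Return an abort message if active threads look hung, else None."""
        idle = self.clock() - self.last_request_start
        if active_threads > 0 and idle > self.timeout_s:
            return f"Aborting with {active_threads} hung threads."
        return None


if __name__ == "__main__":
    now = [0.0]  # fake clock, in seconds
    dog = HangWatchdog(timeout_s=300, clock=lambda: now[0])  # illustrative value
    now[0] = 100.0
    print(dog.check(active_threads=1))  # None: still within the timeout
    now[0] = 400.0
    print(dog.check(active_threads=1))  # Aborting with 1 hung threads.
```

In this model, a single slow or stuck request is enough to trigger the abort, because it keeps a thread active while no new request starts.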

Steps to deal with this:

Check which URLs were being crawled just before this message was logged (see the fetching… statements in the log).
Are those URLs taking a long time to load? (Try to wget them from the same machine.)
Is the content of those pages big? (Check their size.)
The timeout value is typically 600 seconds. Increase the value of the configuration property mapred.task.timeout in mapred-site.xml of your Hadoop configuration (the value is in milliseconds; the default is 600000, i.e. 600 seconds). In local mode, simply add the property to nutch-site.xml with a larger value.
Are you performing any operation (say, parsing) that takes a really long time? Is the application hanging somewhere?
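
The first check in the list above can be scripted. This is a minimal Python sketch, assuming the log contains one line of the form `fetching <url>` per URL (the "fetching" statements mentioned above) and the abort warning shown in the question; the function name `urls_before_abort` and both regular expressions are hypothetical and may need adjusting for your log layout.

```python
import re
import sys
from collections import deque

# Assumed log patterns (adjust to your log layout):
#   "... fetching <url>"                    one line per URL the fetcher starts
#   "... Aborting with <N> hung threads."   the warning from the question
FETCHING_RE = re.compile(r"\bfetching\s+(\S+)")
ABORT_RE = re.compile(r"Aborting with (\d+) hung threads")


def urls_before_abort(lines, last_n=5):
    """Return (hung_thread_count, urls) for the first abort warning in `lines`.

    `urls` holds at most `last_n` of the most recently fetched URLs seen
    before the warning. Returns (None, []) when no warning is present.
    """
    recent = deque(maxlen=last_n)  # keeps only the last `last_n` URLs
    for line in lines:
        abort = ABORT_RE.search(line)
        if abort:
            return int(abort.group(1)), list(recent)
        match = FETCHING_RE.search(line)
        if match:
            recent.append(match.group(1))
    return None, []


if __name__ == "__main__":
    # Usage: python urls_before_abort.py path/to/your.log
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
            count, urls = urls_before_abort(log)
        if count is None:
            print("No 'hung threads' warning found.")
        else:
            print(f"Aborted with {count} hung thread(s); last URLs fetched:")
            for url in urls:
                print("  " + url)
```

Each URL it reports can then be tried with wget from the same machine to see whether it is slow to load or unusually large.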

If you work through these points, you should be able to fix it.