I am using Nutch 1.4 to crawl websites. The issue I am facing is that the fetcher always aborts with N hung threads.
The log file contains entries like these:
INFO fetcher.Fetcher – -activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
INFO fetcher.Fetcher – -activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
INFO fetcher.Fetcher – -activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
WARN fetcher.Fetcher – Aborting with 1 hung threads.
How can I resolve this issue?
Some requests seem to hang, despite all intentions. The fetcher aborts when its threads have not performed any activity for a long time. See lines 932-936 here.
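The log pattern (activeThreads=1 while fetchQueues.totalSize=0) suggests that nothing is left in the queues but one thread is still stuck in a request that never returns. The fetcher's monitoring loop then gives up once no new request has started within its timeout. Below is a minimal Python sketch of that watchdog idea under a simplified model. It is not Nutch's code (Nutch's Fetcher is Java), and the class, method names and timeout values are hypothetical.

```python
import threading
import time


class FetcherWatchdog:
    """Hypothetical hung-thread watchdog, a simplified model of the fetcher's check."""

    def __init__(self, timeout):
        self.timeout = timeout  # seconds without a new request before aborting
        self._lock = threading.Lock()
        self._active = 0
        self._last_request_start = time.monotonic()

    def thread_started(self):
        with self._lock:
            self._active += 1

    def thread_finished(self):
        with self._lock:
            self._active -= 1

    def mark_request_start(self):
        # Workers call this each time they begin fetching a URL.
        with self._lock:
            self._last_request_start = time.monotonic()

    def check(self):
        """Return None while healthy, or the number of hung threads."""
        with self._lock:
            idle = time.monotonic() - self._last_request_start
            if self._active > 0 and idle > self.timeout:
                return self._active
            return None

    def monitor(self, poll_interval=1.0):
        """Poll until all workers finish ("finished") or the watchdog gives up ("aborted")."""
        while True:
            with self._lock:
                active = self._active
            if active == 0:
                return "finished"
            hung = self.check()
            if hung is not None:
                print(f"Aborting with {hung} hung threads.")
                return "aborted"
            time.sleep(poll_interval)


def worker(wd, release):
    # Simulates one fetch: the request "returns" only when `release` is set.
    try:
        wd.mark_request_start()
        release.wait()
    finally:
        wd.thread_finished()


if __name__ == "__main__":
    wd = FetcherWatchdog(timeout=0.5)
    wd.thread_started()  # count the thread before starting it to avoid a race
    never = threading.Event()  # never set: simulates a request that hangs
    threading.Thread(target=worker, args=(wd, never), daemon=True).start()
    print(wd.monitor(poll_interval=0.1))
```

The watchdog only measures time since the last request *started*, so a single stalled socket read is enough to trigger the abort, even when every other URL was fetched fine.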
Steps to deal with this:
wget those URLs from the same machine. I think if you work on these things, you can get it fixed.
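When there are many candidate URLs, the wget check can be scripted. The sketch below uses Python's standard `urllib.request.urlopen` with a timeout in place of wget and reports URLs that error out, time out, or respond slowly. The function names, and the 10 s timeout and 5 s "slow" threshold, are hypothetical choices for illustration and are not Nutch settings.

```python
import sys
import time
import urllib.error
import urllib.request


def probe(url, timeout=10.0):
    """Fetch `url` once; return (url, outcome, elapsed_seconds).

    `outcome` is the HTTP status code (int) or an error description (str).
    """
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()  # read the body too: a stalled body can also hang a fetch
            outcome = resp.status
    except urllib.error.HTTPError as e:
        outcome = e.code  # the server answered, just not with 2xx
    except OSError as e:
        # URLError and socket timeouts are both OSError subclasses.
        outcome = f"error: {e}"
    return url, outcome, time.monotonic() - start


def find_suspects(urls, timeout=10.0, slow_after=5.0):
    """Return the probes that failed, timed out, or took longer than `slow_after`."""
    suspects = []
    for url in urls:
        url, outcome, elapsed = probe(url, timeout)
        if not isinstance(outcome, int) or elapsed > slow_after:
            suspects.append((url, outcome, elapsed))
    return suspects


if __name__ == "__main__":
    # Usage: python probe_urls.py URL [URL ...]
    for url, outcome, elapsed in find_suspects(sys.argv[1:]):
        print(f"{elapsed:6.1f}s  {outcome}  {url}")
```

Running it from the crawl machine, like wget, matters because DNS, proxies, or firewalls on that host can be what makes a request hang.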
Also read this and this.