I used Nutch 1.4 and crawled a website. I got the website crawled successfully


Asked: May 31, 2026


I used Nutch 1.4 to crawl a website.
The crawl succeeded and all the pages were dumped into segments.
I merged all the segments into one segment and then used the readseg command to obtain a text version of all the crawled pages.
Now I need to find the URL of each page and the metadata stored in that page.
I don't know which command to use, or whether I need to do something different.

I have searched Google a lot, and some people said that you have to write a separate plugin for it. Can someone please tell me how to do this?

Thanks a lot 🙂 🙂
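If the text dump from `readseg` is all you have, one rough workaround is to post-process it. The sketch below is a standalone Java program (the class name `SegmentDumpMetadata` is hypothetical, not part of Nutch) that groups metadata lines by page URL. It assumes each record in the dump starts with a `URL::` line and that metadata appears on `Content Metadata:` and `Parse Metadata:` lines; check your own dump, since the layout may differ between Nutch versions. HTML `<meta>` tags may not appear in those lines at all unless a plugin copies them into the page metadata.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Nutch: groups the metadata lines of a
// `readseg -dump` text file by page URL. The line prefixes below are ASSUMED
// from typical dump output; adjust them if your dump looks different.
public class SegmentDumpMetadata {

    static final String URL_PREFIX = "URL::";
    static final String[] METADATA_PREFIXES = {"Content Metadata:", "Parse Metadata:"};

    // Maps each URL to the metadata lines found in its record, in dump order.
    static Map<String, List<String>> extract(List<String> lines) {
        Map<String, List<String>> byUrl = new LinkedHashMap<>();
        String currentUrl = null;
        for (String raw : lines) {
            String line = raw.trim();
            if (line.startsWith(URL_PREFIX)) {
                // A new record begins; following metadata lines belong to this URL.
                currentUrl = line.substring(URL_PREFIX.length()).trim();
                byUrl.putIfAbsent(currentUrl, new ArrayList<>());
            } else if (currentUrl != null) {
                for (String prefix : METADATA_PREFIXES) {
                    if (line.startsWith(prefix)) {
                        byUrl.get(currentUrl).add(line);
                    }
                }
            }
        }
        return byUrl;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the text file written by `bin/nutch readseg -dump`.
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        for (Map.Entry<String, List<String>> entry : extract(lines).entrySet()) {
            System.out.println(entry.getKey());
            for (String meta : entry.getValue()) {
                System.out.println("    " + meta);
            }
        }
    }
}
```

Running it as `java SegmentDumpMetadata path/to/dump` should print each URL followed by its indented metadata lines. Matching on plain line prefixes keeps the sketch simple, but it will misbehave if a page's raw content happens to contain a line starting with `URL::`.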


1 Answer

Editorial Team · Answer 1 · 2026-05-31T19:32:32+00:00


I was finally able to do it, so I'm sharing the solution in case someone else needs it.
You can use the index-metatags plugin described here:
http://wiki.apache.org/nutch/IndexMetatags

It solves this problem.
Cheers 🙂
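For anyone wondering what such a plugin does conceptually: it pulls the name/content pairs out of each page's `<meta>` tags so they can be indexed alongside the page's URL. The class below (`MetaTagSketch`, a hypothetical name) is a naive, regex-based illustration of that idea only; it is not the plugin's actual code, and a real implementation should work on a parsed DOM rather than regexes. The `wanted` set stands in for the plugin's configuration, whose actual property names should be taken from the wiki page above.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only, NOT the index-metatags plugin: a naive regex scan that
// ignores comments, scripts and malformed markup.
public class MetaTagSketch {

    private static final Pattern META_TAG =
            Pattern.compile("<meta\\b[^>]*>", Pattern.CASE_INSENSITIVE);
    private static final Pattern ATTRIBUTE =
            Pattern.compile("([\\w:-]+)\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)')");

    // Returns name -> content for <meta name=... content=...> tags whose
    // lowercased name is in `wanted` (names in `wanted` must be lowercase).
    static Map<String, String> metaTags(String html, Set<String> wanted) {
        Map<String, String> found = new LinkedHashMap<>();
        Matcher tag = META_TAG.matcher(html);
        while (tag.find()) {
            String name = null;
            String content = null;
            Matcher attr = ATTRIBUTE.matcher(tag.group());
            while (attr.find()) {
                // Group 2 holds double-quoted values, group 3 single-quoted ones.
                String value = attr.group(2) != null ? attr.group(2) : attr.group(3);
                String key = attr.group(1).toLowerCase(Locale.ROOT);
                if (key.equals("name")) {
                    name = value.toLowerCase(Locale.ROOT);
                } else if (key.equals("content")) {
                    content = value;
                }
            }
            if (name != null && content != null && wanted.contains(name)) {
                found.put(name, content);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String url = "http://example.com/"; // hypothetical page
        String html = "<html><head>"
                + "<meta name=\"keywords\" content=\"nutch, crawler\">"
                + "<META NAME='description' CONTENT='A sample page'>"
                + "<meta charset=\"utf-8\">"
                + "</head><body></body></html>";
        Map<String, String> tags = metaTags(html, Set.of("keywords", "description"));
        // prints: http://example.com/ -> {keywords=nutch, crawler, description=A sample page}
        System.out.println(url + " -> " + tags);
    }
}
```

Filtering by a fixed set of names mirrors the idea of only indexing the meta tags you care about, rather than every tag a page happens to declare.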
