I am using Nutch 2.0. I’ve created a plugin for parsing HTML that implements Parser, and it works just fine.
The problem is that I also need to “parse” pages that return redirects (301, 300) so I can get the URL and the HTTP code, but my plugin ignores the redirected pages.
Any ideas how I can obtain this information, perhaps through another extension point?
I’ve since implemented the Protocol extension point, and I can now save the redirects and load times to the database.
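The Protocol-level approach works because the protocol plugin sees the raw HTTP response before any parsing happens, so a 3xx status is still visible at that stage. The sketch below is not Nutch’s Protocol API. It is a standalone, JDK-only illustration of what such a plugin needs to capture: fetch without following redirects, then record the status code, the Location header and the elapsed time. The names RedirectProbe, FetchRecord, fetch and isRedirect are hypothetical.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper, not part of Nutch: fetches a URL WITHOUT following
// redirects, so the 3xx status, its Location target and the load time stay
// visible instead of being resolved away before parsing.
final class RedirectProbe {

    // What we would persist for each fetched URL (hypothetical shape).
    static final class FetchRecord {
        final String url;
        final int statusCode;
        final String redirectTarget; // Location header, or null if not a redirect
        final long loadTimeMillis;   // time until the status line and headers arrived

        FetchRecord(String url, int statusCode, String redirectTarget, long loadTimeMillis) {
            this.url = url;
            this.statusCode = statusCode;
            this.redirectTarget = redirectTarget;
            this.loadTimeMillis = loadTimeMillis;
        }

        @Override
        public String toString() {
            return url + " -> " + statusCode
                    + (redirectTarget != null ? " (Location: " + redirectTarget + ")" : "")
                    + " in " + loadTimeMillis + " ms";
        }
    }

    // 3xx codes count as redirects, except 304 Not Modified, which is a cache answer.
    static boolean isRedirect(int status) {
        return status >= 300 && status < 400 && status != 304;
    }

    static FetchRecord fetch(String url) throws IOException {
        long start = System.nanoTime();
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setInstanceFollowRedirects(false); // keep the 3xx response itself
        try {
            int status = conn.getResponseCode(); // connects and reads the headers
            String location = isRedirect(status) ? conn.getHeaderField("Location") : null;
            long elapsedMillis = (System.nanoTime() - start) / 1_000_000L;
            return new FetchRecord(url, status, location, elapsedMillis);
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java RedirectProbe <url>...");
            return;
        }
        for (String url : args) {
            System.out.println(fetch(url));
        }
    }
}
```

In a real plugin this logic would sit inside your Protocol implementation, and the record would be stored wherever the rest of the crawl data goes; how it maps onto Nutch’s own types is not shown here. Note that the load time measured this way stops once the headers arrive, which is enough for redirects since their bodies are usually empty.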