Can I get HTTP response header fields parsed with Nutch?
Is this a built-in capability that needs to be configured?
I’ve searched the internet and can’t find any info about this.
Also, if I do local file system crawling, is there a way to parse a file’s header (fields such as size, description, etc.)?
See line 144 here. You can see that the HTTP response headers can be obtained there, and you can use that info.
For the second question:
Nutch provides plugins for parsing different file types. You need to study the plugin for your specific file type and go from there.
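To give a feel for which file "header" fields exist at the file system level, here is a minimal sketch in plain Java. It does not use any Nutch API; the class name FileInfoSketch, the method describe, and the map keys are made up for illustration. It only shows that attributes such as size and last-modified time come from the file system, whereas something like a "description" usually lives inside the file's content and would have to come from a parser for that file type.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.LinkedHashMap;
import java.util.Map;

public class FileInfoSketch {
    // Collects basic file-system attributes into a name -> value map.
    // The key names are hypothetical, chosen only for this example.
    static Map<String, String> describe(Path file) throws IOException {
        BasicFileAttributes attrs = Files.readAttributes(file, BasicFileAttributes.class);
        Map<String, String> info = new LinkedHashMap<>();
        info.put("size", Long.toString(attrs.size()));
        info.put("lastModified", attrs.lastModifiedTime().toString());
        info.put("isDirectory", Boolean.toString(attrs.isDirectory()));
        return info;
    }

    public static void main(String[] args) throws IOException {
        // Create a small temporary file so the example is self-contained.
        Path tmp = Files.createTempFile("sample", ".txt");
        Files.write(tmp, "hello".getBytes(StandardCharsets.UTF_8));
        describe(tmp).forEach((k, v) -> System.out.println(k + " = " + v));
        Files.delete(tmp);
    }
}
```

Inside Nutch you would look for the equivalent values in the metadata the file protocol plugin attaches to the fetched content, rather than reading the attributes yourself.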