I crawled a site with Apache Nutch and indexed it into Apache Solr. How do I search for strings between HTML tags?

Question

Asked: June 9, 2026 (2026-06-09T17:02:38+00:00)

I crawled a site with Apache Nutch and indexed it into Apache Solr. I don't know how to search for strings that appear between HTML tags in the site's pages using Solr.
Thanks.

1 Answer

Editorial Team · Answer 1 · 2026-06-09T17:02:39+00:00

The easiest approach is to extract the text from the HTML and index the extracted text. You can use HTMLStripCharFilterFactory in the field's analyzer to strip HTML markup from the input stream before it reaches the tokenizer:

<analyzer>
  <charFilter class="solr.HTMLStripCharFilterFactory"/>
  <tokenizer class="solr.StandardTokenizerFactory"/>
</analyzer>
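
For context, an analyzer like the one above normally sits inside a field type in the Solr schema. The following is only a minimal sketch: the field type name text_html and the field name content are hypothetical and do not come from the question, so adjust them to whichever field holds the page HTML in your own schema.

```xml
<!-- Hypothetical field type name (text_html); the char filter runs before the tokenizer -->
<fieldType name="text_html" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Removes HTML markup from the character stream -->
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <!-- Splits the remaining text into searchable tokens -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
  </analyzer>
</fieldType>

<!-- Hypothetical field name (content); point this at the field that receives the HTML -->
<field name="content" type="text_html" indexed="true" stored="true"/>
```

Char filters only affect the indexed tokens, so the stored value returned in search results still contains the original markup. After changing the schema, the documents need to be reindexed for the new analysis to take effect.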
