We have the following Solr (3.4) schema for indexing html/text documents: <fields> <field name=text

Question


Asked: May 26, 2026


We have the following Solr (3.4) schema for indexing html/text documents:

 <fields>

   <field name="text" type="text" indexed="true"
          stored="true" required="false" multiValued="false"
          omitNorms="false"/>
   <field name="title" type="text" indexed="true"
          stored="true" required="false" multiValued="false"
          omitNorms="false"/>
   <field name="created" type="date" indexed="true"
          stored="true" required="true" multiValued="false"
          omitNorms="false"/>
   <field name="modified" type="date" indexed="true"
          stored="true" required="false" multiValued="false"
          omitNorms="false"/>
   <field name="filesize" type="integer" indexed="true"
          stored="true" required="false" multiValued="false"
          omitNorms="false"/>
   <field name="mimetype" type="string" indexed="true"
          stored="true" required="false" multiValued="false"
          omitNorms="false"/>
   <field name="id" type="string" indexed="true"
          stored="true" required="true" multiValued="false"
          omitNorms="false"/>
   <field name="tag" type="string" indexed="true"
          stored="true" required="false" multiValued="false"
          omitNorms="false"/>
   <field name="relpath" type="string" indexed="true"
          stored="true" required="false" multiValued="false"
          omitNorms="false"/>

   <dynamicField name="tika_*" type="ignored" />

 </fields>

The configuration is auto-generated from templates by the solrinstance recipe for zc.buildout.

Now we need to index PDF, Office and similar files in Solr for full-text search.

The generated requestHandler for the extraction is:

  <requestHandler name="/update/extract"
                  class="solr.extraction.ExtractingRequestHandler" >
    <lst name="defaults">
      <str name="fmap.text">tika_content</str>
      <str name="lowernames">false</str>
      <str name="uprefix">tika_</str>
    </lst>
  </requestHandler>

But after uploading a PDF file through curl, I cannot find any indication that it
has been indexed (no changes in the document stats, etc.).

What is the trick here?

[Update]

I am using

  curl "http://localhost:8983/solr/update/extract?literal.id=2&commit=true&fmap.content=text" -F "myfile=@1.pdf"

to upload a PDF file. Adding fmap.content=text appears to perform the desired mapping (overriding the generated configuration).

This seems to have solved the problem.

1 Answer

Editorial Team · Answer 1 · 2026-05-26T19:45:23+00:00

fmap is the field mapping for the fields that Tika generates.

The extracting handler uses Tika to extract the body of the uploaded document and puts it into a field named content.
<str name="fmap.content">text</str> maps that content field to the text field defined in the schema.
Since you have a text field defined in the schema, this works.
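Instead of passing fmap.content=text on every request, the mapping can be made a default of the handler, assuming the solrinstance template lets you change the generated defaults (if it does not, keep passing the parameter in the curl call). A minimal sketch of the handler with the unused fmap.text mapping replaced:

```xml
<requestHandler name="/update/extract"
                class="solr.extraction.ExtractingRequestHandler" >
  <lst name="defaults">
    <!-- send Tika's extracted body ("content") into the schema's text field -->
    <str name="fmap.content">text</str>
    <str name="lowernames">false</str>
    <!-- any other field not defined in the schema gets the tika_ prefix
         and therefore falls into the tika_* dynamic field -->
    <str name="uprefix">tika_</str>
  </lst>
</requestHandler>
```

With this default in place, the curl call no longer needs the fmap.content=text parameter; a request parameter still overrides the default if given.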

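However, <str name="fmap.text">tika_content</str> maps a field named text to tika_content, and Tika does not generate a text field, so this mapping never applies. Meanwhile the extracted content field is not defined in the schema, so uprefix=tika_ renames it to tika_content, which falls into the tika_* dynamic field of type ignored. Assuming the usual definition of that type, it is neither indexed nor stored, so the extracted text never reaches a searchable field and would not produce any matches.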
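The tika_* dynamic field in the schema uses the type ignored, whose declaration is not shown in the question. The stock Solr 3.x example schema declares that type as below; assuming the solrinstance template generates the same declaration, any field routed into tika_* (for example through uprefix=tika_) is silently discarded:

```xml
<!-- fields of this type are neither stored nor indexed,
     so any data added to them is dropped outright -->
<fieldtype name="ignored" stored="false" indexed="false"
           multiValued="true" class="solr.StrField" />
```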
