I’m configuring Jackrabbit 2.3.6 and I need to index binary files (PDF, ODT)

Asked: June 1, 2026

I’m configuring Jackrabbit 2.3.6 and I need to index binary files (PDF,
ODT), so I’ve configured SearchIndex in repository.xml according to
http://wiki.apache.org/jackrabbit/Search. But when I insert a file into the repository and run a full-text
search, no results are returned.

Then I noticed this warning in the logs:

SearchIndex.java:2087 The textFilterClasses configuration parameter has been deprecated, and the configured value will be ignored: org.apache.jackrabbit.extractor.PlainTextExtractor,org.apache.jackrabbit.extractor.PdfTextExtractor,org.apache.jackrabbit.extractor.OpenOfficeTextExtractor

How should I configure SearchIndex to index binary data? Currently I am
doing it like this, which, according to the warning above, is deprecated and ignored:

<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
    <param name="path" value="${rep.home}/repository/index"/>
    <!-- textFilterClasses is deprecated in Jackrabbit 2.x and ignored (see the warning above) -->
    <param name="textFilterClasses" value="org.apache.jackrabbit.extractor.PdfTextExtractor,org.apache.jackrabbit.extractor.OpenOfficeTextExtractor"/>
    <param name="supportHighlighting" value="true"/>
</SearchIndex>

Thanks for any replies.

1 Answer

Editorial Team · Answer 1 · 2026-06-01T23:03:58+00:00

This is the answer Mark Herman gave to a similar question on the Jackrabbit Users mailing list:

I’m not an expert, but what I do know is that JR uses Tika to extract text, and
it determines how to do so based on the jcr:mimeType property. If you don’t supply
a MIME type, it won’t know how to extract the text (and I wouldn’t
recommend leaving it out as a practice anyway). I believe there is a way to supply JR with a
Tika config that might give you what you want. EDIT: There isn’t. It’s hardcoded.
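Applied to the question's setup: because Jackrabbit 2.x delegates text extraction to Tika, the deprecated textFilterClasses parameter can simply be dropped. Below is a minimal sketch of a workspace-level SearchIndex, assuming the usual ${wsp.home} variable. Two points are worth checking against the actual installation. First, in the default Jackrabbit 2.x repository.xml, ${rep.home}/repository/index is the path of the repository-wide index for the /jcr:system tree (mainly versions), while regular content is indexed by the SearchIndex inside the Workspace element. Second, that Workspace element is only a template for newly created workspaces; an existing workspace reads its own workspace.xml, so the change most likely belongs there. As the reply says, the file's jcr:content node also needs a jcr:mimeType such as application/pdf or application/vnd.oasis.opendocument.text so Tika can choose a parser.

```xml
<!-- Sketch only: workspace-level SearchIndex for Jackrabbit 2.x.
     textFilterClasses is left out because 2.x ignores it and extracts text with Tika. -->
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
    <param name="path" value="${wsp.home}/index"/>
    <param name="supportHighlighting" value="true"/>
</SearchIndex>
```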

Additionally, you can specify an indexing configuration in the repository/workspace
XML files, in which you can set rules on what gets indexed by Lucene
and how.
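As one illustration of such rules, the sketch below is an indexing configuration that defines an index aggregate, so that text extracted from a file's jcr:content child is also searchable on the nt:file node itself. It assumes the file is saved as indexing_configuration.xml in the workspace directory (the name is arbitrary) and referenced from the workspace's SearchIndex with `<param name="indexingConfiguration" value="${wsp.home}/indexing_configuration.xml"/>`. The DTD version shown is an assumption and should be checked against the Jackrabbit wiki page cited in the question.

```xml
<?xml version="1.0"?>
<!DOCTYPE configuration SYSTEM "http://jackrabbit.apache.org/dtd/indexing-configuration-1.0.dtd">
<!-- Sketch of indexing_configuration.xml (file name and DTD version assumed). -->
<configuration xmlns:jcr="http://www.jcp.org/jcr/1.0"
               xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
    <!-- Index aggregate: the full text of each nt:file's jcr:content child
         (where the text extracted from the PDF/ODT binary is indexed) is also
         added to the nt:file node, so full-text queries on nt:file nodes match it. -->
    <aggregate primaryType="nt:file">
        <include>jcr:content</include>
    </aggregate>
</configuration>
```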
