I want to make a document search using python. Solr was no-go as Java

Question

Asked: May 24, 2026

I want to build a document search in Python. Solr was a no-go because Java hosting was a constraint.

So Whoosh seems the obvious option. But it does not seem to index .doc or .pdf files natively (as Solr can). How can I make it index these files directly?

1 Answer

Editorial Team · Answer 1 · 2026-05-24T21:07:49+00:00

Whoosh only needs the extracted text from those documents. The Whoosh library won't do that extraction for you, but other tools will: PDFMiner is a Python library for pulling text out of PDFs, and catdoc and antiword are command-line utilities for Word .doc files that you can call from Python.
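As a rough illustration of that approach, here is a minimal sketch of extracting text and handing it to Whoosh. It assumes the `pdfminer.six` package (the maintained fork of PDFMiner) and the `antiword` command-line tool are installed. The function names, the two-field schema, and the directory layout are made up for this example and are not anything Whoosh prescribes.

```python
import subprocess
from pathlib import Path


def extract_pdf(path):
    # Assumes pdfminer.six; imported here so it is only required when a PDF is seen.
    from pdfminer.high_level import extract_text
    return extract_text(str(path))


def extract_doc(path):
    # Assumes the external antiword tool is on PATH; it prints the document text to stdout.
    result = subprocess.run(
        ["antiword", str(path)], capture_output=True, text=True, check=True
    )
    return result.stdout


# Hypothetical mapping from file extension to extractor function.
EXTRACTORS = {".pdf": extract_pdf, ".doc": extract_doc}


def extract_text_from(path):
    """Return the text of a supported file, or None for unknown file types."""
    extractor = EXTRACTORS.get(Path(path).suffix.lower())
    return extractor(path) if extractor else None


def build_index(doc_dir, index_dir):
    """Walk doc_dir, extract text from each supported file, and index it."""
    from whoosh.fields import ID, TEXT, Schema
    from whoosh.index import create_in

    # Store the path so hits can be mapped back to files; index the content only.
    schema = Schema(path=ID(stored=True, unique=True), content=TEXT)
    Path(index_dir).mkdir(parents=True, exist_ok=True)
    ix = create_in(index_dir, schema)

    writer = ix.writer()
    for path in sorted(Path(doc_dir).rglob("*")):
        if not path.is_file():
            continue
        text = extract_text_from(path)
        if text:
            writer.add_document(path=str(path), content=text)
    writer.commit()
    return ix


def search(ix, query_string, limit=10):
    """Return the stored paths of documents matching query_string."""
    from whoosh.qparser import QueryParser

    with ix.searcher() as searcher:
        query = QueryParser("content", ix.schema).parse(query_string)
        return [hit["path"] for hit in searcher.search(query, limit=limit)]
```

With those pieces in place, something like `ix = build_index("docs", "indexdir")` followed by `search(ix, "invoice")` would return the paths of matching files. Swapping `antiword` for `catdoc` should only require changing the command in `extract_doc`.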

See these two discussions for more information:
