I am working on a project where we are trying to introduce a search framework.

Asked: May 21, 2026

I am working on a project where we are trying to introduce a search framework. We are about to start development; so far we have only done some proof-of-concept work. We are struggling with hardware estimates. I am uncertain whether our performance requirements can be met with a single-server setup, or whether we need a replicated or distributed solution.

Here are our main requirements:

Search in semi-structured data
- Documents contain 15 fields, all of which should be searchable
- Mostly numeric IDs
- Dates
- Names
10+ million documents in the index
30-40 updates, in batches, every minute
<100 ms response time for searches with several boolean operators, at 100+ queries per minute

Questions

1) Is it feasible to get this performance on a single-server setup?

2) If not, what is an appropriate setup to meet the performance requirements?

3) We are considering several frameworks on top of Lucene, amongst them Solr and Zoie. What distributed architecture would be necessary to handle the described load and performance requirements?

1 Answer

Editorial Team · Answer 1 · 2026-05-21T21:50:09+00:00

1) Is it feasible to get this performance on a single-server setup?

Yes, I think so, but it is borderline.
What you need is enough RAM and CPU power. In the end it depends on the size of the “big” fields, such as full-text fields, and on the size of your database.

For comparison, I use Lucene with 1.2 million docs and 7 fields, mostly short ones (dates, numbers, ...) but also one big text field (500-5000 characters). The MySQL database that Lucene indexes is 1-2 GB. The system runs on a small single-CPU VMware host with 4 GB of RAM. Full-text searches return in 100-400 ms.
If you don’t have big text fields, your results will return faster (depending on the kind of search, for example faceted search).
For example, a faceted search on a char(255) field returned in <70 ms.

For your configuration, non-virtualized hardware with lots of memory (>32 GB) and >8 cores would probably be useful.
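To put the query load in perspective, here is a rough back-of-envelope sketch. It applies Little's law (average queries in flight = arrival rate × time per query) to the 100 queries per minute from the question, once with the 100 ms target and once with the 400 ms worst case I see on my small VM. The function name is made up for illustration, and the calculation assumes evenly spread queries; it ignores bursts and the cost of concurrent index updates, so treat it as a sanity check rather than a sizing method.

```python
def average_concurrency(queries_per_minute: float, latency_seconds: float) -> float:
    """Little's law: mean number of in-flight queries = arrival rate * time per query."""
    queries_per_second = queries_per_minute / 60.0
    return queries_per_second * latency_seconds


# Figures from the question: 100 queries per minute, 100 ms response-time target.
at_target = average_concurrency(100, 0.100)

# Same load, but with the slowest full-text latency quoted above (400 ms).
at_slow = average_concurrency(100, 0.400)

print(f"in flight at 100 ms: {at_target:.3f}")
print(f"in flight at 400 ms: {at_slow:.3f}")
```

Both values stay below one query in flight on average, which is why the query rate by itself does not force a distributed setup; the per-query latency on a 10+ million document index is the part that needs measuring.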

30-40 updates, in batches every minute

Does that mean 30-40 new documents per minute? That’s no problem!
30-40 updates per minute, each with lots of new documents, would be more challenging.
Additionally, you should optimize your index periodically (for example, nightly).
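The two readings of “30-40 updates” differ a lot in daily write volume, which a small sketch makes concrete. The batch size of 1,000 documents is purely hypothetical (the question does not give one), as are the function and constant names; only the 40 updates per minute and the 10 million document index size come from the question.

```python
MINUTES_PER_DAY = 24 * 60
INDEX_SIZE = 10_000_000  # "10+ million documents" from the question


def docs_written_per_day(updates_per_minute: int, docs_per_update: int) -> int:
    """Documents added or replaced per day at a constant update rate."""
    return updates_per_minute * docs_per_update * MINUTES_PER_DAY


# Reading 1: each update is a single document.
single = docs_written_per_day(40, 1)

# Reading 2: each update is a batch; 1,000 docs per batch is an assumed value.
batched = docs_written_per_day(40, 1_000)

print(f"single-doc updates: {single} docs/day, {single / INDEX_SIZE:.2%} of the index")
print(f"batched updates:    {batched} docs/day, {batched / INDEX_SIZE:.2%} of the index")
```

Under the first reading well under 1% of the index changes per day; under the assumed batch size the daily writes exceed the whole index, which is the case where update handling starts to drive the architecture.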

3) We are considering several frameworks on top of Lucene, amongst them Solr and Zoie.

Solr runs as a Tomcat application. There you have to define, for example, how much RAM (see above) is assigned to your search engine.
There are different ways to split your index (for more performance or faster updates); clustering is also possible.
