I’m running several websites with separate content and design from the same middleware, and I want to use Solr as the search engine. The sites differ in domain but not in internal structure (that is, the actual database and data structures are identical between the sites).
The question now is: is it better to store all the site data in a single Solr index and separate it by a “site” field, or to use a separate Solr core for each site within a single JVM?
Which will provide the best performance (there are no cross-site queries)? Which will provide the best recall and precision (I’m worried about loss of precision because of IDF factors, since the differences between content domains are quite large)?
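To make the IDF concern concrete, here is a rough sketch. It assumes Lucene's classic TF-IDF formula, idf = 1 + ln(N / (df + 1)); newer Solr versions default to BM25, whose IDF term has a different shape but depends on the same index-wide counts. The site names and document counts are made up for illustration. Note that a filter on a “site” field restricts which documents match but, as far as I know, does not change the term statistics used for scoring, so a shared index would score with the combined counts.

```python
import math

def classic_idf(doc_freq: int, num_docs: int) -> float:
    """Classic Lucene-style IDF: 1 + ln(num_docs / (doc_freq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))

# Hypothetical statistics for one term: (docs containing the term, total docs).
# The term is rare on one site and very common on the other.
SITE_STATS = {
    "cooking": (2, 1000),
    "programming": (600, 1000),
}

def per_core_idf(site: str) -> float:
    """IDF when each site has its own core (statistics are per site)."""
    doc_freq, num_docs = SITE_STATS[site]
    return classic_idf(doc_freq, num_docs)

def shared_index_idf() -> float:
    """IDF when all sites share one index (statistics are pooled)."""
    doc_freq = sum(df for df, _ in SITE_STATS.values())
    num_docs = sum(n for _, n in SITE_STATS.values())
    return classic_idf(doc_freq, num_docs)

if __name__ == "__main__":
    for site in SITE_STATS:
        print(site, round(per_core_idf(site), 3))
    print("shared", round(shared_index_idf(), 3))
```

With these made-up numbers the pooled IDF falls between the two per-site values, so the term looks less distinctive than it really is on the site where it is rare, and more distinctive than it is on the site where it is common.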
I assume you are more worried about what happens when your sites grow. IMO, multiple cores seem the better choice.
Single large index: All updates and queries hit a single point. When it starts getting slow, you have to build a cluster with sharding or replication to hold the large index. It’s also a single point of failure, and backing up the index will be tough.
Multiple cores: If one site is growing and dwarfing others, you can easily migrate it to a different server, ensuring that no servers are overloaded. Backing up individual sites will be relatively trivial.
Multiple cores will also make your life simpler while your sites are not busy. As your sites grow, you can put off clustering and performance tuning until later.
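As a rough sketch of how the two layouts look from the client side: the host names, core names, and the `all_sites` index below are hypothetical, and only the standard `/select` handler with the `q` and `fq` parameters is assumed. With one core per site, moving a site to another server only means changing one entry in the routing table.

```python
from urllib.parse import urlencode

# Hypothetical routing table: one Solr core per site.
CORE_BASE_URLS = {
    "site_a": "http://solr1.example.com:8983/solr/site_a",
    "site_b": "http://solr1.example.com:8983/solr/site_b",
}

# Hypothetical single shared index holding every site's documents.
SHARED_INDEX_URL = "http://solr1.example.com:8983/solr/all_sites"

def per_core_select_url(site: str, query: str) -> str:
    """Build a query URL against the site's own core."""
    return f"{CORE_BASE_URLS[site]}/select?{urlencode({'q': query})}"

def shared_index_select_url(site: str, query: str) -> str:
    """Build a query URL against the shared index, filtered by the site field."""
    params = urlencode({"q": query, "fq": f"site:{site}"})
    return f"{SHARED_INDEX_URL}/select?{params}"

def migrate_site(site: str, new_base_url: str) -> None:
    """Point a site at a core on another server (the data move itself is not shown)."""
    CORE_BASE_URLS[site] = new_base_url

if __name__ == "__main__":
    print(per_core_select_url("site_a", "red shoes"))
    print(shared_index_select_url("site_a", "red shoes"))
    migrate_site("site_b", "http://solr2.example.com:8983/solr/site_b")
    print(per_core_select_url("site_b", "red shoes"))
```

In the shared-index variant every query has to carry the site filter, whereas in the per-core variant the separation comes from the URL itself.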