The Archive Base Latest Questions

Editorial Team
Asked: May 15, 2026

Given:

  • 1 database per client (business customer)
  • 5000 clients
  • Clients have between 2 and 2000 users (avg is ~100 users/client)
  • 100k to 10 million records per database
  • Users need to search those records often (it’s the best way to navigate their data)

Possibly relevant info:

  • Several new clients each week (any time during business hours)
  • Multiple web servers and database servers (users can log in via any web server)
  • Let’s stay agnostic of language or SQL brand, since Lucene (and Solr) have a breadth of support

For Example:

Joel Spolsky said in Podcast #11 that his hosted web app product, FogBugz On-Demand, uses Lucene. He has thousands of on-demand clients. And each client gets their own database.

They use an index per client and store it in the client’s database. I’m not sure of the details, or whether this requires a serious modification to Lucene.

The Question:

How would you set up Lucene search so that each client can only search within its own database?

How would you set up the index(es)?
Where do you store the index(es)?
Would you need to add a filter to all search queries?
If a client cancelled, how would you delete their (part of the) index? (This may be trivial; not sure yet.)

Possible Solutions:

Make an index for each client (database)

  • Pro: Search is faster (than the one-index-for-all method). Each index is only as large as that client’s data.
  • Con: I’m not sure what this entails, nor do I know if this is beyond Lucene’s scope.
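To make the index-per-client option concrete, here is a minimal sketch in plain Python. It does not use Lucene's API: `TinyIndex` is a toy in-memory inverted index standing in for one Lucene index, and `PerClientSearch`, `index_record`, and `drop_client` are hypothetical names invented for illustration. The point is only the routing: every operation names a client, and a search can never touch another client's index.

```python
from collections import defaultdict


class TinyIndex:
    """Toy in-memory inverted index standing in for one Lucene index."""

    def __init__(self):
        self._postings = defaultdict(set)  # term -> set of record ids

    def add(self, record_id, text):
        for term in text.lower().split():
            self._postings[term].add(record_id)

    def search(self, query):
        # AND semantics: a record must contain every query term.
        terms = query.lower().split()
        if not terms:
            return set()
        result = set(self._postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self._postings.get(term, set())
        return result


class PerClientSearch:
    """Routes every operation to the index owned by one client (hypothetical)."""

    def __init__(self):
        self._indexes = {}  # client_id -> TinyIndex

    def index_record(self, client_id, record_id, text):
        self._indexes.setdefault(client_id, TinyIndex()).add(record_id, text)

    def search(self, client_id, query):
        index = self._indexes.get(client_id)
        return index.search(query) if index else set()

    def drop_client(self, client_id):
        # A cancelled client is removed by discarding its whole index.
        self._indexes.pop(client_id, None)


search = PerClientSearch()
search.index_record("acme", 1, "printer crashes on startup")
search.index_record("globex", 1, "printer out of toner")
print(search.search("acme", "printer crashes"))
```

In this layout, deleting a cancelled client is a single step that cannot affect anyone else. With a file-based index the equivalent would presumably be removing that client's index directory, though that is an assumption about deployment rather than something stated above.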

Have a single, gigantic index with a database_name field. Always include database_name as a filter.

  • Pro: Not sure. Maybe good for tech support or billing dept to search all databases for info.
  • Con: Search is slower (than the index-per-client method). Security is compromised if the query filter is ever omitted.
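The single-index option can be sketched the same way. The example below is plain Python, not Lucene code; `SharedIndex` and `TenantSearcher` are hypothetical names. It illustrates one way to reduce the "filter removed" risk: application code never builds the filter itself, and instead receives a searcher that is bound to one `database_name` when the user logs in.

```python
class SharedIndex:
    """Toy single index; every document carries a database_name field."""

    def __init__(self):
        self._docs = []  # (database_name, record_id, set of terms)

    def add(self, database_name, record_id, text):
        self._docs.append((database_name, record_id, set(text.lower().split())))

    def search_filtered(self, database_name, query):
        terms = set(query.lower().split())
        if not terms:
            return set()
        return {
            record_id
            for db, record_id, doc_terms in self._docs
            if db == database_name and terms <= doc_terms
        }

    def delete_database(self, database_name):
        # Removing a cancelled client means deleting every matching document.
        self._docs = [doc for doc in self._docs if doc[0] != database_name]


class TenantSearcher:
    """Bound to one client; callers cannot omit or change the filter."""

    def __init__(self, index, database_name):
        self._index = index
        self._database_name = database_name

    def search(self, query):
        return self._index.search_filtered(self._database_name, query)


index = SharedIndex()
index.add("acme_db", 1, "printer crashes on startup")
index.add("globex_db", 2, "printer out of toner")

acme = TenantSearcher(index, "acme_db")
print(acme.search("printer"))
```

Deleting a client here is a delete-by-field operation rather than dropping a whole index. Lucene's `IndexWriter.deleteDocuments` accepts a term for this purpose, so the cancellation case should be workable in either design.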

One last thing:
I would also accept an answer that uses Solr (the search server built on Lucene). Perhaps it’s better suited for this problem. Not sure.

1 Answer

  1. Editorial Team
    Added an answer on May 15, 2026 at 12:13 am

    You summoned me from the FogBugz StackExchange. My name is Jude, and I’m the current search architect for FogBugz.

    Here’s a rough outline of how the FogBugz On Demand search architecture is set up[1]:

    • For reasons related to data portability, security, etc., we keep all of our On Demand databases and indices separate.
    • While we do use Lucene (Lucene.NET, actually), we’ve modded its backend fairly substantially so that it can store its index entirely in the database. Additionally, a local cache is maintained on each webhost so that unnecessary database hits can be avoided whenever possible.
    • Our filters are almost entirely database-side (since they’re used by aspects of FogBugz outside of search), so our search parser separates queries into full-text and non-full-text components, executes the lookups, and combines the results. This is a little unfortunate, as it voids many useful optimizations that Lucene is capable of making.
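The split-and-combine step in the last bullet can be illustrated with a small sketch. This is not FogBugz code: the `cases` table, the `status` and `assignee` columns, the `column:value` query syntax, and all function names are assumptions made up for the example. A toy dictionary of postings stands in for the full-text index, and an in-memory SQLite table stands in for the database-side filters.

```python
import sqlite3

FILTER_COLUMNS = {"status", "assignee"}  # hypothetical whitelisted filter columns


def split_query(query):
    """Separate 'column:value' filters from free-text terms."""
    filters, terms = {}, []
    for token in query.split():
        column, sep, value = token.partition(":")
        if sep and column in FILTER_COLUMNS and value:
            filters[column] = value
        else:
            terms.append(token.lower())
    return filters, terms


def full_text_lookup(postings, terms):
    """Toy full-text side: ids of records containing every term."""
    if not terms:
        return None  # None means "no full-text constraint"
    return set.intersection(*[postings.get(term, set()) for term in terms])


def filter_lookup(conn, filters):
    """Database side: ids of rows matching every filter."""
    if not filters:
        return None  # None means "no filter constraint"
    # Column names come from the whitelist; values are bound parameters.
    where = " AND ".join(f"{column} = ?" for column in filters)
    rows = conn.execute(f"SELECT id FROM cases WHERE {where}", list(filters.values()))
    return {row[0] for row in rows}


def search(conn, postings, query):
    filters, terms = split_query(query)
    text_ids = full_text_lookup(postings, terms)
    filter_ids = filter_lookup(conn, filters)
    if text_ids is None and filter_ids is None:
        return set()
    if text_ids is None:
        return filter_ids
    if filter_ids is None:
        return text_ids
    return text_ids & filter_ids  # combine the two result sets


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cases (id INTEGER PRIMARY KEY, status TEXT, assignee TEXT, title TEXT)"
)
rows = [
    (1, "open", "ann", "Printer crashes on startup"),
    (2, "closed", "ann", "Printer out of toner"),
    (3, "open", "bob", "Login page crashes"),
]
conn.executemany("INSERT INTO cases VALUES (?, ?, ?, ?)", rows)

postings = {}
for record_id, _status, _assignee, title in rows:
    for term in title.lower().split():
        postings.setdefault(term, set()).add(record_id)

print(search(conn, postings, "crashes status:open assignee:ann"))
```

Note that each side computes its complete result set before the intersection. That is one plausible reading of why this design gives up optimizations the search engine could otherwise apply if it saw the whole query.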

    There are a few benefits to what we’ve done. Managing the accounts is quite simple, since client data and their index are stored in the same place. There are some negatives too, though, such as a set of really pesky edge-case searches that fall short of our minimum performance standards. In retrospect, our search was cool and well done for its time. If I were to do it again, however, I would discourage this approach.

    Simply put, unless your search domain is very special or you’re willing to dedicate a developer to blazingly fast search, you’re probably going to be outperformed by an excellent product like ElasticSearch, Solr, or Xapian.

    If I were doing this today, unless my search domain was extremely specific, I would probably use ElasticSearch, Solr, or Xapian for my database-backed full-text search solution. As for which one, that depends on your auxiliary needs (platform, type of queries, extensibility, tolerance for one set of quirks over another, etc.).

    On the topic of one large index versus many(!) scattered indices: both can work. I think the decision really lies with what kind of architecture you’re looking to build, and what kind of performance you need. You can be pretty flexible if you decide that a 2-second search response is reasonable, but once you start saying that anything over 200 ms is unacceptable, your options start disappearing pretty quickly. While maintaining a single large search index for all of your clients can be vastly more efficient than handling lots of small indices, it’s not necessarily faster (as you pointed out). I personally feel that, in a secure environment, the benefit of keeping your client data separated is not to be underestimated. When one index gets corrupted, it won’t bring all search to a halt; silly little bugs won’t expose sensitive data; and user accounts stay modular, so it’s easier to extract a set of accounts and plop them onto a new server.

    I’m not sure if that answered your question, but I hope that I at least satisfied your curiosity 🙂

    [1]: In 2013, FogBugz began powering its search and filtering capabilities with ElasticSearch. We like it.

© 2021 The Archive Base. All Rights Reserved