I’m trying to build a search for our internal support database – each support ticket consists of many emails and I’m trying to work out how best to index it:
- Should I create a document for each of the emails individually, or
- Should I concatenate all the emails for a ticket and create a document for each ticket?
When searching, I want to return a list of tickets (rather than a list of emails grouped by ticket or anything like that).
Which is best?
If you want a list of tickets in the results, then concatenate the emails. Otherwise you need to maintain the relation between emails and tickets yourself. Inside the index you can only do this with textual fields in the documents (for example, a ticket ID field on each email document). This may be slow, but such a relation is possible.
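To make the concatenation approach concrete, here is a minimal sketch in plain Python. A toy inverted index stands in for Lucene, and the sample data and the function names (`build_ticket_documents`, `build_index`, `search`) are hypothetical, chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical sample data: (ticket_id, email_body) pairs.
emails = [
    (1, "Printer will not connect to the network"),
    (1, "Rebooting the router fixed the printer"),
    (2, "Password reset link has expired"),
]

def build_ticket_documents(emails):
    """Concatenate all email bodies of a ticket into one document per ticket."""
    bodies_by_ticket = defaultdict(list)
    for ticket_id, body in emails:
        bodies_by_ticket[ticket_id].append(body)
    return {tid: "\n".join(bodies) for tid, bodies in bodies_by_ticket.items()}

def build_index(docs):
    """Toy inverted index (a stand-in for Lucene): term -> set of ticket ids."""
    index = defaultdict(set)
    for ticket_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(ticket_id)
    return index

def search(index, query):
    """Return the sorted ticket ids whose document contains every query term."""
    terms = query.lower().split()
    if not terms:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in terms))
    return sorted(hits)

ticket_docs = build_ticket_documents(emails)
ticket_index = build_index(ticket_docs)
print(search(ticket_index, "printer router"))
```

Note that in this sketch the query "printer router" matches ticket 1 even though no single email needs to contain both words on its own terms; with one document per ticket, a query can match terms spread across different emails of the same ticket.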
If you use the search together with a relational database, indexing the emails one by one will be fine. You retrieve the matching emails, read the ticketId field from each Lucene document, and then read the ticket with that ID from the database.
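The retrieve-then-look-up flow described above can be sketched roughly as follows. This is not Lucene code: a toy inverted index plays the role of the search index, an in-memory SQLite database plays the role of the relational database, and the table layout, sample data, and the `search_tickets` function are assumptions made for illustration.

```python
import sqlite3
from collections import defaultdict

# Hypothetical per-email documents: each one carries the id of its ticket.
email_docs = [
    {"email_id": 10, "ticketId": 1, "body": "Printer will not connect"},
    {"email_id": 11, "ticketId": 1, "body": "Printer works after reboot"},
    {"email_id": 12, "ticketId": 2, "body": "Password reset link expired"},
]

# In-memory SQLite stands in for the relational database holding tickets.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE tickets (id INTEGER PRIMARY KEY, subject TEXT)")
db.executemany(
    "INSERT INTO tickets VALUES (?, ?)",
    [(1, "Printer offline"), (2, "Cannot log in")],
)

# Toy inverted index: term -> positions of matching email documents.
index = defaultdict(set)
for pos, doc in enumerate(email_docs):
    for term in doc["body"].lower().split():
        index[term].add(pos)

def search_tickets(query):
    """Search emails, then collapse the hits to distinct tickets."""
    terms = query.lower().split()
    if not terms:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in terms))
    # Read ticketId from each matching email; drop duplicates, keep order.
    ticket_ids = list(
        dict.fromkeys(email_docs[p]["ticketId"] for p in sorted(hits))
    )
    # Load each ticket row from the database by its id.
    tickets = []
    for tid in ticket_ids:
        row = db.execute(
            "SELECT id, subject FROM tickets WHERE id = ?", (tid,)
        ).fetchone()
        if row is not None:
            tickets.append(row)
    return tickets

print(search_tickets("printer"))
```

Two emails match "printer" here, but both belong to ticket 1, so the deduplication step returns that ticket only once. The extra database read per ticket is the cost of keeping the relation outside the index.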
Obviously, indexing the emails separately is the more flexible solution. If you need to retrieve per-email information in the future, you can do so. With the all-emails-in-one solution you would have to reindex the entire database.