What would be the impact of indexing the same value into multiple fields of a Lucene index?
The idea is that someone’s first name is part of both their name and their general details, so I would want to index that value into multiple fields. For example, I might index Ted Bloggs as follows:
Field | Value
-------------|---------
firstName | Ted
lastName | Bloggs
name | Ted
name | Bloggs
general | Ted
general | Bloggs
all | Ted
all | Bloggs
By doing this I can easily group fields into categories; however, I’m worried it may have adverse effects on performance and/or disk usage.
Could anyone advise, please?
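To make the layout in the table concrete, here is a minimal Java sketch that builds the same (field, value) pairs for one person. It is a plain-Java toy, not Lucene code: `CategoryFields` and `buildFields` are hypothetical names, and the category list (`name`, `general`, `all`) is taken from the table. In Lucene itself you would typically add one field per pair to a `Document`, since Lucene allows several fields with the same name in one document.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Toy model (not Lucene's API) of the field layout from the table:
// each name part goes into its own field and is also copied into the
// "name", "general" and "all" category fields.
final class CategoryFields {

    // Hypothetical helper: returns (field, value) pairs in insertion order.
    static List<Map.Entry<String, String>> buildFields(String firstName, String lastName) {
        List<Map.Entry<String, String>> fields = new ArrayList<>();
        fields.add(Map.entry("firstName", firstName));
        fields.add(Map.entry("lastName", lastName));
        // Copy both name parts into every category field.
        for (String category : List.of("name", "general", "all")) {
            fields.add(Map.entry(category, firstName));
            fields.add(Map.entry(category, lastName));
        }
        return fields;
    }

    public static void main(String[] args) {
        // Prints one "field | value" line per pair, mirroring the table.
        for (Map.Entry<String, String> f : buildFields("Ted", "Bloggs")) {
            System.out.println(f.getKey() + " | " + f.getValue());
        }
    }
}
```

For Ted Bloggs this yields the same eight rows as the table: two dedicated fields plus two values in each of the three category fields.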
@aishwarya is right, but to expand on it a little bit more:
From the docs:
The term will be stored once per field, so if you repeat each term five times, your storage will be five times bigger. However, the size of the term dictionary is logarithmic with respect to the size of the raw data, so I doubt you will have a problem.
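A toy inverted index illustrates why storage grows per field: if terms are keyed by (field, term), copying a term into N fields creates N dictionary entries and N postings lists. This is a minimal sketch assuming a simplified in-memory structure; `ToyIndex` and its methods are hypothetical and ignore Lucene's compression and on-disk formats, so real ratios will differ.

```java
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy inverted index (NOT Lucene's real format): terms are keyed by
// (field, term), so a term copied into N fields gets N dictionary
// entries, each with its own postings list of doc ids.
final class ToyIndex {
    // field -> term -> sorted doc ids (the postings list)
    private final Map<String, Map<String, SortedSet<Integer>>> postings = new TreeMap<>();

    void add(int docId, String field, String value) {
        String term = value.toLowerCase(); // stand-in for text analysis
        postings.computeIfAbsent(field, f -> new TreeMap<>())
                .computeIfAbsent(term, t -> new TreeSet<>())
                .add(docId);
    }

    // Number of distinct (field, term) pairs in the term dictionary.
    int termDictionarySize() {
        int n = 0;
        for (Map<String, SortedSet<Integer>> terms : postings.values()) n += terms.size();
        return n;
    }

    // Total number of doc ids stored across all postings lists.
    int postingsCount() {
        int n = 0;
        for (Map<String, SortedSet<Integer>> terms : postings.values())
            for (SortedSet<Integer> docs : terms.values()) n += docs.size();
        return n;
    }

    // Index firstName/lastName, plus a copy of both into each category field.
    static ToyIndex indexPeople(List<String[]> people, List<String> categories) {
        ToyIndex index = new ToyIndex();
        for (int doc = 0; doc < people.size(); doc++) {
            String first = people.get(doc)[0];
            String last = people.get(doc)[1];
            index.add(doc, "firstName", first);
            index.add(doc, "lastName", last);
            for (String c : categories) {
                index.add(doc, c, first);
                index.add(doc, c, last);
            }
        }
        return index;
    }

    public static void main(String[] args) {
        List<String[]> people = List.of(new String[] {"Ted", "Bloggs"}, new String[] {"Ann", "Bloggs"});
        ToyIndex plain = indexPeople(people, List.of());
        ToyIndex copied = indexPeople(people, List.of("name", "general", "all"));
        System.out.println("plain:  " + plain.termDictionarySize() + " terms, " + plain.postingsCount() + " postings");
        System.out.println("copied: " + copied.termDictionarySize() + " terms, " + copied.postingsCount() + " postings");
    }
}
```

With two people (Ted Bloggs and Ann Bloggs), the plain layout has 3 dictionary entries and 4 postings, while copying into `name`, `general` and `all` gives 12 and 16: four times as many, because each term now lives in four fields.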
There will be essentially no performance penalty (Lucene caches where each field starts), except insofar as having more data can push other things out of memory. For most search setups, your index will probably be under a few GB, which will easily fit in memory anyway.
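The point about caching where each field starts can be pictured with another toy: a term dictionary sorted by (field, term), plus a cached [start, end) range per field. A lookup then binary-searches only its own field's slice, so copying terms into other fields adds entries elsewhere but not to the slice being searched. This is an illustrative sketch with hypothetical names (`FieldRangeDictionary`, `lookup`, `sliceSize`), not a description of how Lucene actually lays out its terms.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy term dictionary (NOT Lucene's real format): entries sorted by
// (field, term), with a cached [start, end) range for each field.
final class FieldRangeDictionary {
    private final String[] terms;                             // terms in (field, term) order
    private final Map<String, int[]> range = new HashMap<>(); // field -> [start, end)

    // Each entry is {field, term}; entries are assumed to be distinct.
    FieldRangeDictionary(List<String[]> entries) {
        List<String[]> sorted = new ArrayList<>(entries);
        sorted.sort((a, b) -> {
            int c = a[0].compareTo(b[0]);
            return c != 0 ? c : a[1].compareTo(b[1]);
        });
        terms = new String[sorted.size()];
        for (int i = 0; i < sorted.size(); i++) {
            String field = sorted.get(i)[0];
            terms[i] = sorted.get(i)[1];
            // Cache where each field starts (first occurrence) and ends.
            int[] r = range.get(field);
            if (r == null) {
                r = new int[] {i, i};
                range.put(field, r);
            }
            r[1] = i + 1;
        }
    }

    // Ordinal of (field, term), or -1 if absent. Only the field's own
    // slice is binary-searched, so other fields' entries are never compared.
    int lookup(String field, String term) {
        int[] r = range.get(field);
        if (r == null) return -1;
        int i = Arrays.binarySearch(terms, r[0], r[1], term);
        return i >= 0 ? i : -1;
    }

    // Number of terms in one field's slice (the binary-search space).
    int sliceSize(String field) {
        int[] r = range.get(field);
        return r == null ? 0 : r[1] - r[0];
    }

    public static void main(String[] args) {
        List<String[]> entries = new ArrayList<>();
        entries.add(new String[] {"firstName", "ted"});
        entries.add(new String[] {"lastName", "bloggs"});
        for (String c : List.of("name", "general", "all")) {
            entries.add(new String[] {c, "ted"});
            entries.add(new String[] {c, "bloggs"});
        }
        FieldRangeDictionary dict = new FieldRangeDictionary(entries);
        System.out.println("firstName slice size: " + dict.sliceSize("firstName"));
        System.out.println("lookup(firstName, ted): " + dict.lookup("firstName", "ted"));
    }
}
```

Here the dictionary holds eight entries, but a `firstName` lookup searches a one-entry slice, the same as it would without the category copies.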