I am using a PostgreSQL database.
I have a table named metadatavalue with the following structure:
metadatavalue_id integer Primary Key Auto Increment
metadata_field_id integer Foreign Key
text_value varchar
text_lang varchar
place integer
Whenever an item is submitted or added, almost 25 metadata fields
are created for it.
The metadatavalue table already contains around
150,000 records.
I am implementing an autocomplete feature for a field, let's say “Author”,
which is identified by its metadata_field_id in the table.
When I run the query at the PostgreSQL prompt, it takes 1 to 2 seconds to return the result.
QUERY:
SELECT metadatavalue.text_value AS author, count(metadatavalue.text_value) AS count
FROM metadatavalue
WHERE (metadatavalue.metadata_field_id IN (
    SELECT metadatafieldregistry.metadata_field_id
    FROM metadatafieldregistry
    WHERE metadatavalue.text_value LIKE 'Pra%'
      AND metadatafieldregistry.metadata_schema_id = 1
      AND metadatafieldregistry.element::text = 'contributor'::text))
GROUP BY metadatavalue.text_value;
As it is for autocomplete, the query might run 4-5 times while the user types a value.
So I am thinking of implementing a Lucene-based search:
first building an index from the back end, and then, on each new item
creation, running a thread to index the new item.
I want to know whether Apache Lucene would be the better choice or
whether the SQL can be optimized.
EDIT:
There is another table, metadatafieldregistry, which contains the metadata fields. Its metadata_field_id column is referenced as a foreign key by the metadatavalue table for each value.
I would say any database will handle at least a million rows gracefully if proper indexing is done. There is no reason for you to get into Lucene or Solr, which would introduce new tasks such as keeping your indexes synchronized with the current state of the DB.
Also, Lucene and Solr are great for free-text searching. This means that if you search for “Bob Marley” in your Lucene “documents”, you will get every document that contains “Bob Marley”, “Marley Bob”, only “Bob”, only “Marley”, or even “Bob…lot of text…Marley”. So whether to use Lucene also depends on the kind of use cases you are trying to cover.
From the query you have shown, I feel you will get good performance if you index the
metadatavalue.text_value, metadatafieldregistry.metadata_schema_id and metadatafieldregistry.element columns. Also try converting your query to a join rather than an IN query. Thanks
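To make the indexing and join suggestion concrete, here is a rough sketch. The index names are made up, and the varchar_pattern_ops operator class is my assumption: PostgreSQL needs a pattern operator class for a B-tree index to serve LIKE 'Pra%' when the database does not use the C locale. Including metadata_field_id in the first index is also my own addition. The join should give the same result as the IN query as long as metadata_field_id is unique in metadatafieldregistry, which I am assuming because it is the target of the foreign key.

```sql
-- Sketch only: index names are hypothetical, and the operator class is an
-- assumption that depends on the database locale.

-- Supports the prefix search (LIKE 'Pra%') within one metadata field.
CREATE INDEX metadatavalue_field_text_idx
    ON metadatavalue (metadata_field_id, text_value varchar_pattern_ops);

-- Supports the lookup of the field id by schema and element.
CREATE INDEX metadatafieldregistry_schema_element_idx
    ON metadatafieldregistry (metadata_schema_id, element);

-- The original IN query written as a join.
SELECT mv.text_value AS author, count(mv.text_value) AS count
FROM metadatavalue mv
JOIN metadatafieldregistry mfr
  ON mfr.metadata_field_id = mv.metadata_field_id
WHERE mv.text_value LIKE 'Pra%'
  AND mfr.metadata_schema_id = 1
  AND mfr.element = 'contributor'
GROUP BY mv.text_value;
```

Running the query with EXPLAIN ANALYZE before and after creating the indexes should show whether the planner actually uses them on your data.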