I need some advice regarding how to organize my data for effective and fast text search.
Background
I have a PHP application where the user can organize articles and dynamically create forms and fields for that purpose. This means that one article can have, for example, Type, Brand, and Color attributes, while another can have Type, Material, Color, and Content.
The user can basically create as many attributes as they like.
Then I need to be able to search and sort among these “unknown” attributes.
I also need to be able to read back all the attributes in case the user wants to edit the article.
My solution
My first idea (and so far only idea) is to encode all the attributes into a single TEXT field with a FULLTEXT index (needs to be MyISAM to work), like:
__Type="3",__Brand="Nokia",__Color="6"
__Type="2",__Material="7",__Color="2",__Content="MP3 Player,2 Apples, 1 book: Larry King"
The attribute names would get a prefix and/or suffix so that they are not confused with the attribute values. Alternatively, I could serialize the attributes as JSON.
Then I can build a query based on the selected attributes like:
SELECT * FROM Articles a
WHERE Attribute LIKE '%\_\_Type="2"%'  -- \_ is a literal underscore; a bare _ is a one-character wildcard in LIKE
AND Attribute LIKE '%\_\_Color="2"%'
If an attribute is empty, it is not included, which makes it possible to search for all articles that have a specific attribute set, regardless of its value.
Problem
Problem or not, what I’m worried about is the search performance when the database is filled with thousands of articles.
Another problem would be searching for a particular word inside a specific attribute, such as:
Content="MP3 Player,2 Apples, 1 book: Larry King"
Let's say I only want to get rows where the attribute Content has the phrase "Larry King" somewhere in it. I don't think I can do that in the same SQL query without also matching every row that has "Larry King" anywhere in the field.
I'm open to any kind of suggestion or discussion about what tables, fields, and relations I should create to achieve the goals explained above.
Thank you.
If you are going to search often on the value of a particular attribute, why not make those attributes their own columns in the table? Or, if you want a more flexible structure, make a second table that holds one attribute per row, with a my_id column, an attribute name column, and an attribute_value column.
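A minimal sketch of what such a second table could look like, assuming MySQL. The table name article_attributes, the column attribute_name, and the column sizes are placeholders chosen for illustration; only my_id and attribute_value are names taken from this answer.

```sql
-- Hypothetical attribute table: one row per (article, attribute) pair.
CREATE TABLE article_attributes (
    my_id           INT UNSIGNED NOT NULL,  -- primary key value of the row in the main Articles table
    attribute_name  VARCHAR(64)  NOT NULL,  -- e.g. 'Type', 'Brand', 'Color' (assumed column name)
    attribute_value TEXT         NOT NULL,  -- the value the user entered
    PRIMARY KEY (my_id, attribute_name),    -- each article holds a given attribute at most once
    -- A TEXT column needs a prefix length when used in a regular index.
    KEY idx_name_value (attribute_name, attribute_value(32))
);
```

The composite index on name and value is there so that lookups such as "all articles where Color is 2" do not have to scan the whole table.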
In this case, the my_id field is the primary key of your main table. So rather than serializing all of an article's attributes into one string, you would create one row per attribute.
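As an illustration, the first serialized example from the question, `__Type="3",__Brand="Nokia",__Color="6"`, could be stored as three rows. The article ID 1, the table name article_attributes, and the column attribute_name are assumptions made for this sketch.

```sql
-- One row per attribute instead of one serialized string per article.
INSERT INTO article_attributes (my_id, attribute_name, attribute_value) VALUES
    (1, 'Type',  '3'),
    (1, 'Brand', 'Nokia'),
    (1, 'Color', '6');

-- Reading every attribute back, e.g. to populate an edit form:
SELECT attribute_name, attribute_value
FROM article_attributes
WHERE my_id = 1;
```

An attribute the user left empty simply has no row, which keeps the "has this attribute at all" check from the question possible.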
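You would then formulate your search query as conditions on the attribute name and attribute value columns, rather than as LIKE patterns on a serialized string.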
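One possible way to write the question's "Type is 2 and Color is 2" search against such a table is to join it once per attribute. This is only a sketch: the table name article_attributes, the column attribute_name, and the Articles primary key column id are assumed names.

```sql
-- Articles whose Type is '2' AND whose Color is '2'.
SELECT a.*
FROM Articles a
JOIN article_attributes t
  ON t.my_id = a.id AND t.attribute_name = 'Type'  AND t.attribute_value = '2'
JOIN article_attributes c
  ON c.my_id = a.id AND c.attribute_name = 'Color' AND c.attribute_value = '2'
ORDER BY a.id;

-- Articles that have a Material attribute set, regardless of its value,
-- sorted by that value.
SELECT a.*
FROM Articles a
JOIN article_attributes m
  ON m.my_id = a.id AND m.attribute_name = 'Material'
ORDER BY m.attribute_value;
```

Each extra search criterion adds one join, so the PHP code can build the query from whatever attributes the user selected.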
This does not precisely solve the second problem of your question, but it will perform far, far better than doing a full text search across thousands of rows. You can of course then add a FULLTEXT index on the attribute_value field as well to query it for text fragments like your "Larry King" example.
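A sketch of how that FULLTEXT index and a phrase search restricted to one attribute might look. The table name article_attributes, the column attribute_name, and the index name are assumptions. Note that InnoDB also supports FULLTEXT indexes from MySQL 5.6 onward, so MyISAM is only required on older versions.

```sql
-- Add a full-text index on the value column (hypothetical index name).
ALTER TABLE article_attributes
    ADD FULLTEXT INDEX ft_attribute_value (attribute_value);

-- Articles whose Content attribute contains the exact phrase "Larry King".
-- The double quotes inside the search string request a phrase match in boolean mode.
SELECT my_id
FROM article_attributes
WHERE attribute_name = 'Content'
  AND MATCH(attribute_value) AGAINST('"Larry King"' IN BOOLEAN MODE);
```

Because the filter on attribute_name limits the match to Content rows, an article that mentions "Larry King" only in some other attribute is not returned.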