This is not exactly a question, but I am just looking for an opinion on this matter.
I am doing my first job for a company. They asked me to optimize the full-text search on their MySQL database.
As soon as I saw the database structure, I literally made a face.
It is a car parts database, and they have basically one table with three columns: ID, part_number, xml.
Am I just stupid for not understanding this, or did they really put ALL, and I mean ALL, the information about each product inside one ENORMOUS XML text? I just don’t get it and want a clarification. Couldn’t they have put each piece of information about the product (say color, size, manufacturer, etc.) in its own column? Or used a document-oriented, non-relational DB (like MongoDB)? Is it ‘normal’ practice to run a “full-text” search over an XML text to return the relevant item?
Please enlighten me: either I am really stupid and don’t get it, or that DB is complete nonsense.
Thanks in advance.
The problem they face is that the list of attributes across different parts never ends: some are common to many parts, some are not. If you try to put one column per attribute, you end up with thousands of columns. Some RDBMSs have features to deal with that, such as sparse columns and wide tables.
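To make the “thousands of columns” point concrete, here is a small Python sketch with three hypothetical parts (the part numbers and attribute names are made up for illustration). It counts how many columns a one-column-per-attribute table would need and what fraction of its cells would be empty. With only three parts it already needs 8 columns and more than half the cells are empty, and every new kind of part adds more.

```python
# Hypothetical sample parts: each part carries its own set of attributes.
parts = {
    "BRK-1001": {"manufacturer": "Acme", "diameter_mm": 280, "vented": True},
    "FLT-2002": {"manufacturer": "Filtro", "thread": "M20x1.5", "height_mm": 95},
    "BLB-3003": {"manufacturer": "Lumo", "wattage": 55, "socket": "H7", "color_temp_k": 4300},
}


def wide_table_stats(parts):
    """Return (column count, fraction of empty cells) for a one-column-per-attribute table."""
    # Every distinct attribute name across all parts becomes a column.
    columns = {attr for attrs in parts.values() for attr in attrs}
    total_cells = len(parts) * len(columns)
    # A cell is filled only where that part actually has that attribute.
    filled = sum(len(attrs) for attrs in parts.values())
    return len(columns), 1 - filled / total_cells


n_columns, empty_fraction = wide_table_stats(parts)
print(n_columns, round(empty_fraction, 2))  # 8 0.58
```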
An alternative is to normalize the attributes out into an attribute table storing part_id, attribute, value (the entity-attribute-value, or EAV, pattern). That becomes a very large key-value table whose cardinality grows rapidly, and some of the values will be of different types and may be quite large. The values will also repeat, so you could normalize them out again, and then realize you have gone down a rabbit hole of painful performance and horrid scenarios where you either keep multiple value fields for different types or have to store a variant data type.
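A minimal sketch of that attribute table, using Python’s built-in sqlite3 instead of MySQL so it runs anywhere. The table names part and part_attribute, the helper parts_matching and the sample data are all hypothetical. Note how a lookup on two attributes already needs a GROUP BY ... HAVING (or one self-join per condition), that every value ends up as TEXT, and that values such as the manufacturer name repeat from row to row.

```python
import sqlite3

# Hypothetical schema: sqlite3 stands in for MySQL so the sketch runs anywhere.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE part (
        id INTEGER PRIMARY KEY,
        part_number TEXT NOT NULL UNIQUE
    );
    -- Entity-attribute-value layout: one row per (part, attribute).
    CREATE TABLE part_attribute (
        part_id INTEGER NOT NULL REFERENCES part(id),
        attribute TEXT NOT NULL,
        value TEXT,  -- numbers, flags and strings all squeezed into TEXT
        PRIMARY KEY (part_id, attribute)
    );
""")

# Hypothetical sample data; note the repeated manufacturer value.
rows = [
    ("BRK-1001", {"manufacturer": "Acme", "diameter_mm": "280"}),
    ("FLT-2002", {"manufacturer": "Filtro", "thread": "M20x1.5"}),
    ("BLB-3003", {"manufacturer": "Acme", "socket": "H7"}),
]
for part_number, attrs in rows:
    cur = conn.execute("INSERT INTO part (part_number) VALUES (?)", (part_number,))
    conn.executemany(
        "INSERT INTO part_attribute (part_id, attribute, value) VALUES (?, ?, ?)",
        [(cur.lastrowid, name, value) for name, value in attrs.items()],
    )


def parts_matching(conn, conditions):
    """Part numbers whose attributes match ALL given (attribute -> value) pairs.

    Each condition becomes another OR branch plus a HAVING count; with one
    column per attribute it would simply be another AND in the WHERE clause.
    """
    if not conditions:
        raise ValueError("need at least one condition")
    where = " OR ".join("(a.attribute = ? AND a.value = ?)" for _ in conditions)
    params = [item for pair in conditions.items() for item in pair]
    sql = f"""
        SELECT p.part_number
        FROM part p JOIN part_attribute a ON a.part_id = p.id
        WHERE {where}
        GROUP BY p.id, p.part_number
        HAVING COUNT(*) = ?
        ORDER BY p.part_number
    """
    return [row[0] for row in conn.execute(sql, params + [len(conditions)])]


print(parts_matching(conn, {"manufacturer": "Acme"}))                 # ['BLB-3003', 'BRK-1001']
print(parts_matching(conn, {"manufacturer": "Acme", "socket": "H7"}))  # ['BLB-3003']
```

Only placeholders are interpolated into the SQL string; the attribute names and values themselves are always passed as bound parameters.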
The final alternative is to store the attributes as XML in a single field and put a full-text search (FTS) index on it, which is what someone chose here.
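In MySQL itself this would be a FULLTEXT index on the xml column, queried with MATCH(...) AGAINST(...). The Python sketch below only illustrates the idea behind it and is not MySQL’s implementation. Real engines add stopword lists, minimum word lengths and relevance ranking. The sketch pulls the text out of each XML blob, builds an inverted index (word to row IDs) and answers queries that must contain every word. The rows mirror the ID, part_number, xml layout from the question, but their contents are made up.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical rows shaped like the table in the question: (ID, part_number, xml).
ROWS = [
    (1, "BRK-1001", "<part><manufacturer>Acme</manufacturer><type>Brake disc</type><diameter unit='mm'>280</diameter></part>"),
    (2, "FLT-2002", "<part><manufacturer>Filtro</manufacturer><type>Oil filter</type><thread>M20x1.5</thread></part>"),
    (3, "BLB-3003", "<part><manufacturer>Acme</manufacturer><type>Headlight bulb</type><socket>H7</socket></part>"),
]


def words(xml_text):
    """Lower-cased words from the text nodes of an XML document (tag names and attributes are ignored)."""
    text = " ".join(ET.fromstring(xml_text).itertext())
    return re.findall(r"\w+", text.lower())


def build_index(rows):
    """Inverted index: word -> set of row IDs whose XML contains that word."""
    index = defaultdict(set)
    for row_id, _part_number, xml_text in rows:
        for word in words(xml_text):
            index[word].add(row_id)
    return index


def search(index, query):
    """Row IDs whose XML contains every word of the query (boolean AND)."""
    terms = re.findall(r"\w+", query.lower())
    if not terms:
        return set()
    # Copy the first posting set so the index itself is never mutated.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


index = build_index(ROWS)
print(sorted(search(index, "acme")))       # [1, 3]
print(sorted(search(index, "Acme bulb")))  # [3]
```

The trade-off shows up quickly. A single query searches every attribute with no schema changes when a new kind of part appears, but anything typed (for example, discs with a diameter above 250 mm) means parsing the XML again rather than using an index.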
Someone made that choice. It may be the first time you have seen it in an RDBMS, but probably not the last. While you are working within relational storage, consider how you would have done it differently from the original author (within the restrictions of an RDBMS).
Storing attribute information for items with disparate attributes is always a nasty problem; I’ve seen people take all three options in the past.