How good an idea are multi-valued attributes in a relational database when they are queried extensively?
Let me give you an example to show what I mean. Suppose I have the following table:
UserID Attribute1
User1 a,b,c
User2 x,y,z
User3 a,x,y
User4 c,b,z
[a,b,c,x,y,z are to be strings]
There is another user, User5, for whom I have to suggest other users based on whether his Attribute1 shares any value with that of one of the other 4 users.
[In graph databases, the task could have been much easier as I could have created multiple nodes from the respective users using the same relationship.]
Now, this table is just a micro-level abstraction of what the actual database will look like. The number of rows in a table may run into hundreds of thousands, if not millions. Also, there may be a lot more than 3 values per user. Apart from this, the database can be under heavy load, and in that situation queries on this column may become a performance problem.
So, are multi-valued attributes helpful in such cases? Or is there any better way of doing the same? One obvious way I can think of is to store it as:
UserID Attribute1
User1 a
User1 b
User1 c
User2 x
User2 y
User2 z
User3 a
User3 x
User3 y
User4 c
User4 b
User4 z
Is there any faster way of dealing with such situations in databases? Or are there any built-in features of modern-day databases that I could exploit?
Having multiple values in a field is only useful if the data is dead weight in the database, i.e. if you only read the field out of the database and process it afterwards.
As soon as you want to use the values in the field in a query, you will take a huge performance hit from having to parse the value to compare it. If you put the values in separate records as in your second example, so that you can add an index on it, it’s not unrealistic that the query will be 10 000 times faster.
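To make the separate-records layout concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name user_attribute, the index name idx_attribute and the helper users_sharing_attribute are made up for illustration, and the values for User5 ('a' and 'b') are assumed, since your question does not give them. It is not a benchmark, just one way the indexed lookup could look:

```python
import sqlite3

# Hypothetical schema: one row per (user, value) pair, as in the second layout.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE user_attribute (
        user_id   TEXT NOT NULL,
        attribute TEXT NOT NULL,
        PRIMARY KEY (user_id, attribute)
    )
    """
)
# Index on the value column so matching rows are found without scanning the table.
conn.execute("CREATE INDEX idx_attribute ON user_attribute (attribute)")

rows = [
    ("User1", "a"), ("User1", "b"), ("User1", "c"),
    ("User2", "x"), ("User2", "y"), ("User2", "z"),
    ("User3", "a"), ("User3", "x"), ("User3", "y"),
    ("User4", "c"), ("User4", "b"), ("User4", "z"),
    # Assumed values for User5; the question does not state them.
    ("User5", "a"), ("User5", "b"),
]
conn.executemany("INSERT INTO user_attribute VALUES (?, ?)", rows)


def users_sharing_attribute(conn, user_id):
    """Return (other_user, shared_count) pairs, most shared values first."""
    query = """
        SELECT other.user_id, COUNT(*) AS shared
        FROM user_attribute AS mine
        JOIN user_attribute AS other
          ON other.attribute = mine.attribute
         AND other.user_id <> mine.user_id
        WHERE mine.user_id = ?
        GROUP BY other.user_id
        ORDER BY shared DESC, other.user_id
    """
    return conn.execute(query, (user_id,)).fetchall()


suggestions = users_sharing_attribute(conn, "User5")
print(suggestions)
```

The self-join compares whole column values, so the database can use the index instead of splitting a comma-separated string for every row. With the first layout you would have to fall back on something like a LIKE '%a%' pattern or parsing in application code, neither of which can use an ordinary index.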
Having a million records in a table is not a problem. We have some tables that have over 100 million records in them.