I’ve got a problem that keeps coming up with normalized databases, and I’m looking for the best solution.
Suppose I’ve got an album information database. I want to set up the schema in a normalized fashion, so I set up two tables: albums, which has one row for each album, and songs, which lists all the songs those albums contain.
albums
------
aid
name

songs
-----
aid
sid
length
This setup is good for storing the data in a normalized fashion, as an album can contain any number of songs. However, accessing the data in an intuitive manner has now become a lot more difficult. A query which only grabs the information on a single album is simple, but how do you grab multiple albums at once in a single query?
Thus far, the best answer I have come up with is grouping by aid and aggregating the song information into arrays. For example, the result would look something like this:
aid, sids, lengths
1, [1, 2], [1:04, 5:45]
2, [3, 4, 5], [3:30, 4:30, 5:30]
When I want to work with the data, I have to then parse the sids and lengths, which seems a pointless exercise: I’m making the db concatenate a bunch of values just to separate them later.
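For illustration, here is a minimal sketch of that grouped-array approach. It is written in Python against an in-memory SQLite database purely so that it runs on its own. The sample rows, the album names, the function names, the choice to store lengths as seconds, and the use of SQLite's group_concat (standing in for an array aggregate such as PostgreSQL's array_agg) are all assumptions made for the example.

```python
import sqlite3


def make_db():
    """Build an in-memory database with the albums and songs tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE albums (aid INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE songs (
            sid    INTEGER PRIMARY KEY,
            aid    INTEGER REFERENCES albums(aid),
            length INTEGER  -- seconds (assumption; the post shows m:ss)
        );
        INSERT INTO albums VALUES (1, 'First'), (2, 'Second');
        INSERT INTO songs VALUES (1, 1, 64), (2, 1, 345),
                                 (3, 2, 210), (4, 2, 270), (5, 2, 330);
    """)
    return conn


def albums_with_arrays(conn):
    """Return one (aid, sids, lengths) tuple per album."""
    query = """
        SELECT aid, group_concat(sid), group_concat(length)
        FROM songs
        GROUP BY aid
        ORDER BY aid
    """
    result = []
    for aid, sids, lengths in conn.execute(query):
        # The database joined the values into comma-separated strings;
        # the application now has to split them apart again.
        result.append((
            aid,
            [int(s) for s in sids.split(",")],
            [int(n) for n in lengths.split(",")],
        ))
    return result


if __name__ == "__main__":
    for row in albums_with_arrays(make_db()):
        print(row)
```

The split step inside the loop is exactly the concatenate-then-separate round trip described above.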
My question: What is the best way to access a database with this sort of schema? Am I stuck with multiple arrays? Should I pack each song’s information into one object and collect those objects into a single array, instead of having several parallel arrays? Or is there a way of adding an arbitrary number of columns to the result set (a sort of infinite join) to accommodate N songs? I’m open to any ideas on how best to access the data.
I’m also concerned about efficiency, as these queries will be run often.
If it makes any difference, I’m using a PostgreSQL db along with a PHP front-end.
I have difficulty seeing your point. What exactly do you mean by ‘how do you grab multiple albums at once in a single query’? What exactly do you have difficulties with?
Intuitively I would say:
and
Which one you use depends on what you want to know or display. Either you query the database for aggregate information, or you calculate it yourself in your app from the result of query #1.
Depending on how much data is cached in your app and how long the queries take, one strategy can be faster than the other. I would recommend querying the DB, though. DBs are made for this kind of stuff.
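A rough sketch of the two strategies, assuming the first is a plain join that returns one row per song and the second is a GROUP BY aggregate. It uses Python with an in-memory SQLite database only so that it runs as-is; the SQL is plain enough that it should carry over to PostgreSQL, and the grouping loop would look much the same in PHP. The sample data, album names, and function names are made up, and storing lengths as seconds is an assumption.

```python
import sqlite3


def build_example_db():
    """Create the two tables with a few hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE albums (aid INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE songs (
            sid    INTEGER PRIMARY KEY,
            aid    INTEGER REFERENCES albums(aid),
            length INTEGER  -- seconds (assumption)
        );
        INSERT INTO albums VALUES (1, 'First'), (2, 'Second');
        INSERT INTO songs VALUES (1, 1, 64), (2, 1, 345),
                                 (3, 2, 210), (4, 2, 270), (5, 2, 330);
    """)
    return conn


def songs_by_album(conn):
    """Strategy 1: fetch one row per song and nest them in the application."""
    query = """
        SELECT a.aid, a.name, s.sid, s.length
        FROM albums a
        JOIN songs s ON s.aid = a.aid
        ORDER BY a.aid, s.sid
    """
    albums = {}
    for aid, name, sid, length in conn.execute(query):
        # Create the album entry the first time its aid appears.
        album = albums.setdefault(aid, {"name": name, "songs": []})
        album["songs"].append({"sid": sid, "length": length})
    return albums


def album_totals(conn):
    """Strategy 2: let the database compute per-album aggregates."""
    query = """
        SELECT aid, COUNT(*), SUM(length)
        FROM songs
        GROUP BY aid
        ORDER BY aid
    """
    return {aid: (count, total) for aid, count, total in conn.execute(query)}


if __name__ == "__main__":
    db = build_example_db()
    print(songs_by_album(db))
    print(album_totals(db))
```

With the first function no values are concatenated and re-parsed: each song arrives as its own row, and a single pass over the result builds the nested structure. The aggregates from the second function can also be derived from that structure in the app if the rows are already loaded.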