I have a customer with the following data structure: for each patient, there may be multiple samples, and each sample may, after processing, produce 4 million data objects. The maximum number of samples per patient is 20, so a single patient may end up with 80 million rows of data, and of course there will eventually be many hundreds of patients.
In setting up a database to store the objects (which each contain about 30 fields of statistics and measurements), the challenge is pretty clear: how to manage this vast amount of data?
I was thinking that I would have one database, with a table for each sample, so each table would have at most 4 million records.
A colleague of mine had an interesting suggestion, which was to take it one step further: create a new database per patient and then have a table per sample. His thinking was that having one log per patient, being able to move databases on a per-patient basis, and so on would be good. I can't disagree with him.
Is this reasonable? Is it a bad idea for some reason to have many databases?
Thoughts? Thank you!
While the idea is interesting from a privacy and migration standpoint, it is NOT a good idea to have a separate database per patient. Think about managing and backing up every one of those databases, each with its own set of files. I'm not even sure a DBMS can handle millions of databases at the same time in a single instance or on a single server.
What I would do is accept the data volume as a fact of life and deal with it through the data types and table design you choose. Let the DBMS worry about the scale of it. Make sure you have a deployment model that lets you scale your tables up and out. At a minimum, a table per entity would be wise: one for patients, one for measurements, and so on.
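To make the table-per-entity idea concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module purely as a stand-in, since the question doesn't say which DBMS is in use. The table and column names (patient, sample, measurement, stat_a, stat_b) are made up for illustration, and only two of the roughly 30 measurement fields are shown.

```python
import sqlite3

# Hypothetical schema: one table per entity, not one table per sample.
# All names are illustrative assumptions, not taken from the question.
SCHEMA = """
CREATE TABLE patient (
    patient_id   INTEGER PRIMARY KEY,
    external_ref TEXT NOT NULL UNIQUE
);
CREATE TABLE sample (
    sample_id  INTEGER PRIMARY KEY,
    patient_id INTEGER NOT NULL REFERENCES patient(patient_id),
    label      TEXT NOT NULL
);
CREATE TABLE measurement (
    measurement_id INTEGER PRIMARY KEY,
    sample_id      INTEGER NOT NULL REFERENCES sample(sample_id),
    stat_a         REAL,  -- stand-ins for the ~30 real fields
    stat_b         REAL
);
-- Indexes so that "all rows for this sample/patient" stays cheap.
CREATE INDEX idx_sample_patient ON sample(patient_id);
CREATE INDEX idx_measurement_sample ON measurement(sample_id);
"""


def build_demo_db():
    """Create an in-memory database with one patient, two samples, six rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    conn.execute(
        "INSERT INTO patient (patient_id, external_ref) VALUES (1, 'P-0001')"
    )
    conn.executemany(
        "INSERT INTO sample (sample_id, patient_id, label) VALUES (?, ?, ?)",
        [(1, 1, "s1"), (2, 1, "s2")],
    )
    rows = [(sid, float(i), float(i) * 2) for sid in (1, 2) for i in range(3)]
    conn.executemany(
        "INSERT INTO measurement (sample_id, stat_a, stat_b) VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()
    return conn


def rows_per_sample(conn, patient_id):
    """Return (sample_id, measurement_count) pairs for one patient."""
    cur = conn.execute(
        """
        SELECT s.sample_id, COUNT(m.measurement_id)
        FROM sample AS s
        LEFT JOIN measurement AS m ON m.sample_id = s.sample_id
        WHERE s.patient_id = ?
        GROUP BY s.sample_id
        ORDER BY s.sample_id
        """,
        (patient_id,),
    )
    return cur.fetchall()


if __name__ == "__main__":
    print(rows_per_sample(build_demo_db(), 1))
```

With this shape, adding a patient or a sample means inserting a row, not creating a table or a database. If per-patient moves or archiving ever matter, partitioning the measurement table by patient or sample inside the DBMS may be an option, depending on the product you pick.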
Just do what you are good at as a developer, and let the DBMS do what it was created for.