I have an entity called Project. Every project has a group of members assigned to it. From every member I collect some information, e.g. age, height, etc. Some of these values are numeric, some are text, and some are boolean. There are several dozen kinds of information that can be collected.
Which information I collect in a particular project is defined by the project itself. In a single project I usually collect only a few, say 3-5, pieces of information. Some projects collect the same or similar sets. What interests me is producing statistics across all projects in which a particular piece of information was collected.
The question is: how should I design the table(s) holding this information? One big table with several dozen columns and many NULLs in each row doesn’t sound good, especially since I will have thousands, or even millions, of rows. But a table per project (as many tables as there are projects), each holding only the information collected in that project, doesn’t sound good either: producing statistics across all projects would require dynamic SQL (variable table names, depending on the project) and iterating over hundreds of tables. A table per piece of information, or even per value type (numeric, text, boolean), doesn’t seem to be the proper way to do it either.
I’m using PostgreSQL. I know that some databases have a generic value type (e.g. sql_variant in Microsoft SQL Server, or ANYDATA in Oracle), but PostgreSQL doesn’t, which leaves me a bit confused.
I’m quite sure there is a better solution for this, but I can’t figure it out. Could you please help me find it?
Thanks in advance for every single response.
There’s a table inheritance feature built into PostgreSQL which lets you run queries on a table hierarchy.
See the sections on inheritance in the PostgreSQL documentation (one in the tutorial, one in the data definition chapter) for a good introduction.
These explain how to build tables that inherit from another table: all the columns of the parent table are automatically included in the child tables, and a query run on the parent table also covers its child tables (but not the opposite direction, and not sibling tables), with the rows from all of them combined into one result. To restrict a query to a single table, put the keyword ONLY before the table name in the FROM clause.
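As a minimal sketch of how this could look for the question's data, assuming one child table per kind of collected information: the table and column names below (measurement, measurement_age, measurement_height, project_id, member_id, age, height_cm) are made up for illustration and are not taken from the question.

```sql
-- Hypothetical schema: all table and column names are illustrative only.
CREATE TABLE measurement (
    project_id integer NOT NULL,
    member_id  integer NOT NULL
);

-- Each child inherits project_id and member_id and adds its own column.
CREATE TABLE measurement_age (
    age integer
) INHERITS (measurement);

CREATE TABLE measurement_height (
    height_cm numeric
) INHERITS (measurement);

INSERT INTO measurement_age    (project_id, member_id, age)       VALUES (1, 10, 34);
INSERT INTO measurement_height (project_id, member_id, height_cm) VALUES (2, 11, 180.5);

-- Scans the parent and every child; only the parent's columns are visible.
SELECT project_id, member_id FROM measurement;

-- ONLY restricts the query to rows stored in the parent table itself.
SELECT project_id, member_id FROM ONLY measurement;

-- Statistics for one kind of information, across all projects that collected it.
SELECT project_id, avg(age) AS avg_age
FROM measurement_age
GROUP BY project_id;
```

With this data the first SELECT sees both inserted rows, while the ONLY variant sees none, because nothing was inserted into the parent directly.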
You should be careful with constraints, because not all of them cross table boundaries. CHECK and NOT NULL constraints are inherited by child tables, but unique, primary key and foreign key constraints are not. A unique or primary key constraint on the parent table applies to that table alone, so child or sibling tables can contain rows that duplicate keys already present in the parent.
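The caveat can be reproduced with a small sketch like the following. The person and employee tables are hypothetical and exist only to show the behaviour.

```sql
-- Hypothetical tables, used only to demonstrate the constraint caveat.
CREATE TABLE person (
    id   integer PRIMARY KEY,
    name text NOT NULL CHECK (name <> '')
);

CREATE TABLE employee (
    salary numeric
) INHERITS (person);

INSERT INTO person (id, name) VALUES (1, 'Ann');

-- Succeeds: the parent's primary key is not enforced against child rows.
INSERT INTO employee (id, name, salary) VALUES (1, 'Bob', 1000);

-- Also succeeds: employee has no unique index of its own on id.
INSERT INTO employee (id, name, salary) VALUES (1, 'Cy', 2000);

-- Fails: the CHECK constraint on name is inherited by employee.
INSERT INTO employee (id, name, salary) VALUES (2, '', 500);

-- Shows the same id several times, since the query also scans employee.
SELECT id, name FROM person;
```

If uniqueness matters within a child table, one option is to declare a primary key or unique constraint on each child separately. That still does not guarantee uniqueness across the whole hierarchy.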