I am developing a system which needs to allow users to be placed into groups. These groups can be freely created, edited, and deleted by other privileged users in the system. That part is easy; just create a group_users table that links users into groups. (If you’re a stickler for normalization then you may create a group table that just lists off groups and then have a group_users table that links them together — that’s fine too)
Here’s where it gets tricky. The client wants groups to also contain groups, to arbitrary depth and with arbitrary overlap (groups can be in multiple groups, and groups can contain multiple groups). That’s easy enough to store (with a group_groups table), but it’s difficult to query against without some sort extension like Oracle’s CONNECT BY.
This recursive hierarchy also needs to be retroactive — meaning that if group A contains group B, and group B is modified, then group A will be modified too — so I can’t cheat and just flatten out the structure. If you don’t believe me that it can’t simply be flattened, consider this situation. You have a group called “cool people” which contains users 1 and 2. Someone creates a group called “REALLY cool people” which contains user 3 and contains the group “cool people”. When I query against “REALLY cool people” I should conclude that users 1, 2, and 3 are in the group. Now say that someone decides that user 2 isn’t a cool person anymore and removes user 2 from “cool people”. After that point in time, “REALLY cool people” only contains users 1 and 3. If I had originally flattened out the structure, I wouldn’t know to remove user 2 from “REALLY cool people” when I removed him from “cool people”.
So a trivial flattening won’t work in this scenario. Other options I’ve considered:
- Doing the recursion in code.
- Too slow for complex groups, and also requires you to then do related joins in memory rather than on the database
- Flatten the structure out into
group_users_flattened, but also maintain agroup_groupstable. Create a trigger forgroup_users_flattenedon INSERT/UPDATE/DELETE that will go to thegroup_groupstable, find all groups that contain this group, and dynamically make the appropriate changes togroup_users_flattened.- I can imagine this working, but it seems convoluted and error-prone, and I have a feeling there’s a gotcha that I’m not seeing.
Are there other ideas that I haven’t considered?
See my answer to What is the most efficient/elegant way to parse a flat table into a tree?. I describe a design I call Closure Table.
In your case, you’d have your tables
UsersandGroupsandUserGroupMembersthat is the intersection table (many-to-many) between Users and Groups.Then you’d need another table to describe how groups are nested. Call it
SubgroupPathsfor instance. This records every path relating a given group to its subgroups.You may also need some permutations of compound indexes to support certain queries you’d run against this table.
This design allows you to have multiple hierarchies, so you could have group "cool people" be a descendant of each of its supergroups.
Now you can get the list of all users who should be considered cool people, regardless of whether they are members of cool people, slightly cool people, or REALLY cool people. We can even throw in a DISTINCT in case some users are associated with more than one of these groups.
I prefer Closure Table over the Nested Sets design suggested by other answers, because Closure Table supports referential integrity constraints, and there are some queries that are hard in Nested Sets but easier in Closure Table.
For more on Closure Table, check out my book SQL Antipatterns Volume 1: Avoiding the Pitfalls of Database Programming.
Footnote to all this: be careful about violating the YAGNI principle.
I once implemented a database to store arbitrary-depth groups like this, and all the PHP code to display, report, and administer hierarchies. Also I had to clone hierarchical collections when they were used, because the hierarchy could be reorganized later, and we needed to keep historical data of how the elements in the hierarchy were used. It took weeks to code and test. And when it was all done, the user of the application never actually stored any hierarchy more one one level deep.
Some decision-makers will change their minds about the scope of requirements if you tell them how much work (i.e. budget) it will take to implement and test.