I’m trying to figure out how best to query against a schema that consists of one central table, plus a number of “attribute” tables (sorry, not sure of the best terminology here) that record one-to-many relationships. In the business layer, each of these tables corresponds to a collection that may contain zero or more elements.
Right now the code I’m looking at retrieves the data by getting a list of values from the master table, then looping over it and querying each of the “accessory” tables to populate these collections.
I’d like to try and get it down to a single query if I can. I tried using multiple LEFT JOINs. But this effectively joins against a cross product of the values in the accessory tables, which leads to an explosion of rows – especially when you add a few more joins. The table in question includes five such relationships, so the number of rows returned for each record is potentially enormous, and almost entirely composed of redundant data.
Here’s a smaller synthetic example of some tables, data, the query structure I’m using, and results:
Database structure & data:
create table Containers (
Id int not null primary key,
Name nvarchar(8) not null);
create table Containers_Animals (
Container int not null references Containers(Id),
Animal nvarchar(8) not null,
primary key (Container, Animal)
);
create table Containers_Foods (
Container int not null references Containers(Id),
Food nvarchar(8) not null,
primary key (Container, Food)
);
insert into Containers (Id, Name)
values (0, 'box'), (1, 'sack'), (2, 'bucket');
insert into Containers_Animals (Container, Animal)
values (1, 'monkey'), (2, 'dog'), (2, 'whale'), (2, 'lemur');
insert into Containers_Foods (Container, Food)
values (1, 'lime'), (2, 'bread'), (2, 'chips'), (2, 'apple'), (2, 'grape');
Coupled to a business object like this:
class Container {
public string Name;
public string[] Animals; // may be empty
public string[] Foods; // may be empty
}
And here’s the way I’m constructing the query against it:
select c.Name container, a.Animal animal, f.Food food from Containers c
left join Containers_Animals a on a.Container = c.Id
left join Containers_Foods f on f.Container = c.Id;
Which gives these results:
container animal food
--------- -------- --------
box NULL NULL
sack monkey lime
bucket dog apple
bucket dog bread
bucket dog chips
bucket dog grape
bucket lemur apple
bucket lemur bread
bucket lemur chips
bucket lemur grape
bucket whale apple
bucket whale bread
bucket whale chips
bucket whale grape
What I’d like to see instead is a number of rows equal to the maximum number of values associated with the root table on any of the relationships, with empty space filled in with NULLs. That would keep the number of rows returned way, way, way down, while still being easy to transform into objects. Something like this:
container animal food
--------- -------- --------
box NULL NULL
sack monkey lime
bucket dog apple
bucket lemur bread
bucket whale chips
bucket NULL grape
Can it be done?
Why not just return two data sets ordered by container, and then do a logical merge join on them in the client? What you’re asking for is going to make the DB engine do a lot more work, with a lot more complicated query, for (to me) small benefit.
It would look something like this. Use two left joins to make sure each data set has at least one instance of all container names, then loop through them simultaneously. Here is some rough pseudocode:
However, of course what you’re asking for is possible! Here are two ways.
Treat the inputs separately and join on their position:
Concatenate the inputs vertically and pivot them: