In a recent programming interview, I was asked an SQL question. I gave what I thought was a reasonable answer, but it drew strong disapproval from the DBA, and I wasn’t able to figure out why.
Since then, I have thought about the problem some more and still cannot see what was so horrible about my answer. So I am seeking enlightenment here: what is the right way, or failing that, what are better ways, to produce a report of libraries and the number of books in each, from a database containing a table of libraries and a table of books?
I should note that I have changed the scenario a bit so that the wording is not identical to the interview question, but the task is the same.
Here is a minimal schema for the problem:
```sql
create table library (
    id integer primary key,
    name char(8)
);
create table book (
    id integer primary key,
    name char(8),
    library_id integer,
    foreign key (library_id) references library(id)
);
```
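The question does not say which database was used. As a sketch, assuming SQLite through Python's built-in sqlite3 module, the schema above loads as-is. One SQLite-specific detail: the foreign key clause is only enforced after "pragma foreign_keys = on" is issued on the connection. The make_db helper and the sample rows below are hypothetical.

```python
import sqlite3

# The schema from the question, run unchanged in SQLite (an assumption:
# the question does not name the database).
SCHEMA = """
create table library (
    id integer primary key,
    name char(8)
);
create table book (
    id integer primary key,
    name char(8),
    library_id integer,
    foreign key (library_id) references library(id)
);
"""


def make_db() -> sqlite3.Connection:
    """Hypothetical helper: an in-memory database with the schema loaded."""
    conn = sqlite3.connect(":memory:")
    # SQLite ignores foreign key clauses unless this pragma is turned on
    # for the connection.
    conn.execute("pragma foreign_keys = on")
    conn.executescript(SCHEMA)
    return conn


conn = make_db()
conn.execute("insert into library (id, name) values (1, 'Central')")
conn.execute("insert into book (id, name, library_id) values (1, 'b1', 1)")

try:
    # There is no library with id 99, so the foreign key rejects this row.
    conn.execute("insert into book (id, name, library_id) values (2, 'b2', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(rejected)  # True
```

Without the pragma, SQLite by default accepts the orphaned book row silently.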
The task is to list names of libraries and the number of books in them for libraries with two or more books.
And, here is my proposed solution:
```sql
select
    a.name as name,
    b.nbooks as nbooks
from
    library as a,
    (
        select
            min(library_id) as library,
            count(id) as nbooks
        from
            book
        group by
            library_id
    ) as b
where
    ( nbooks > 1 ) and (a.id = b.library)
;
```
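To check that the proposed query is at least correct, here is a sketch that runs it against a tiny in-memory SQLite database with made-up data (three libraries holding 3, 1 and 2 books). The make_db helper, the library names and the book rows are hypothetical. The only change to the query is a trailing order by, added so the row order is deterministic.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Hypothetical helper: the question's schema plus made-up sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        create table library (
            id integer primary key,
            name char(8)
        );
        create table book (
            id integer primary key,
            name char(8),
            library_id integer,
            foreign key (library_id) references library(id)
        );
        -- Sample data: Central has 3 books, East has 1, West has 2.
        insert into library values (1, 'Central'), (2, 'East'), (3, 'West');
        insert into book values
            (1, 'b1', 1), (2, 'b2', 1), (3, 'b3', 1),
            (4, 'b4', 2),
            (5, 'b5', 3), (6, 'b6', 3);
    """)
    return conn


# The query from the question, plus an order by for deterministic output.
QUESTION_QUERY = """
select
    a.name as name,
    b.nbooks as nbooks
from
    library as a,
    (
        select
            min(library_id) as library,
            count(id) as nbooks
        from
            book
        group by
            library_id
    ) as b
where
    ( nbooks > 1 ) and (a.id = b.library)
order by a.name
"""

rows = make_db().execute(QUESTION_QUERY).fetchall()
print(rows)  # [('Central', 3), ('West', 2)]
```

East, with a single book, is correctly left out.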
On second thought, an explicit inner join might have been better. Other than that, could you please point out the potential pitfalls (either in general or with a particular database) and the correct way to generate this report?
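For comparison, here is a sketch of the same query rewritten with an explicit inner join. It assumes SQLite and uses the same hypothetical make_db helper and sample data. The join condition moves into ON, and the WHERE clause keeps only the filter on the aggregated count. The subquery, including min(library_id), is left as in the question so that only the join style changes.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Hypothetical helper: the question's schema plus made-up sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        create table library (id integer primary key, name char(8));
        create table book (
            id integer primary key,
            name char(8),
            library_id integer,
            foreign key (library_id) references library(id)
        );
        insert into library values (1, 'Central'), (2, 'East'), (3, 'West');
        insert into book values
            (1, 'b1', 1), (2, 'b2', 1), (3, 'b3', 1),
            (4, 'b4', 2),
            (5, 'b5', 3), (6, 'b6', 3);
    """)
    return conn


# Comma join with the join condition in WHERE, as in the question.
COMMA_JOIN = """
select a.name as name, b.nbooks as nbooks
from library as a,
     (select min(library_id) as library, count(id) as nbooks
      from book group by library_id) as b
where (nbooks > 1) and (a.id = b.library)
order by a.name
"""

# Same query with an explicit inner join: the join condition lives in ON,
# and WHERE only filters on the count.
EXPLICIT_JOIN = """
select a.name as name, b.nbooks as nbooks
from library as a
inner join
     (select min(library_id) as library, count(id) as nbooks
      from book group by library_id) as b
  on a.id = b.library
where b.nbooks > 1
order by a.name
"""

conn = make_db()
comma_rows = conn.execute(COMMA_JOIN).fetchall()
explicit_rows = conn.execute(EXPLICIT_JOIN).fetchall()
print(comma_rows == explicit_rows)  # True
print(explicit_rows)                # [('Central', 3), ('West', 2)]
```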
Here is a simple way of doing this:
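As a minimal sketch, assuming SQLite and the same hypothetical make_db helper and sample data, book can be joined directly to library, then grouped, with HAVING filtering the groups. This removes the need for both the subquery and min(library_id). Grouping by l.id as well as l.name is a deliberate choice here: the schema does not make name unique, and two libraries that happen to share a name should not be merged.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Hypothetical helper: the question's schema plus made-up sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        create table library (id integer primary key, name char(8));
        create table book (
            id integer primary key,
            name char(8),
            library_id integer,
            foreign key (library_id) references library(id)
        );
        insert into library values (1, 'Central'), (2, 'East'), (3, 'West');
        insert into book values
            (1, 'b1', 1), (2, 'b2', 1), (3, 'b3', 1),
            (4, 'b4', 2),
            (5, 'b5', 3), (6, 'b6', 3);
    """)
    return conn


# Join, group per library, and keep only groups with two or more books.
SIMPLE_QUERY = """
select l.name as name, count(*) as nbooks
from library as l
inner join book as b on b.library_id = l.id
group by l.id, l.name
having count(*) >= 2
order by l.name
"""

rows = make_db().execute(SIMPLE_QUERY).fetchall()
print(rows)  # [('Central', 3), ('West', 2)]
```

The filter belongs in HAVING rather than WHERE because it tests the result of the aggregate, which only exists after grouping.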
Your answer is technically OK. The DBA probably doesn’t care about certain stylistic things that others might (such as using “a” as the alias for library rather than “l”). The subquery is unnecessary, and the min(library_id) sticks out as unnecessary: every row in a group already has the same library_id, so min(library_id) is just library_id. You can apply aggregate functions to the GROUP BY columns, but that is typically not done.

The biggest problem, and possibly what the DBA was reacting to, is having the join condition in the WHERE clause rather than in an ON clause. This is dangerous: if you leave the condition out, or make what seems like an innocent modification to the WHERE clause, the query can silently become a CROSS JOIN instead of an INNER JOIN.
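To make the risk concrete, here is a sketch of the question's query with the a.id = b.library predicate dropped, as might happen during an edit. It assumes SQLite and the same hypothetical make_db helper and sample data. Once the condition is gone, the comma join pairs every library with every qualifying book group, so 3 libraries times 2 groups gives 6 rows instead of 2. The query still runs without any error.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Hypothetical helper: the question's schema plus made-up sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        create table library (id integer primary key, name char(8));
        create table book (
            id integer primary key,
            name char(8),
            library_id integer,
            foreign key (library_id) references library(id)
        );
        insert into library values (1, 'Central'), (2, 'East'), (3, 'West');
        insert into book values
            (1, 'b1', 1), (2, 'b2', 1), (3, 'b3', 1),
            (4, 'b4', 2),
            (5, 'b5', 3), (6, 'b6', 3);
    """)
    return conn


# The question's query with the join predicate accidentally removed:
# the comma join silently degrades to a cross join.
MISSING_PREDICATE = """
select a.name, b.nbooks
from library as a,
     (select min(library_id) as library, count(id) as nbooks
      from book group by library_id) as b
where (nbooks > 1)
"""

conn = make_db()
broken_rows = conn.execute(MISSING_PREDICATE).fetchall()
library_count = conn.execute("select count(*) from library").fetchone()[0]
# 3 libraries x 2 groups with more than one book = 6 rows, not 2.
print(len(broken_rows), library_count)  # 6 3
```

How much an ON clause protects you depends on the database. Some databases, PostgreSQL for example, reject an INNER JOIN that has no ON clause as a syntax error. SQLite, used in this sketch, accepts a bare JOIN and treats it as a cross join.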