I have been intrigued by a problem on SQLZoo. It is a “greatest-n-per-group” problem. I would like to understand how the engine is operating.
A table called bbc contains the name, region of the world and population of each country:
bbc( name, region, population)
The given task is to select the most populous country of each region, showing its name, the region and population.
The solution provided is:
SELECT region, name, population
FROM bbc x
WHERE population >= ALL
    (SELECT population
     FROM bbc y
     WHERE y.region = x.region
       AND population > 0)
1. Main Question. I am finding this a bit of a mind twister. I would like to understand how the engine processes this, because at first blush it seems there is some kind of co-dependence (x depending on y, and y depending on x). Does the engine follow some kind of recursion to produce the final selection? Or am I missing something, such that either x or y is actually fixed?
2. Secondary Question. Oddly, when I pull the “AND population>0” out of the parentheses and leave it on its own at the bottom, one of the regions (Europe / Russia) goes missing from the 8 results. Why? I don’t understand that.
And indeed, when I try the query on the world database (available from the MySQL website on the same page as Sakila), the behavior is different:
With population > 0 out of the parentheses, I get 6 regions. Six is the right number in this database, because “SELECT continent FROM country GROUP BY continent” reveals seven continents, of which one is Antarctica, which includes 5 “countries”, all with a 0 population.
So that seems right.
SELECT continent, `name`, population
FROM country X
WHERE population >= ALL
    (SELECT population
     FROM country Y
     WHERE Y.`Continent` = X.`Continent`)
  AND population > 0
On the other hand, when I pull “population > 0” back into the parentheses as on SQLZoo, I also get 5 countries with a zero (the countries “belonging to Antarctica”). It doesn’t matter if I specify x.population or y.population, I get zeroes.
continent      name                                          population
-------------  --------------------------------------------  ----------
Antarctica     Antarctica                                    0
Antarctica     French Southern territories                   0
Oceania        Australia                                     18886000
South America  Brazil                                        170115000
Antarctica     Bouvet Island                                 0
Asia           China                                         1277558000
Antarctica     Heard Island and McDonald Islands             0
Africa         Nigeria                                       111506000
Europe         Russian Federation                            146934000
Antarctica     South Georgia and the South Sandwich Islands  0
North America  United States                                 278357000
Very much looking for insights on these questions!
Wishing you all a beautiful week.
🙂
Notes:
- For reference, the problem is number 3a on this page: http://old.sqlzoo.net/1a.htm?answer=1
- A thread mentioning the “greatest-n-per-group” problem for the same query: MySQL world database Trying to avoid subquery
- The world database is available here: http://dev.mysql.com/doc/index-other.html
This isn’t recursion. See this from the MySQL docs. Their solution to the problem is equivalent to this
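To make the "no recursion" point concrete, here is a rough mental model of a correlated subquery, written as a plain Python nested loop. It is only a sketch of the logical evaluation order, not what MySQL's optimizer actually does, and the India and Germany rows are made-up figures added so each region has a competitor. The point is that x is fixed to one row while the inner query runs, so y depends on x but never the other way round.

```python
# Sample rows standing in for bbc(name, region, population).
# India and Germany use invented round figures, for illustration only.
bbc = [
    ("China", "Asia", 1277558000),
    ("India", "Asia", 1000000000),
    ("Russia", "Europe", 146934000),
    ("Germany", "Europe", 82000000),
    ("Nigeria", "Africa", 111506000),
]


def largest_per_region(rows):
    """Logical model of:
    SELECT region, name, population FROM bbc x
    WHERE population >= ALL (SELECT population FROM bbc y
                             WHERE y.region = x.region AND population > 0)
    """
    result = []
    # Outer query: visit one candidate row x at a time.
    for x_name, x_region, x_pop in rows:
        # Inner query: x is fixed here, so this is an ordinary scan of y
        # filtered by the current x.region.
        inner = [
            y_pop
            for _y_name, y_region, y_pop in rows
            if y_region == x_region and y_pop > 0
        ]
        # WHERE population >= ALL (inner): keep x only if it is at least
        # as large as every population the inner query returned.
        if all(x_pop >= y_pop for y_pop in inner):
            result.append((x_region, x_name, x_pop))
    return result


if __name__ == "__main__":
    for row in largest_per_region(bbc):
        print(row)
```

Each pass of the outer loop re-runs the inner scan with a different fixed x.region, which is all "correlated" means here. A real engine is free to reach the same result more cleverly, for example by computing the maximum per region once.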
Slight changes (as suggested by ypercube above) work
This query
Returns a row. Not sure why population should be nullable, but didn’t take a good look at the rest of it. Otherwise, the query should work fine without the >0.
Also, this is different from the greatest-n-per-group. In that problem you seek to find the top N items instead of just the top one.
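The nullable population is very likely the key to the secondary question. Below is a small Python model of SQL's three-valued logic for `>= ALL`, with None standing in for NULL. It is a sketch under assumptions, not MySQL's implementation: the "NullLand" row with a NULL population and Germany's figure are hypothetical, chosen to mimic a European row with no population recorded.

```python
def ge(a, b):
    """SQL-style a >= b: UNKNOWN (None) if either side is NULL."""
    if a is None or b is None:
        return None
    return a >= b


def ge_all(value, subquery_rows):
    """SQL-style value >= ALL (subquery): FALSE if any comparison is FALSE,
    otherwise UNKNOWN if any is UNKNOWN, otherwise TRUE (also for no rows)."""
    results = [ge(value, r) for r in subquery_rows]
    if any(r is False for r in results):
        return False
    if any(r is None for r in results):
        return None
    return True


def top_names(rows, filter_inside):
    """Names kept by the query, with 'population > 0' placed either inside
    the subquery (filter_inside=True) or in the outer WHERE (False)."""
    kept = []
    for name, region, pop in rows:
        inner = [p for _n, r, p in rows if r == region]
        if filter_inside:
            # Inner WHERE drops rows where p > 0 is not TRUE (NULL and 0).
            inner = [p for p in inner if p is not None and p > 0]
            keep = ge_all(pop, inner) is True
        else:
            # Outer WHERE: both conditions must be TRUE for the row to pass.
            keep = ge_all(pop, inner) is True and pop is not None and pop > 0
        if keep:
            kept.append(name)
    return kept


rows = [
    ("China", "Asia", 1277558000),
    ("Russia", "Europe", 146934000),
    ("Germany", "Europe", 82000000),   # made-up figure
    ("NullLand", "Europe", None),      # hypothetical row with NULL population
    ("Antarctica", "Antarctica", 0),
    ("Bouvet Island", "Antarctica", 0),
]

if __name__ == "__main__":
    print(top_names(rows, filter_inside=True))
    print(top_names(rows, filter_inside=False))
```

With the filter inside, the NULL is removed before the comparison, so Russia survives; but for Antarctica the inner query returns no rows at all, `>= ALL` over an empty set is TRUE, and every zero-population row passes. With the filter outside, the zeros are dropped, but Russia is compared against a NULL, the result is UNKNOWN rather than TRUE, and Europe disappears.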