Someone I know went to an interview and was given the following problem to solve. I’ve thought about it for a few hours and believe that it’s not possible to do without using some database-specific extensions or features from recent standards that don’t have wide support yet.
I don’t remember the story behind what is being represented, but it’s not relevant. In simple terms, you’re trying to represent chains of unique numbers:
chain 1: 1 -> 2 -> 3
chain 2: 42 -> 78
chain 3: 4
chain 4: 7 -> 8 -> 9
...
This information is already stored for you in the following table structure:
id | parent
---+-------
1 | NULL
2 | 1
3 | 2
42 | NULL
78 | 42
4 | NULL
7 | NULL
8 | 7
9 | 8
There could be millions of such chains and each chain can have an unlimited number of entries. The goal is to create a second table that would contain the exact same information, but with a third column that contains the starting point of the chain:
id | parent | start
---+--------+------
1 | NULL | 1
2 | 1 | 1
3 | 2 | 1
42 | NULL | 42
78 | 42 | 42
4 | NULL | 4
7 | NULL | 7
8 | 7 | 7
9 | 8 | 7
The claim (made by the interviewers) is that this can be achieved with just 2 SQL queries. The hint they provide is to first populate the destination table (I’ll call it dst) with the root elements, like so:
INSERT INTO dst SELECT id, parent, id FROM src WHERE parent IS NULL
This will give you the following content:
id | parent | start
---+--------+------
1 | NULL | 1
42 | NULL | 42
4 | NULL | 4
7 | NULL | 7
They say that you can now execute just one more query to get to the goal shown above.
In my opinion, you can do one of two things. Use recursion in the source table to get to the front of each chain, or continuously execute some version of SELECT start FROM dst WHERE dst.id = src.parent after each update to dst (i.e. can’t cache the results).
I don’t think either of these situations is supported by common databases like MySQL, PostgreSQL, SQLite, etc. I do know that in PostgreSQL 8.4 you can achieve recursion using WITH RECURSIVE query, and in Oracle you have START WITH and CONNECT BY clauses. The point is that these things are specific to database type and version.
Is there any way to achieve the desired result using regular SQL92 in just one query? The best I could do is fill-in the start column for the first child with the following (can also use a LEFT JOIN to achieve the same result):
INSERT INTO dst
SELECT s.id, s.parent,
(SELECT start FROM dst AS d WHERE d.id = s.parent) AS start
FROM src AS s
WHERE s.parent IS NOT NULL
If there was some way to re-execute the inner select statement after each insert into dst, then the problem would be solved.
It can not be implemented in any static SQL that follows ANSI SQL 92.
But as you said it can be easy implemented with oracle’s
CONNECT BY