I attend a database course at my school. The teacher gave us a simple

Asked: June 4, 2026

I attend a database course at my school. The teacher gave us a simple exercise: consider the following simple schema:

Table Book:
    Column title (primary key)
    Column genre (one of: "romance", "polar", ...)

Table Author:
    Column title (foreign key on Book.title)
    Column name
    Primary key on (title, name)

Among the questions was the following one:

Write the query that returns the authors who have written romance books.

I proposed this answer:

select distinct name 
from Author where title in (select title from Book where genre = "romance")

However the teacher said it was wrong, and that the correct answer was:

select distinct name 
from Book, Author 
where Book.title = Author.title 
  and genre = "romance"

When I asked for an explanation, all I got was an “if you had paid more attention to the course you would know why”. Brilliant.

So, why is my answer incorrect? What exactly is the difference between these queries? What exactly do they do at the DB engine level?

1 Answer

Editorial Team · Answer 1 · 2026-06-04T05:22:03+00:00

So, why is my answer incorrect?

Your answer is correct.

My guess as to why the teacher marked it as wrong is that he/she wanted you to practise using joins with that question. But if that was the intention, it should have been stated in the question.
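To make the claim concrete, here is a minimal sketch that runs both queries against a tiny in-memory SQLite database and compares the results. The book titles and author names are hypothetical sample data, SQLite is used only because it ships with Python, and the string literals are written with single quotes, which is what standard SQL expects for strings.

```python
import sqlite3

# Hypothetical sample data, chosen only to illustrate the comparison.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Book (
        title TEXT PRIMARY KEY,
        genre TEXT NOT NULL
    );
    CREATE TABLE Author (
        title TEXT NOT NULL REFERENCES Book(title),
        name  TEXT NOT NULL,
        PRIMARY KEY (title, name)
    );
    INSERT INTO Book VALUES
        ('Love Story',  'romance'),
        ('Second Love', 'romance'),
        ('Dark Alley',  'polar');
    INSERT INTO Author VALUES
        ('Love Story',  'Alice'),
        ('Second Love', 'Alice'),
        ('Second Love', 'Bob'),
        ('Dark Alley',  'Carol');
""")

# The student's version: filter authors through a subselect.
SUBSELECT = """
    SELECT DISTINCT name
    FROM Author
    WHERE title IN (SELECT title FROM Book WHERE genre = 'romance')
"""

# The teacher's version: implicit join between Book and Author.
JOIN = """
    SELECT DISTINCT name
    FROM Book, Author
    WHERE Book.title = Author.title
      AND genre = 'romance'
"""

subselect_names = sorted(row[0] for row in conn.execute(SUBSELECT))
join_names = sorted(row[0] for row in conn.execute(JOIN))

print(subselect_names)
print(join_names)
```

Alice wrote two romance books, so without DISTINCT both versions would list her twice. Because Book.title is the primary key, the join cannot multiply Author rows any further, which is why the two queries return the same set of names.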

What exactly is the difference between these queries?

Technically, they are indeed different. A DBMS with a simple query optimizer will likely evaluate the subselect differently from the join in your teacher’s answer.

I wouldn’t be surprised if a DBMS with a good optimizer came up with the same execution plan for both queries.

Edit

I created some test data with 50000 books, 50000 authors and 7 different genres to test this (smaller numbers don’t really make sense, as optimizers then tend to simply read the whole table). The statement returns 7144 rows.
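The script used to generate that test data is not shown, so the following is only a sketch of a comparable setup, assuming one author row per book and genres assigned round-robin. The genre names other than "romance" and "polar", the `book-N` titles and the `author-N` names are hypothetical, and SQLite stands in for PostgreSQL and Oracle, so the row count will not necessarily match the 7144 reported above.

```python
import sqlite3

N_BOOKS = 50_000
# Only "romance" and "polar" come from the question; the rest are placeholders.
GENRES = ["romance", "polar", "fantasy", "scifi", "history", "poetry", "drama"]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Book (
        title TEXT PRIMARY KEY,
        genre TEXT NOT NULL
    );
    CREATE TABLE Author (
        title TEXT NOT NULL REFERENCES Book(title),
        name  TEXT NOT NULL,
        PRIMARY KEY (title, name)
    );
""")

# Assumption: genres are spread evenly, one author row per book.
conn.executemany(
    "INSERT INTO Book VALUES (?, ?)",
    ((f"book-{i}", GENRES[i % len(GENRES)]) for i in range(N_BOOKS)),
)
conn.executemany(
    "INSERT INTO Author VALUES (?, ?)",
    ((f"book-{i}", f"author-{i}") for i in range(N_BOOKS)),
)
conn.commit()

SUBSELECT = """
    SELECT DISTINCT name
    FROM Author
    WHERE title IN (SELECT title FROM Book WHERE genre = 'romance')
"""
JOIN = """
    SELECT DISTINCT name
    FROM Book, Author
    WHERE Book.title = Author.title
      AND genre = 'romance'
"""

def count_rows(sql):
    """Count the rows a query returns by wrapping it in COUNT(*)."""
    return conn.execute(f"SELECT COUNT(*) FROM ({sql})").fetchone()[0]

subselect_count = count_rows(SUBSELECT)
join_count = count_rows(JOIN)
print(subselect_count, join_count)
```

Whatever the data distribution, the point of the check is that both counts agree; a mismatch would indicate that the two statements are not equivalent on that data.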

PostgreSQL

The execution plans are nearly identical, with a small difference in the “join” method.

Here is the plan for the sub-select version: http://explain.depesz.com/s/eov
Here is the plan for the join version: http://explain.depesz.com/s/aTI

Surprisingly, the join version has a slightly higher cost value.

Oracle

Both plans are 100% identical:

--------------------------------------------------------------------------------------
| Id  | Operation           | Name   | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |        |  6815 |   399K|       |   273   (2)| 00:00:04 |
|   1 |  HASH UNIQUE        |        |  6815 |   399K|   464K|   273   (2)| 00:00:04 |
|*  2 |   HASH JOIN         |        |  6815 |   399K|       |   172   (2)| 00:00:03 |
|*  3 |    TABLE ACCESS FULL| BOOK   |  6815 |   166K|       |    69   (2)| 00:00:01 |
|   4 |    TABLE ACCESS FULL| AUTHOR | 50000 |  1708K|       |   103   (1)| 00:00:02 |
--------------------------------------------------------------------------------------

Looking at the statistics when using autotrace, there is also no difference whatsoever. I didn’t bother to create a trace file to analyze it, as I don’t expect to see a difference there.

Things don’t really change if an index on book.genre is added. Oracle sticks with the full table scan (even with 100000 rows). This is probably because the tables are not very wide and a lot of rows fit on a single page.

PostgreSQL does use the index for both statements, but there is still no real difference between the plans.
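The same kind of comparison can be tried on any engine that exposes its plans. As a rough sketch, the snippet below asks SQLite (chosen only because it ships with Python) for the plan of each statement before and after adding an index on the genre column. The index name `book_genre_idx` is hypothetical, and SQLite's plans and plan format differ from those of PostgreSQL and Oracle, so this shows the method rather than reproducing the plans above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Book (
        title TEXT PRIMARY KEY,
        genre TEXT NOT NULL
    );
    CREATE TABLE Author (
        title TEXT NOT NULL REFERENCES Book(title),
        name  TEXT NOT NULL,
        PRIMARY KEY (title, name)
    );
""")

QUERIES = {
    "subselect": """
        SELECT DISTINCT name
        FROM Author
        WHERE title IN (SELECT title FROM Book WHERE genre = 'romance')
    """,
    "join": """
        SELECT DISTINCT name
        FROM Book, Author
        WHERE Book.title = Author.title
          AND genre = 'romance'
    """,
}

def query_plan(sql):
    """Return the descriptive last column of each EXPLAIN QUERY PLAN row."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

plans_without_index = {label: query_plan(sql) for label, sql in QUERIES.items()}

# Hypothetical index name; the column matches the index discussed above.
conn.execute("CREATE INDEX book_genre_idx ON Book (genre)")

plans_with_index = {label: query_plan(sql) for label, sql in QUERIES.items()}

for label in QUERIES:
    print(label, "without index:", plans_without_index[label])
    print(label, "with index:   ", plans_with_index[label])
```

On realistic data volumes it is worth refreshing the optimizer statistics first (for example with ANALYZE), since the plan an engine picks depends on what it believes about table sizes and value distribution.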
