I’m looking for the answer on how DISTINCT clause works in SQL (SQL Server 2008 if that makes a difference) on a query with multiple tables joined?
I mean how the SQL engine handles the query with DISTINCT clause?
The reason I’m asking is that I was told by my far more experienced colleague that SQL applies DISTINCT to every field of every table. It seems unlikely for me, but I want to make sure….
For example having two tables:
CREATE TABLE users
(
u_id INT PRIMARY KEY,
u_name VARCHAR(30),
u_password VARCHAR(30)
)
CREATE TABLE roles
(
r_id INT PRIMARY KEY,
r_name VARCHAR(30)
)
CREATE TABLE users_l_roles
(
u_id INT FOREIGN KEY REFERENCES users(u_id) ,
r_id INT FOREIGN KEY REFERENCES roles(r_id)
)
And then having this query:
SELECT u_name
FROM users
INNER JOIN users_l_roles ON users.u_id = users_l_roles.u_id
INNER JOIN roles ON users_l_roles.r_id = roles.r_id
Assuming there was user with two roles then the above query will return two records with the same user name.
But this query with distinct:
SELECT DISTINCT u_name
FROM users
INNER JOIN users_l_roles ON users.u_id = users_l_roles.u_id
INNER JOIN roles ON users_l_roles.r_id = roles.r_id
will return only one user name.
The question is whether SQL will compare all the fields from all the joined tables (u_id, u_name, u_password, r_id, r_name) or it will compare only named fields in the query (u_name) and distinct the results?
DISTINCTfilters out duplicate values of your returned fields.A really simplified way to look at it is:
FROMandWHEREclausesIt’s semantically equivalent to a
GROUP BYwhere all returned fields are in theGROUP BYclause.