I’m running this query:
CREATE TABLE
SELECT people.*, Sheet1.department
FROM people LEFT JOIN Sheet1 ON people.depno = Sheet1.depno
on a set of tables detailing employee records.
The goal is to create a new table that has all the “people” data, plus a human-readable department name. Simple, right?
The problem is that each record in the resulting table appears to be duplicated exactly (with literally every field being the same), turning a roughly 23,000-record table into a roughly 46,000-record table. I say “roughly” because it’s not an exact doubling — there’s a difference of about a hundred records.
Some details: The “people” table contains 15 fields, including the “depno” field, which is an integer indicating department.
The “Sheet1” table is, as one would guess, a table generated from an imported xls file containing two fields: the shared “depno” and a new “department” (the latter being a verbose department name corresponding to the depno in question). There are 44 records in the “Sheet1” table.
Thanks in advance for any pointers on this. Let me know what other information you can use from me.
Update: Here’s the code I ended up using, from my response to Johan (thanks again to everyone who worked on this):
CREATE TABLE morebetter
SELECT people.*, Sheet1.department FROM people
LEFT JOIN Sheet1 ON people.depno = Sheet1.depno
GROUP BY id
depno is not unique in Sheet1, so each people row is returned once for every Sheet1 row with the same depno; that’s why you’re getting the doubling.
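One way to confirm where the repeated keys live is to count rows per depno in the lookup table. This is only a sketch against the table and column names from the question; the alias row_count is made up for illustration.

```sql
-- List every depno that occurs more than once in Sheet1.
-- Each people row with such a depno comes back once per matching Sheet1 row.
SELECT depno, COUNT(*) AS row_count
FROM Sheet1
GROUP BY depno
HAVING COUNT(*) > 1;
```

An empty result would suggest the duplication comes from somewhere else.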
Change the SELECT part to SELECT DISTINCT. This will eliminate duplicate rows.
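As a sketch, the DISTINCT form applied to the query from the question might look like this. The target table name people_with_department is a placeholder of mine, not something from the original post.

```sql
-- DISTINCT compares whole output rows, so two rows are merged
-- only when every selected column is identical.
CREATE TABLE people_with_department
SELECT DISTINCT people.*, Sheet1.department
FROM people
LEFT JOIN Sheet1 ON people.depno = Sheet1.depno;
```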
In MySQL you can also use a GROUP BY clause instead of DISTINCT, which works slightly differently.
The first query eliminates rows with duplicate output; the second query eliminates records with duplicate people.depno, even if people.depno does not appear in the output.
I like the second form, because it makes explicit which duplicate you’re trying to eliminate and you don’t need to tweak the output.
It also executes slightly faster.
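A sketch of the GROUP BY form against the tables from the question. It groups by people.id, as in the asker's update, on the assumption that id uniquely identifies a person. Grouping by people.depno instead would keep only one row per department.

```sql
-- One output row per people.id (assumed here to be unique per person).
-- Sheet1.department is not in the GROUP BY, so when a group holds several
-- department values MySQL returns one of them with no guarantee which.
SELECT people.*, Sheet1.department
FROM people
LEFT JOIN Sheet1 ON people.depno = Sheet1.depno
GROUP BY people.id;
```

MySQL 5.7.5 and later enable the ONLY_FULL_GROUP_BY SQL mode by default. With that mode on, the server is likely to reject this query, because Sheet1.department is neither grouped nor aggregated.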
***Warning***
The GROUP BY version will eliminate any duplicate people.depno it finds, but if the other fields in the SELECT are not identical, it will just pick one arbitrarily!
In other words, if the outcome of the SELECT DISTINCT version differs from that of the GROUP BY version, MySQL is silently dropping non-duplicate rows.
This may or may not be what you want!
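To see whether any rows would be dropped this way, one option is to look for keys that map to more than one distinct department name. This sketch assumes people.id is the unique key (it is the column used in the asker's update); name_count is an illustrative alias.

```sql
-- Find people rows whose depno maps to more than one distinct department name.
-- For these ids, a GROUP BY people.id would keep one name and drop the others.
SELECT people.id, COUNT(DISTINCT Sheet1.department) AS name_count
FROM people
LEFT JOIN Sheet1 ON people.depno = Sheet1.depno
GROUP BY people.id
HAVING COUNT(DISTINCT Sheet1.department) > 1;
```

If this returns nothing, the DISTINCT and GROUP BY forms should produce the same rows for this data.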
To be safe, do a GROUP BY on all fields that you care about!
If the GROUP BY is on a unique key, then it’s pointless to include further fields from the same table as that unique key.
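Put together, the safer form could look like the sketch below. It again assumes people.id is a unique key of people, which is why no other people columns are listed in the GROUP BY.

```sql
-- Grouping by the unique key plus the joined column keeps every distinct
-- (person, department) pair, so differing department names are not dropped.
SELECT people.*, Sheet1.department
FROM people
LEFT JOIN Sheet1 ON people.depno = Sheet1.depno
GROUP BY people.id, Sheet1.department;
```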