Okay, I have a table with some junk data and no unique identifier column. Let me give you an example of the table I’m working with:
   A     | B     | C    | D                | E    |
---------------------------------------------------
1. Fiona | Smith | NULL | 2152 Cherry Lane | CA   |
2. Fiona | Smith | NULL | NULL             | NULL |
3. Bill  | NULL  | ACME | 2903 Center Road | WA   |
4. Bill  | NULL  | ACME | NULL             | NULL |
5. NULL  | NULL  | ABC  | 2300 Water St    | PA   |
6. NULL  | NULL  | ABC  | 2300 Water St    | PA   |
7. NULL  | NULL  | NULL | 3455 B Street    | CO   |
I need to write a SELECT statement that returns only distinct rows. For example, take rows 1 and 2. They obviously refer to the same person, but they’re only partial duplicates. Of those two, I want row 1 in my result because it has the most columns filled in. The same goes for rows 3 and 4: row 3 is the one I want. For rows 5 and 6, it doesn’t matter which one is selected, since they are exact duplicates. Row 7 would be included by default since it is distinct (distinct meaning on A, B and C together, not just A and B).
Here’s what I have tried:
SELECT A, B, C = MAX(D), MAX(E)
FROM dbo.Data
GROUP BY A, B, C;
This seems to grab the unique rows I want, but the data is somehow placed into the wrong columns.
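One likely reason the values land in the wrong columns, assuming this is SQL Server (the dbo schema suggests it): in T-SQL, `C = MAX(D)` in a select list is alias syntax, so it returns MAX(D) under the column name C instead of returning column C itself. Below is a minimal sketch of the same query with C selected as a plain grouping column and explicit aliases on the aggregates. The temp table #Data and its column types are my own assumptions for illustration, not your real schema.

```sql
-- Hypothetical temp table standing in for dbo.Data; column types are assumed.
CREATE TABLE #Data (
    A varchar(50)  NULL,
    B varchar(50)  NULL,
    C varchar(50)  NULL,
    D varchar(100) NULL,
    E char(2)      NULL
);

-- Sample rows copied from the question.
INSERT INTO #Data (A, B, C, D, E) VALUES
    ('Fiona', 'Smith', NULL,   '2152 Cherry Lane', 'CA'),
    ('Fiona', 'Smith', NULL,   NULL,               NULL),
    ('Bill',  NULL,    'ACME', '2903 Center Road', 'WA'),
    ('Bill',  NULL,    'ACME', NULL,               NULL),
    (NULL,    NULL,    'ABC',  '2300 Water St',    'PA'),
    (NULL,    NULL,    'ABC',  '2300 Water St',    'PA'),
    (NULL,    NULL,    NULL,   '3455 B Street',    'CO');

-- C is returned as a grouping column; MAX(D) and MAX(E) get their own aliases.
-- GROUP BY puts NULLs in the same group, and MAX ignores NULLs, so the
-- populated value wins within each (A, B, C) group.
SELECT A, B, C, MAX(D) AS D, MAX(E) AS E
FROM #Data
GROUP BY A, B, C;

DROP TABLE #Data;
```

On the sample rows this should return four rows, one per distinct (A, B, C) combination. Note that MAX(D) and MAX(E) are computed independently of each other, so if a group ever held two different non-NULL addresses, D and E could come from different source rows.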
This approach treats D and E as equal: