Background
My typical use case:
# Table
id category dataUID
---------------------------
0 A (NULL)
1 B (NULL)
2 C text1
3 C text1
4 D text2
5 D text3
# Query
SELECT MAX(`id`) AS `id` FROM `table`
GROUP BY `category`
This is fine; it will strip out any “duplicate categories” in the recordset that’s being worked on, giving me the “highest” ID for each category.
I can then go on use this ID to pull out all the data again:
# Query
SELECT * FROM `table` JOIN (
SELECT MAX(`id`) AS `id` FROM `table`
GROUP BY `category`
) _ USING(`id`)
# Result
id category dataUID
---------------------------
0 A (NULL)
1 B (NULL)
3 C text1
5 D text3
Note that this is not the same as:
SELECT MAX(`id`) AS `id`, `category`, `dataUID` FROM `table`
GROUP BY `category`
Per the documentation:
In standard SQL, a query that includes a GROUP BY clause cannot refer
to nonaggregated columns in the select list that are not named in the
GROUP BY clause. For example, this query is illegal in standard SQL
because the name column in the select list does not appear in the
GROUP BY:SELECT o.custid, c.name, MAX(o.payment) FROM orders AS o, customers AS c WHERE o.custid = c.custid GROUP BY o.custid;For the query to be legal, the name column must be omitted from the
select list or named in the GROUP BY clause.MySQL extends the use of GROUP BY so that the select list can refer to
nonaggregated columns not named in the GROUP BY clause. This means
that the preceding query is legal in MySQL. You can use this feature
to get better performance by avoiding unnecessary column sorting and
grouping. However, this is useful primarily when all values in each
nonaggregated column not named in the GROUP BY are the same for each
group.[..]
This extension assumes that the nongrouped columns will have the same group-wise values. Otherwise, the result is indeterminate.
So I’d get an unspecified value for dataUID — as an example, either text2 or text3 for result with id 5.
This is actually a problem for other fields in my real case; as it happens, for the dataUID column specifically, generally I don’t really care which value I get.
Problem
However!
If any of the rows for a given category has a NULL dataUID, and at least one other row has a non-NULL dataUID, I’d like MAX to ignore the NULL ones.
So:
id category dataUID
---------------------------
4 D text2
5 D (NULL)
At present, since I pick out the row with the maximum ID, I get:
5 D (NULL)
But, because the dataUID is NULL, instead I want:
4 D text2
How can I get this? How can I add conditional logic to the use of aggregate MAX?
I thought of maybe handing MAX a tuple and pulling the id out from it afterwards:
GET_SECOND_PART_SOMEHOW(MAX((IF(`dataUID` NOT NULL, 1, 0), `id`))) AS `id`
But I don’t think MAX will accept arbitrary expressions like that, let alone tuples, and I don’t know how I’d retrieve the second part of the tuple after-the-fact.
slight tweak to @ypercube’s answer. To get the
ids you can useAnd then plug that into a
join