I was wondering whether this makes sense:
I have a query that joins a couple of tables, groups the result (to use GROUP_CONCAT on some columns), and then filters out some results with HAVING before sorting them with ORDER BY.
The LIMIT is the last clause here, so basically the limit only takes effect after HAVING has gone through the results, right?
I was asking myself: what if I already limit the FROM part by using a subselect with the same LIMIT? But it's not that simple; it gets a bit more complicated:
It's hard to come up with an example that makes sense; I only have a bad one:
A car always has 4 wheels, and every wheel has its own table row. So it doesn't make sense to search past the first 4 wheels. I know that, but MySQL probably doesn't. So having LIMIT 4 after the HAVING would mean that MySQL searches through the whole table (listing millions of wheels) even if it could have stopped at the 4th.
Basically, my question is: does a subselect make sense in the query below?
without SUBSELECT:
SELECT *
FROM carwheel
LEFT JOIN ... ON ...
LEFT JOIN ... ON ...
LEFT JOIN ... ON ...
WHERE ...
GROUP BY ...
HAVING ...
ORDER BY ...
LIMIT 4
with SUBSELECT:
SELECT *
FROM (
SELECT *
FROM carwheel
WHERE ... -- (first, simple filter)
LIMIT 4
) AS carwheel
LEFT JOIN ... ON ...
LEFT JOIN ... ON ...
LEFT JOIN ... ON ...
WHERE ...
GROUP BY ...
HAVING ...
ORDER BY ...
LIMIT 4 -- (does it make sense, to do it twice?)
I've read that even adding LIMIT 1 when you expect a single row makes a lot of sense, since MySQL stops looking for more rows after finding the first one. That's the whole reason I'm thinking about the subselect: MySQL will look through the whole carwheel table instead of stopping at the 4th if the LIMIT only takes effect after HAVING, right?
You can get different results, because without an ORDER BY the LIMIT in the FROM subselect picks four arbitrary rows. Both queries return at most four rows, but one might return rows 7, 14, 23, 8 and the other rows 1, 2, 88, 17. The pre-limited query can also return fewer than four, since the outer WHERE and HAVING only ever see the four rows the subselect handed over. So it makes sense to limit the result set, for example to minimize traffic, but keep in mind that you get different rows depending on where you limit.
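To make the difference concrete, here is a minimal sketch of both query shapes. It is an illustration, not your actual schema: it runs against an in-memory SQLite database through Python's sqlite3 module rather than MySQL, and the wheel_check table, its ok column, and the data are made up for the demo. The subselect gets an explicit ORDER BY id only so that the output is reproducible; without it, which four rows it picks is up to the engine.

```python
import sqlite3

# Hypothetical demo schema: carwheel(id) and wheel_check(wheel_id, ok).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE carwheel (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE wheel_check (wheel_id INTEGER, ok INTEGER)")

# Eight wheels; wheels 1 and 2 fail their check, wheels 3..8 pass.
conn.executemany("INSERT INTO carwheel (id) VALUES (?)",
                 [(i,) for i in range(1, 9)])
conn.executemany("INSERT INTO wheel_check (wheel_id, ok) VALUES (?, ?)",
                 [(i, 0 if i <= 2 else 1) for i in range(1, 9)])

# Shared tail of both queries: join, group, filter, sort, limit.
TAIL = """
    LEFT JOIN wheel_check AS c ON c.wheel_id = w.id
    GROUP BY w.id
    HAVING SUM(c.ok) > 0
    ORDER BY w.id
    LIMIT 4
"""

# Variant 1: LIMIT only at the very end, after HAVING and ORDER BY.
limit_at_end = [row[0] for row in conn.execute(
    "SELECT w.id FROM carwheel AS w" + TAIL)]

# Variant 2: LIMIT already applied in the FROM subselect, then again at the end.
pre_limited = [row[0] for row in conn.execute(
    "SELECT w.id FROM (SELECT * FROM carwheel ORDER BY id LIMIT 4) AS w" + TAIL)]

print(limit_at_end)  # [3, 4, 5, 6]
print(pre_limited)   # [3, 4]
```

In the second variant the subselect hands over wheels 1 to 4, HAVING then throws out 1 and 2, and wheels 5 and 6 are never considered. So the two queries are only interchangeable if the rows you pre-select are guaranteed to be the ones you want.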
Edit:
After your clarification, it makes sense to pre-filter the rows in your FROM clause, because then the join has fewer rows to process. The second LIMIT is useful too, because with the left joins you can get more than four rows in your result set. But to be sure which one is better, run EXPLAIN SELECT ... with both queries. Then you can compare the two plans and see which one is more efficient.