I’m sure the answer is somehow logical but here goes.
I have three big tables joined on three columns; each column is part of the primary key.
I want to select the distinct values of column1.
It works if I get the whole result at once, i.e. if I export it to a file.
But if I paginate it the way phpMyAdmin would, with LIMIT 0, 1000, I get some column1 values twice, e.g. val1 on page 1 and val1 again on the last page. This also means that some values I should have received are missing.
If I add an ORDER BY column1 everything is OK again, but I lose speed on the last pages, or that is what I’ve been told.
I guess it has something to do with the way MySQL handles the pagination and returns a page without actually knowing the whole result, but it still bugs me.
Can anyone elaborate on that?
The reason for paginating the query is that I don’t want to lock the tables for long periods at a time.
Does anyone have any insight into how to achieve this and at the same time get all the data?
It doesn’t make sense to implement paging using LIMIT without an ORDER BY.
Yes, you’re right that it’s faster without the ORDER BY, because the server is free to return arbitrary results in any order and the results don’t have to be consistent from one query to the next.
If you want correct and consistent results, you must have the ORDER BY. If you are concerned about performance, consider adding an index on the column you are ordering by.
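
To make the point concrete, here is a minimal sketch of paging a DISTINCT query with an ORDER BY. It uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs anywhere; the table name `t`, the column `column2` and the helper names are made up for illustration. The composite primary key starts with column1, so the ordering can be served from that index.

```python
import sqlite3


def make_db():
    """Build a small in-memory table (hypothetical schema) with repeated column1 values."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE t (column1 INTEGER, column2 INTEGER, "
        "PRIMARY KEY (column1, column2))"
    )
    # 500 rows, but only 50 distinct values in column1.
    conn.executemany(
        "INSERT INTO t VALUES (?, ?)", [(i % 50, i) for i in range(500)]
    )
    return conn


def paged_distinct(conn, page_size):
    """Fetch the distinct column1 values page by page, with a fixed sort order."""
    results = []
    offset = 0
    while True:
        page = conn.execute(
            "SELECT DISTINCT column1 FROM t "
            "ORDER BY column1 LIMIT ? OFFSET ?",
            (page_size, offset),
        ).fetchall()
        if not page:
            break
        results.extend(row[0] for row in page)
        offset += page_size
    return results
```

Because every page is cut from the same sorted sequence, the pages together contain each value exactly once, as long as the data does not change in between.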
See the manual page on LIMIT optimization.
If you’re trying to perform some operation on every row, then your approach won’t work if data can be added or removed while you page through it. Any insert or delete shifts all the following rows, so some rows move onto different pages. Adding a row to an earlier page pushes rows onto the next page, meaning that you see one row twice. Removing a row from an earlier page will cause you to skip a row.
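
The shifting effect is easy to reproduce. The sketch below (again sqlite3 in place of MySQL, with a hypothetical `items` table holding ids 1 to 9) reads the first page, applies one change, and then continues with offset paging.

```python
import sqlite3


def scan_with_change(change_sql, page_size=3):
    """Page through items by OFFSET, running change_sql once after the first page."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO items VALUES (?)", [(i,) for i in range(1, 10)])

    seen = []
    offset = 0
    changed = False
    while True:
        page = [
            row[0]
            for row in conn.execute(
                "SELECT id FROM items ORDER BY id LIMIT ? OFFSET ?",
                (page_size, offset),
            )
        ]
        if not page:
            break
        seen.extend(page)
        if not changed:
            # Simulate another client modifying the table between two pages.
            conn.execute(change_sql)
            changed = True
        offset += page_size
    return seen


# Deleting a row from page 1 shifts everything up: id 4 is never seen.
print(scan_with_change("DELETE FROM items WHERE id = 1"))
# Inserting a row before page 1 shifts everything down: id 3 is seen twice.
print(scan_with_change("INSERT INTO items VALUES (0)"))
```

Note that the ORDER BY is present here and the results are still wrong, so this problem is separate from the missing sort order.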
Instead you could use one of these approaches:
Use an id to keep track of how far you have progressed. Select the next n rows with a higher id.
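
A rough sketch of that id-based approach, under the same assumptions as before (sqlite3 instead of MySQL, a hypothetical `items` table with ids 1 to 9, and a made-up helper name). Instead of an offset, each query starts after the last id already processed. For the original question the same idea would apply with `WHERE column1 > ?` on the DISTINCT query.

```python
import sqlite3


def keyset_scan(change_sql, page_size=3):
    """Page through items by remembering the last id, running change_sql after page 1."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO items VALUES (?)", [(i,) for i in range(1, 10)])

    seen = []
    last_id = None
    changed = False
    while True:
        if last_id is None:
            # First page: no lower bound yet.
            cursor = conn.execute(
                "SELECT id FROM items ORDER BY id LIMIT ?", (page_size,)
            )
        else:
            # Later pages: continue strictly after the last id processed.
            cursor = conn.execute(
                "SELECT id FROM items WHERE id > ? ORDER BY id LIMIT ?",
                (last_id, page_size),
            )
        page = [row[0] for row in cursor]
        if not page:
            break
        seen.extend(page)
        last_id = page[-1]
        if not changed:
            # Simulate another client modifying the table between two pages.
            conn.execute(change_sql)
            changed = True
    return seen
```

With this scheme a delete or insert behind the current position no longer shifts the later pages, so no row is skipped or returned twice. Rows inserted with an id lower than the current position are simply not visited. Each page is also a short index range scan, so the later pages should not get slower the way large offsets do.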