I have two SELECT statements joined by UNION ALL. In the first statement, a WHERE clause gathers only the rows that have previously been shown to the user. The second statement gathers all rows that haven’t been shown to the user, so I end up with the viewed results first and the non-viewed results after.
Of course, this could be achieved with a single SELECT statement and a simple ORDER BY; however, the reason for two separate selects becomes clear once you see what I hope to accomplish.
Consider the following structure and data.
+----+------+-----+--------+------+
| id | from | to  | viewed | data |
+----+------+-----+--------+------+
|  1 |    1 |  10 | true   | .... |
|  2 |   10 |   1 | true   | .... |
|  3 |    1 |  10 | true   | .... |
|  4 |    6 |   8 | true   | .... |
|  5 |    1 |  10 | true   | .... |
|  6 |   10 |   1 | true   | .... |
|  7 |    8 |   6 | true   | .... |
|  8 |   10 |   1 | true   | .... |
|  9 |    6 |   8 | true   | .... |
| 10 |    2 |   3 | true   | .... |
| 11 |    1 |  10 | true   | .... |
| 12 |    8 |   6 | true   | .... |
| 13 |   10 |   1 | false  | .... |
| 14 |    1 |  10 | false  | .... |
| 15 |    6 |   8 | false  | .... |
| 16 |   10 |   1 | false  | .... |
| 17 |    8 |   6 | false  | .... |
| 18 |    3 |   2 | false  | .... |
+----+------+-----+--------+------+
Basically, I want all non-viewed rows to be selected by the statement. That is accomplished by checking whether the viewed column is true or false: simple and straightforward, nothing to worry about here.
However, for the rows already viewed (where the viewed column is TRUE), I only want 3 rows to be returned for each group.
The appropriate result in this instance is the 3 most recent rows of each group.
+----+------+-----+--------+------+
| id | from | to  | viewed | data |
+----+------+-----+--------+------+
|  6 |   10 |   1 | true   | .... |
|  7 |    8 |   6 | true   | .... |
|  8 |   10 |   1 | true   | .... |
|  9 |    6 |   8 | true   | .... |
| 10 |    2 |   3 | true   | .... |
| 11 |    1 |  10 | true   | .... |
| 12 |    8 |   6 | true   | .... |
+----+------+-----+--------+------+
As you can see from the ideal result set, there are three groups. Therefore the desired query for the viewed results should return at most 3 rows for each group it finds. In this case, the groups 10 with 1 and 8 with 6 each had three rows to show, while the remaining group, 2 with 3, had only one.
Please note that from = x and to = y forms the same group as from = y and to = x. So for the first group (10 with 1), from = 10 and to = 1 belongs to the same group as from = 1 and to = 10.
However, there are plenty of groups in the whole table, and for each of them I only want the 3 most recent rows returned by the SELECT statement. That’s my problem: I’m not sure how to accomplish this in the most efficient way possible, considering the table will have hundreds if not thousands of records at some point.
Thanks for your help.
Note: The columns id, from, to and viewed are indexed, which should help with performance.
PS: I’m unsure how to name this question exactly; if you have a better idea, be my guest and edit the title.
What a hairball! This gets progressively harder as you move from most recent, to second most recent, to third most recent.
Let’s put this together by getting the list of IDs we need. Then we can pull the items from the table by ID.
This relatively easy query gets you the ids of your most recent items.
Fiddle: http://sqlfiddle.com/#!2/f7045/27/0
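A minimal sketch of this first step, assuming the table is called `messages` (a hypothetical name; the question never gives one), that "most recent" means highest `id`, and that `viewed` is stored as 1/0. It runs the SQL against an in-memory SQLite copy of the question's sample data (without the `data` column) so it can be tried without MySQL; the backtick quoting of `from` and `to` works in both. Like this step of the answer, it groups by the ordered (`from`, `to`) pair.

```python
import sqlite3

# Sample data from the question as (id, from, to, viewed); the data column is omitted.
ROWS = [
    (1, 1, 10, 1), (2, 10, 1, 1), (3, 1, 10, 1), (4, 6, 8, 1),
    (5, 1, 10, 1), (6, 10, 1, 1), (7, 8, 6, 1), (8, 10, 1, 1),
    (9, 6, 8, 1), (10, 2, 3, 1), (11, 1, 10, 1), (12, 8, 6, 1),
    (13, 10, 1, 0), (14, 1, 10, 0), (15, 6, 8, 0), (16, 10, 1, 0),
    (17, 8, 6, 0), (18, 3, 2, 0),
]


def make_db():
    """In-memory SQLite stand-in for the hypothetical `messages` table."""
    conn = sqlite3.connect(":memory:")
    # Backticks quote the reserved words `from` and `to` in both MySQL and SQLite.
    conn.execute(
        "CREATE TABLE messages "
        "(id INTEGER PRIMARY KEY, `from` INTEGER, `to` INTEGER, viewed INTEGER)"
    )
    conn.executemany("INSERT INTO messages VALUES (?, ?, ?, ?)", ROWS)
    return conn


# Highest (assumed most recent) viewed id per ordered (from, to) pair.
MOST_RECENT = """
    SELECT MAX(id) AS id FROM messages
    WHERE viewed = 1
    GROUP BY `from`, `to`
"""

conn = make_db()
most_recent = sorted(row[0] for row in conn.execute(MOST_RECENT))
print(most_recent)  # [8, 9, 10, 11, 12]
```

Five ids come back rather than three because (1, 10) and (10, 1), and likewise (6, 8) and (8, 6), are still separate groups at this stage.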
Next, we need to get the ids of the second most recent items. To do this, we need a self-join style query: the same summary, but run on a virtual table that omits the most recent items.
Fiddle: http://sqlfiddle.com/#!2/f7045/44/0
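A sketch of the second step under stated assumptions: a hypothetical `messages` table, an in-memory SQLite database standing in for MySQL, and highest `id` taken as most recent. The "virtual table that omits the most recent items" is expressed here as a LEFT JOIN ... IS NULL anti-join against the first query; the original fiddle may phrase it differently.

```python
import sqlite3

# Sample data from the question as (id, from, to, viewed); the data column is omitted.
ROWS = [
    (1, 1, 10, 1), (2, 10, 1, 1), (3, 1, 10, 1), (4, 6, 8, 1),
    (5, 1, 10, 1), (6, 10, 1, 1), (7, 8, 6, 1), (8, 10, 1, 1),
    (9, 6, 8, 1), (10, 2, 3, 1), (11, 1, 10, 1), (12, 8, 6, 1),
    (13, 10, 1, 0), (14, 1, 10, 0), (15, 6, 8, 0), (16, 10, 1, 0),
    (17, 8, 6, 0), (18, 3, 2, 0),
]


def make_db():
    """In-memory SQLite stand-in for the hypothetical `messages` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE messages "
        "(id INTEGER PRIMARY KEY, `from` INTEGER, `to` INTEGER, viewed INTEGER)"
    )
    conn.executemany("INSERT INTO messages VALUES (?, ?, ?, ?)", ROWS)
    return conn


# Layer 1: most recent viewed id per ordered (from, to) pair.
MOST_RECENT = """
    SELECT MAX(id) AS id FROM messages
    WHERE viewed = 1
    GROUP BY `from`, `to`
"""

# Layer 2: the same summary, but rows whose id appears in layer 1 are
# dropped first (no match in r1 means r1.id IS NULL).
SECOND_MOST_RECENT = f"""
    SELECT MAX(t.id) AS id
    FROM messages AS t
    LEFT JOIN ({MOST_RECENT}) AS r1 ON r1.id = t.id
    WHERE t.viewed = 1 AND r1.id IS NULL
    GROUP BY t.`from`, t.`to`
"""

conn = make_db()
second = sorted(row[0] for row in conn.execute(SECOND_MOST_RECENT))
print(second)  # [4, 5, 6, 7]
```

The ordered group (2, 3) disappears at this layer because its only viewed row, id 10, was already taken by the first query.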
Finally, we need to get the ids of the third most recent items. Mercy! We need to join the query we just built to the table again.
Fiddle: http://sqlfiddle.com/#!2/f7045/45/0
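A hedged sketch of the third layer: both earlier queries are joined back to the table and their ids excluded before taking MAX(id) again. The `messages` table name is hypothetical, SQLite stands in for MySQL, and highest `id` is assumed to be most recent.

```python
import sqlite3

# Sample data from the question as (id, from, to, viewed); the data column is omitted.
ROWS = [
    (1, 1, 10, 1), (2, 10, 1, 1), (3, 1, 10, 1), (4, 6, 8, 1),
    (5, 1, 10, 1), (6, 10, 1, 1), (7, 8, 6, 1), (8, 10, 1, 1),
    (9, 6, 8, 1), (10, 2, 3, 1), (11, 1, 10, 1), (12, 8, 6, 1),
    (13, 10, 1, 0), (14, 1, 10, 0), (15, 6, 8, 0), (16, 10, 1, 0),
    (17, 8, 6, 0), (18, 3, 2, 0),
]


def make_db():
    """In-memory SQLite stand-in for the hypothetical `messages` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE messages "
        "(id INTEGER PRIMARY KEY, `from` INTEGER, `to` INTEGER, viewed INTEGER)"
    )
    conn.executemany("INSERT INTO messages VALUES (?, ?, ?, ?)", ROWS)
    return conn


# Layer 1: most recent viewed id per ordered (from, to) pair.
MOST_RECENT = """
    SELECT MAX(id) AS id FROM messages
    WHERE viewed = 1
    GROUP BY `from`, `to`
"""

# Layer 2: same summary, skipping layer-1 ids.
SECOND_MOST_RECENT = f"""
    SELECT MAX(t.id) AS id
    FROM messages AS t
    LEFT JOIN ({MOST_RECENT}) AS r1 ON r1.id = t.id
    WHERE t.viewed = 1 AND r1.id IS NULL
    GROUP BY t.`from`, t.`to`
"""

# Layer 3: join both earlier layers to the table again and skip their ids.
THIRD_MOST_RECENT = f"""
    SELECT MAX(t.id) AS id
    FROM messages AS t
    LEFT JOIN ({MOST_RECENT}) AS r1 ON r1.id = t.id
    LEFT JOIN ({SECOND_MOST_RECENT}) AS r2 ON r2.id = t.id
    WHERE t.viewed = 1 AND r1.id IS NULL AND r2.id IS NULL
    GROUP BY t.`from`, t.`to`
"""

conn = make_db()
third = sorted(row[0] for row in conn.execute(THIRD_MOST_RECENT))
print(third)  # [2, 3]
```

Only (1, 10) and (10, 1) still have rows left, and the layer-3 SQL already contains layer 2, which in turn contains layer 1; that nesting is why the SQL keeps growing.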
So now we take the union of all those ids, use them to grab the right rows from the table, and we’re done.
Tee hee. Too much SQL. Fiddle: http://sqlfiddle.com/#!2/f7045/47/0
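A sketch of the complete ordered-pair version, assuming a hypothetical `messages` table, an in-memory SQLite database in place of MySQL, and highest `id` as most recent. The three layers are combined with UNION inside an IN (...) filter, and the matching rows are then read from the table. Grouping is still by the ordered pair here.

```python
import sqlite3

# Sample data from the question as (id, from, to, viewed); the data column is omitted.
ROWS = [
    (1, 1, 10, 1), (2, 10, 1, 1), (3, 1, 10, 1), (4, 6, 8, 1),
    (5, 1, 10, 1), (6, 10, 1, 1), (7, 8, 6, 1), (8, 10, 1, 1),
    (9, 6, 8, 1), (10, 2, 3, 1), (11, 1, 10, 1), (12, 8, 6, 1),
    (13, 10, 1, 0), (14, 1, 10, 0), (15, 6, 8, 0), (16, 10, 1, 0),
    (17, 8, 6, 0), (18, 3, 2, 0),
]


def make_db():
    """In-memory SQLite stand-in for the hypothetical `messages` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE messages "
        "(id INTEGER PRIMARY KEY, `from` INTEGER, `to` INTEGER, viewed INTEGER)"
    )
    conn.executemany("INSERT INTO messages VALUES (?, ?, ?, ?)", ROWS)
    return conn


# Layer 1: most recent viewed id per ordered (from, to) pair.
MOST_RECENT = """
    SELECT MAX(id) AS id FROM messages
    WHERE viewed = 1
    GROUP BY `from`, `to`
"""

# Layer 2: same summary, skipping layer-1 ids.
SECOND_MOST_RECENT = f"""
    SELECT MAX(t.id) AS id
    FROM messages AS t
    LEFT JOIN ({MOST_RECENT}) AS r1 ON r1.id = t.id
    WHERE t.viewed = 1 AND r1.id IS NULL
    GROUP BY t.`from`, t.`to`
"""

# Layer 3: same summary, skipping layer-1 and layer-2 ids.
THIRD_MOST_RECENT = f"""
    SELECT MAX(t.id) AS id
    FROM messages AS t
    LEFT JOIN ({MOST_RECENT}) AS r1 ON r1.id = t.id
    LEFT JOIN ({SECOND_MOST_RECENT}) AS r2 ON r2.id = t.id
    WHERE t.viewed = 1 AND r1.id IS NULL AND r2.id IS NULL
    GROUP BY t.`from`, t.`to`
"""

# Union of the three id lists, used to pull the matching rows.
TOP_THREE_PER_GROUP = f"""
    SELECT id, `from`, `to`, viewed
    FROM messages
    WHERE id IN ({MOST_RECENT} UNION {SECOND_MOST_RECENT} UNION {THIRD_MOST_RECENT})
    ORDER BY id
"""

conn = make_db()
rows = conn.execute(TOP_THREE_PER_GROUP).fetchall()
print([row[0] for row in rows])  # [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
```

Every ordered group now contributes up to three rows; id 1 is the only viewed row left out, being the fourth most recent in the ordered group (1, 10).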
And now we need to cope with your last requirement: your graph is unordered. That is, from=n to=m is the same as from=m to=n.
To do this, we need a virtual table instead of the physical table. This will do the trick.
Now we need to use this virtual table, this view, everywhere the physical table used to appear. Let’s use a view to do this.
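A sketch of such a view, assuming the hypothetical names `messages` (table) and `messages_pairs` (view) and using SQLite in place of MySQL. It puts each pair in canonical order with two CASE expressions, so (10, 1) and (1, 10) share one (lo, hi) key; in MySQL, LEAST() and GREATEST() could replace the CASE expressions.

```python
import sqlite3

# Sample data from the question as (id, from, to, viewed); the data column is omitted.
ROWS = [
    (1, 1, 10, 1), (2, 10, 1, 1), (3, 1, 10, 1), (4, 6, 8, 1),
    (5, 1, 10, 1), (6, 10, 1, 1), (7, 8, 6, 1), (8, 10, 1, 1),
    (9, 6, 8, 1), (10, 2, 3, 1), (11, 1, 10, 1), (12, 8, 6, 1),
    (13, 10, 1, 0), (14, 1, 10, 0), (15, 6, 8, 0), (16, 10, 1, 0),
    (17, 8, 6, 0), (18, 3, 2, 0),
]


def make_db():
    """In-memory SQLite stand-in for the hypothetical `messages` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE messages "
        "(id INTEGER PRIMARY KEY, `from` INTEGER, `to` INTEGER, viewed INTEGER)"
    )
    conn.executemany("INSERT INTO messages VALUES (?, ?, ?, ?)", ROWS)
    return conn


conn = make_db()
# Hypothetical view: lo is the smaller endpoint, hi the larger, so the
# direction of from/to no longer affects the grouping key.
conn.execute("""
    CREATE VIEW messages_pairs AS
    SELECT id,
           CASE WHEN `from` < `to` THEN `from` ELSE `to` END AS lo,
           CASE WHEN `from` < `to` THEN `to` ELSE `from` END AS hi,
           viewed
    FROM messages
""")

# Viewed rows per unordered pair.
groups = conn.execute("""
    SELECT lo, hi, COUNT(*) FROM messages_pairs
    WHERE viewed = 1
    GROUP BY lo, hi
    ORDER BY lo, hi
""").fetchall()
print(groups)  # [(1, 10, 7), (2, 3, 1), (6, 8, 4)]
```

Counting viewed rows per (lo, hi) key yields exactly the three groups the question describes.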
So, our ultimate query is:
Fiddle: http://sqlfiddle.com/#!2/8c154/4/0
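A sketch of what the final query could look like, under stated assumptions: a hypothetical `messages` table and `messages_pairs` view, SQLite standing in for MySQL, and highest `id` taken as most recent. It runs the same three MAX(id) layers over the view, grouping by the canonical (lo, hi) pair, and should return exactly the rows in the question's ideal result.

```python
import sqlite3

# Sample data from the question as (id, from, to, viewed); the data column is omitted.
ROWS = [
    (1, 1, 10, 1), (2, 10, 1, 1), (3, 1, 10, 1), (4, 6, 8, 1),
    (5, 1, 10, 1), (6, 10, 1, 1), (7, 8, 6, 1), (8, 10, 1, 1),
    (9, 6, 8, 1), (10, 2, 3, 1), (11, 1, 10, 1), (12, 8, 6, 1),
    (13, 10, 1, 0), (14, 1, 10, 0), (15, 6, 8, 0), (16, 10, 1, 0),
    (17, 8, 6, 0), (18, 3, 2, 0),
]


def make_db():
    """In-memory stand-in for the hypothetical `messages` table and pair view."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE messages "
        "(id INTEGER PRIMARY KEY, `from` INTEGER, `to` INTEGER, viewed INTEGER)"
    )
    conn.executemany("INSERT INTO messages VALUES (?, ?, ?, ?)", ROWS)
    # Hypothetical view: canonical (lo, hi) so from/to order no longer matters.
    conn.execute("""
        CREATE VIEW messages_pairs AS
        SELECT id,
               CASE WHEN `from` < `to` THEN `from` ELSE `to` END AS lo,
               CASE WHEN `from` < `to` THEN `to` ELSE `from` END AS hi,
               viewed
        FROM messages
    """)
    return conn


# Layer 1: most recent viewed id per unordered pair.
MOST_RECENT = """
    SELECT MAX(id) AS id FROM messages_pairs
    WHERE viewed = 1
    GROUP BY lo, hi
"""

# Layer 2: same summary over the view, skipping layer-1 ids.
SECOND_MOST_RECENT = f"""
    SELECT MAX(t.id) AS id
    FROM messages_pairs AS t
    LEFT JOIN ({MOST_RECENT}) AS r1 ON r1.id = t.id
    WHERE t.viewed = 1 AND r1.id IS NULL
    GROUP BY t.lo, t.hi
"""

# Layer 3: same summary over the view, skipping layer-1 and layer-2 ids.
THIRD_MOST_RECENT = f"""
    SELECT MAX(t.id) AS id
    FROM messages_pairs AS t
    LEFT JOIN ({MOST_RECENT}) AS r1 ON r1.id = t.id
    LEFT JOIN ({SECOND_MOST_RECENT}) AS r2 ON r2.id = t.id
    WHERE t.viewed = 1 AND r1.id IS NULL AND r2.id IS NULL
    GROUP BY t.lo, t.hi
"""

# Up to three most recent viewed rows per unordered pair, read from the table.
ULTIMATE = f"""
    SELECT id, `from`, `to`, viewed
    FROM messages
    WHERE id IN ({MOST_RECENT} UNION {SECOND_MOST_RECENT} UNION {THIRD_MOST_RECENT})
    ORDER BY id
"""

conn = make_db()
rows = conn.execute(ULTIMATE).fetchall()
for row in rows:
    print(row)
```

This covers the viewed half of the question's UNION ALL; the non-viewed half stays a plain `WHERE viewed = 0` select. Each additional layer nests all earlier ones, so this pattern suits a small fixed limit like 3 rather than a large one.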