I have a single table with two reference columns (user_id and item_id) that I need to query to find all users who have certain items.
The tricky part is that I need to order the results not just by how many of the items each user has, but by WHICH items they have.
Here’s the table:
+--------------+-----------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------------+-----------------------+------+-----+---------+-------+
| user_id | int(11) | NO | | 0 | |
| item_id | int(11) unsigned | YES | | NULL | |
+--------------+-----------------------+------+-----+---------+-------+
So my query looks like this:
SELECT user_id, item_id
FROM user_items
WHERE item_id IN (2, 122, 132)
GROUP BY user_id, item_id
HAVING SUM(item_id = 2);
Looks easy enough? Here’s where the tough part comes in:
item_id = 2 is REQUIRED
item_id = 122 and 132 are OPTIONAL. Anything after 132 is also optional.
I need to order results based on:
1) if ALL of the items are found.
2) if only items 2 and 122 are found.
3) if only item 2 is found.
Here’s the SQL fiddle file for fiddling: http://sqlfiddle.com/#!2/6b1c1/6/0
I’m thinking there might be some way to set up something like this in the SELECT query:
IF (item_id = 2 AND item_id = 122 AND item_id = 132) AS matches_all,
IF (item_id = 2, item_id = 122) AS matches_some,
IF (item_id = 2) AS matches_first
EDIT with updated query
Here’s what I have so far. It’s about 95% of what I need:
http://sqlfiddle.com/#!2/6b1c1/47
SELECT user_id, item_id,
@tmp_1 := IF(SUM(item_id = 2), 1, 0) AS tmp_1,
@tmp_2 := IF(SUM(item_id = 122), 1, 0) AS tmp_2,
@tmp_3 := IF(SUM(item_id = 132), 1, 0) AS tmp_3,
@tmp_4 := IF(SUM(item_id = 126), 1, 0) AS tmp_4,
CAST(@tmp_3 + @tmp_4 AS UNSIGNED) AS total_other
FROM user_items
WHERE item_id IN (2, 122, 132, 126)
GROUP BY user_id
HAVING SUM(item_id = 2)
ORDER BY tmp_1 DESC, tmp_2 DESC, total_other DESC
A couple more details:
1) I will only have a maximum of 12 items entered, so I can assign each one its own temp field if needed.
2) The above query works perfectly for tmp_1 and tmp_2. If we have a user with items 2 and 122, it puts those at the top of the list.
For the rest, items 3-4 (eventually items 3 up to 12), I need a count of the number of matches, which is why I attempted CAST(@tmp_3 + @tmp_4 AS UNSIGNED). I’m not sure how to get that total calculated.
3) Once I have the total for items 3 through 12, that will be the third and final item in the ORDER BY clause.
Example result
Based on the schema provided in the SQL fiddle file, here is the result that should be returned when searching for all users with item_id values 2, 122, 132 and 126:
+---------+--------------+----------------+-------------+
| USER_ID | PRIMARY_ITEM | SECONDARY_ITEM | OTHER_ITEMS |
+---------+--------------+----------------+-------------+
| 39 | 1 | 1 | 2 |
| 54 | 1 | 1 | 0 |
| 55 | 1 | 0 | 0 |
+---------+--------------+----------------+-------------+
UPDATE:
Based on the update to your question (including the desired resultset), here’s a query that returns that resultset. (This is VERY similar to the query in the inline view explained in my original answer.)
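The query itself did not survive in this copy of the answer. Below is a minimal sketch that is consistent with the explanation that follows: MAX over 1/0 comparisons, HAVING primary_item, and ordering that starts at secondary_item. It is not necessarily the answer's exact text.

Assumptions in this sketch:
- It runs through Python's sqlite3 module as a stand-in for MySQL.
- The rows are hypothetical, not the data behind the SQL Fiddle link. They were chosen so that users 39, 54 and 55 reproduce the example result.
- The trailing d.user_id in the ORDER BY is an added tie-breaker.

```python
import sqlite3

# Hypothetical sample rows (NOT the SQL Fiddle data).
ROWS = [
    (39, 2), (39, 122), (39, 132), (39, 126), (39, 7),
    (54, 2), (54, 122),
    (55, 2),
    (60, 122), (60, 132),  # no item 2: HAVING must exclude this user
]

# For a non-NULL item_id, "d.item_id = 2" evaluates to 1 or 0 in both
# MySQL and SQLite, so MAX(...) is 1 if the user has at least one such row.
QUERY = """
SELECT d.user_id,
       MAX(d.item_id = 2)   AS primary_item,
       MAX(d.item_id = 122) AS secondary_item,
       MAX(d.item_id = 132) + MAX(d.item_id = 126) AS other_items
  FROM user_items d
 WHERE d.item_id IN (2, 122, 132, 126)
 GROUP BY d.user_id
HAVING primary_item
 ORDER BY secondary_item DESC, other_items DESC, d.user_id
"""


def run():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE user_items (user_id INTEGER NOT NULL, item_id INTEGER)")
    conn.executemany("INSERT INTO user_items VALUES (?, ?)", ROWS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in run():
        print(row)
```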
Note that the expression that calculates the other_items column can be extended to handle any number of other item_id values. (Just be sure that the same item_id is not specified twice in there, or it’s going to get “counted” twice.) That expression is basically doing a check for each item, deriving a 1 or a 0, and then adding up the 1s and 0s to come up with a total.
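Here is a minimal sketch of that extension, in which the expression is generated from a list of item_id values. Listing 132 twice shows the double counting the note warns about, and an order-preserving de-duplication avoids it.

Assumptions in this sketch:
- The helper names other_items_expr and other_items_by_user are hypothetical.
- SQLite via Python's sqlite3 stands in for MySQL.
- The rows are hypothetical.

```python
import sqlite3

# Hypothetical sample rows (NOT the SQL Fiddle data).
ROWS = [
    (39, 2), (39, 122), (39, 132), (39, 126), (39, 7),
    (54, 2), (54, 122),
    (55, 2),
    (60, 122), (60, 132),
]


def other_items_expr(item_ids):
    """Build 'MAX(d.item_id = a) + MAX(d.item_id = b) + ...'.

    int() keeps anything but integers out of the generated SQL.
    """
    return " + ".join(f"MAX(d.item_id = {int(i)})" for i in item_ids)


def other_items_by_user(item_ids):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE user_items (user_id INTEGER NOT NULL, item_id INTEGER)")
    conn.executemany("INSERT INTO user_items VALUES (?, ?)", ROWS)
    sql = (
        f"SELECT d.user_id, {other_items_expr(item_ids)} AS other_items "
        "FROM user_items d GROUP BY d.user_id HAVING MAX(d.item_id = 2)"
    )
    result = dict(conn.execute(sql).fetchall())
    conn.close()
    return result


if __name__ == "__main__":
    print(other_items_by_user([132, 126]))       # each item counted once
    print(other_items_by_user([132, 126, 132]))  # 132 listed twice: counted twice
    # Order-preserving de-duplication before building the expression:
    print(other_items_by_user(list(dict.fromkeys([132, 126, 132]))))
```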
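Also note that the IF() function calls are not necessary. In MySQL, a comparison such as d.item_id = 122 already evaluates to 1 or 0 (or NULL when item_id is NULL), so an expression of the form IF(condition, 1, 0) can be reduced to just the condition.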
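Here is a quick check of that equivalence, as a sketch under these assumptions:
- SQLite through Python's sqlite3 stands in for MySQL.
- CASE WHEN ... THEN 1 ELSE 0 END stands in for IF(), because IF() is not available in every SQLite version.
- The rows are hypothetical.

```python
import sqlite3

# Hypothetical sample rows (NOT the SQL Fiddle data).
ROWS = [
    (39, 2), (39, 122), (39, 132), (39, 126), (39, 7),
    (54, 2), (54, 122),
    (55, 2),
    (60, 122), (60, 132),
]

# The two forms differ only for a user whose item_id values are all NULL:
# CASE yields 0 there, the bare comparison yields NULL. A
# WHERE d.item_id IN (...) filter removes NULL rows anyway.
QUERY = """
SELECT d.user_id,
       MAX(CASE WHEN d.item_id = 122 THEN 1 ELSE 0 END) AS with_if,
       MAX(d.item_id = 122)                             AS bare
  FROM user_items d
 GROUP BY d.user_id
 ORDER BY d.user_id
"""


def run():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE user_items (user_id INTEGER NOT NULL, item_id INTEGER)")
    conn.executemany("INSERT INTO user_items VALUES (?, ?)", ROWS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for user_id, with_if, bare in run():
        print(user_id, with_if, bare)
```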
Note that the WHERE clause is not actually needed to return the correct resultset. (But if it’s there, the predicate has to match the item_id values that are being checked in the SELECT list.)
Note also that the ORDER BY does not need to include primary_item DESC, since our query guarantees that the value of primary_item will be 1. It’s sufficient to start the ordering with secondary_item DESC, since that can be either 1 or 0.
A covering index on (user_id, item_id) may speed performance, or possibly an index with a leading column of item_id may be better. (Absent the WHERE clause, the query will need to inspect every row in the table: basically a full table scan, or a full index scan.)
From the resultset, it looks like you want to return a ‘1’ if the user has one or more of the item (rather than a count of how many of a particular item he has). If what you want to return is a count of each item, then you’d replace the MAX() aggregate with a SUM() aggregate, but that makes the contents of the OTHER_ITEMS column harder to decipher.
Note that the HAVING primary_item clause is what restricts the result to users that have at least one row with item_id = 2.
UPDATE:
Francis said… that query [in your original answer] is returning multiple results per user, which is not what I was after.
A: This is a prime example of where showing an example of the resultset you want returned would be of benefit. Your query has both user_id and item_id in the SELECT list, and no indication that you want to return only one row per user, or only one row per user_id and item_id combination.
To get that, simply add a GROUP BY d.user_id or a GROUP BY d.user_id, d.item_id clause before the ORDER BY clause.
This isn’t elegant, but I think it returns the resultset you specified.
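The original query is also missing from this copy of the answer. The sketch below is consistent with the walk-through that follows: an inline view aliased f, joined back to user_items, with a WHERE on the outer query and the ORDER BY on the outermost query. It is not necessarily the exact original text. It returns one row per (user_id, item_id), which is what the update above addresses.

Assumptions in this sketch:
- SQLite via Python's sqlite3 stands in for MySQL.
- The column names item_122 and item_132 and the inner alias u are hypothetical.
- The rows are hypothetical.

```python
import sqlite3

# Hypothetical sample rows (NOT the SQL Fiddle data).
ROWS = [
    (39, 2), (39, 122), (39, 132), (39, 126), (39, 7),
    (54, 2), (54, 122),
    (55, 2),
    (60, 122), (60, 132),  # no item 2: filtered out by the inline view
]

QUERY = """
SELECT d.user_id, d.item_id
  FROM (SELECT u.user_id,
               MAX(u.item_id = 2)   AS item_2,
               MAX(u.item_id = 122) AS item_122,
               MAX(u.item_id = 132) AS item_132
          FROM user_items u
         WHERE u.item_id IN (2, 122, 132)
         GROUP BY u.user_id
        HAVING item_2) f
  JOIN user_items d
    ON d.user_id = f.user_id
 WHERE d.item_id IN (2, 122, 132)
 ORDER BY f.item_122 DESC, f.item_132 DESC, d.user_id, d.item_id
"""


def run():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE user_items (user_id INTEGER NOT NULL, item_id INTEGER)")
    conn.executemany("INSERT INTO user_items VALUES (?, ?)", ROWS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in run():
        print(row)
```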
The inline view (the query aliased as f) does the “check” of which of the items are found for the user.
To see how this works, we first check the results of just that inline view…
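Running only the inline view gives one row per qualifying user, with a 1/0 indicator per item.

Assumptions in this sketch:
- SQLite via Python's sqlite3 stands in for MySQL.
- The rows are hypothetical.
- item_122 and item_132 are hypothetical column names, used alongside the item_2 mentioned below.

```python
import sqlite3

# Hypothetical sample rows (NOT the SQL Fiddle data).
ROWS = [
    (39, 2), (39, 122), (39, 132), (39, 126), (39, 7),
    (54, 2), (54, 122),
    (55, 2),
    (60, 122), (60, 132),
]

# The trailing d.user_id is an added tie-breaker for a deterministic order.
QUERY = """
SELECT d.user_id,
       MAX(d.item_id = 2)   AS item_2,
       MAX(d.item_id = 122) AS item_122,
       MAX(d.item_id = 132) AS item_132
  FROM user_items d
 WHERE d.item_id IN (2, 122, 132)
 GROUP BY d.user_id
HAVING item_2
 ORDER BY item_2 DESC, item_122 DESC, item_132 DESC, d.user_id
"""


def run():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE user_items (user_id INTEGER NOT NULL, item_id INTEGER)")
    conn.executemany("INSERT INTO user_items VALUES (?, ?)", ROWS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in run():
        print(row)
```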
The WHERE clause could be omitted here. For our purpose, we’re basically just getting a list of user_id values, along with indicators of which of the specified items each user has.
The expressions inside the MAX aggregates check whether the item_id matches 2, 122 or 132, respectively, and return a 1 or a 0. We use the MAX aggregate to pull out any value of 1 that we find.
We do need the GROUP BY, so we get a distinct list of user_id values.
We use the HAVING clause so that users that don’t have an item_id = 2 are omitted. It could also be written as HAVING item_2 > 0, but adding the greater-than-zero isn’t required, since we are guaranteed that item_2 will have a value of 0 or 1.
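This small sketch compares three variants: no HAVING, HAVING item_2, and HAVING item_2 > 0. It shows that the > 0 makes no difference, and that the HAVING is what drops users without item 2.

Assumptions in this sketch:
- SQLite via Python's sqlite3 stands in for MySQL.
- The rows are hypothetical. User 60 has items 122 and 132 but not 2.

```python
import sqlite3

# Hypothetical sample rows (NOT the SQL Fiddle data).
ROWS = [
    (39, 2), (39, 122), (39, 132), (39, 126), (39, 7),
    (54, 2), (54, 122),
    (55, 2),
    (60, 122), (60, 132),
]

BASE = """
SELECT d.user_id, MAX(d.item_id = 2) AS item_2
  FROM user_items d
 WHERE d.item_id IN (2, 122, 132)
 GROUP BY d.user_id
{having}
 ORDER BY d.user_id
"""


def run(having):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE user_items (user_id INTEGER NOT NULL, item_id INTEGER)")
    conn.executemany("INSERT INTO user_items VALUES (?, ?)", ROWS)
    rows = conn.execute(BASE.format(having=having)).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(run(""))                   # user 60 appears with item_2 = 0
    print(run("HAVING item_2"))      # user 60 dropped
    print(run("HAVING item_2 > 0"))  # same rows as the line above
```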
The ORDER BY isn’t really required here, since we are going to JOIN this back to the user_items table; the ORDER BY is only required on the outermost query. But it does demonstrate that it is possible to get this resultset ordered.
(If this were my requirement, I might just stop here and make use of this resultset; but that’s not the resultset you specified.)
We join that query (using it as an inline view, or “derived table” in MySQL parlance) to the user_items table, so we return rows for ONLY those users that match a user_id from that query.
We need to add the WHERE clause so we only pull out item_id values in the specified list.
And we need the ORDER BY to get us the resultset in the specified order.
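You can check the role of the outer WHERE by running the joined query with and without it. Dropping the WHERE lets every item of each qualifying user through, including items 7 and 126.

Assumptions in this sketch:
- SQLite via Python's sqlite3 stands in for MySQL.
- The rows are hypothetical. User 39 also owns an unrelated item 7.
- The column names item_122 and item_132 and the inner alias u are hypothetical.

```python
import sqlite3

# Hypothetical sample rows (NOT the SQL Fiddle data).
ROWS = [
    (39, 2), (39, 122), (39, 132), (39, 126), (39, 7),
    (54, 2), (54, 122),
    (55, 2),
    (60, 122), (60, 132),
]

TEMPLATE = """
SELECT d.user_id, d.item_id
  FROM (SELECT u.user_id,
               MAX(u.item_id = 2)   AS item_2,
               MAX(u.item_id = 122) AS item_122,
               MAX(u.item_id = 132) AS item_132
          FROM user_items u
         WHERE u.item_id IN (2, 122, 132)
         GROUP BY u.user_id
        HAVING item_2) f
  JOIN user_items d
    ON d.user_id = f.user_id
 {outer_where}
 ORDER BY f.item_122 DESC, f.item_132 DESC, d.user_id, d.item_id
"""


def run(outer_where):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE user_items (user_id INTEGER NOT NULL, item_id INTEGER)")
    conn.executemany("INSERT INTO user_items VALUES (?, ?)", ROWS)
    rows = conn.execute(TEMPLATE.format(outer_where=outer_where)).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(run(""))                                    # includes (39, 7), (39, 126)
    print(run("WHERE d.item_id IN (2, 122, 132)"))    # only the listed items
```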