My objective
I am trying to retrieve multiple random rows that contain only unique userid but for the type column to be random – type can only be 0 or 1. The table in question will contain less than 1,000 rows at any given time.
My table
CREATE TABLE tbl_message_queue (
userid bigint(20) NOT NULL,
messageid varchar(20) NOT NULL,
`type` int(1) NOT NULL,
PRIMARY KEY (userid,messageid,`type`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Sample data
userid | messageid | type
---------------------------------------------------
4353453 | 518423942 | 0
4353453 | 518423942 | 1
2342934 | 748475435 | 0
2342934 | 748475435 | 1
7657529 | 821516543 | 0
7657529 | 821516543 | 1
0823546 | 932843285 | 0
0823546 | 932843285 | 1
What to rule out
Using ORDER BY RAND() isn’t feasible as over at least 18,000 of these types of queries are executed by applications at any given moment and are causing high load. Using SELECT DISTINCT or GROUP BY is (obviously) more efficient and will always pick unique userid but type will always equal to 0 with an acceptable load.
The common method is to create an id column but I’m looking for an alternative way only. The group primary key cannot change as it is required and deeply integrated into our application, however the structure of each column can be altered.
Thanks.
My understanding of your question is that for each
useridyou have two entries, but want to extract only one, at random.To achieve this, you ought to generate a random value between 0 and 1 for each unique
userid, and then JOIN this list with the starting list:But if an
ORDER BY RAND()does not work for you, maybe we should compromise.In the above sequence, any two userids will be uncorrelated — i.e., the fact that user A gets type 0 tells you nothing about what user B will turn up with.
Depending on the use case, a less random (but “apparently random”) sequence could be obtained with two queries:
This way, you can get what seems a random extraction. What really happens is that the userids are correlated, and you only have some couple dozens different extractions possible. But using only simple operators, and no JOINs, this query is very fast.