I have three MySQL tables that I would like to extract some information from. The tables are:
- Videos – represents a video with a score.
- Tags – contains a global list of tags.
- VideoTags – creates an association between a Video and a Tag.
What I want to do is find the video with the highest points for each tag. Many videos can share the same tag, but my result set should have the
same number of rows as there are tags. The end goal is a list of the best video (by points) for each unique tag (tags being a topic prefixed with a hash).
My SQL noob attempt at achieving this is as follows:
SELECT video.id AS video_id, video.owner_id, MAX(video.points), tag.id AS tag_id
FROM Videos video, VideoTags videotag, Tags tag
WHERE video.id = videotag.video_id
AND videotag.tag_id = tag.id
AND tag.content LIKE '#%'
GROUP BY tag.id
Here’s the schema and sample data:
DROP TABLE IF EXISTS `Videos`;
CREATE TABLE `Videos` (
`id` varchar(24) NOT NULL default '',
`owner_id` varchar(24) NOT NULL default '',
`points` DOUBLE NOT NULL default 0
);
DROP TABLE IF EXISTS `Tags`;
CREATE TABLE `Tags` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`content` varchar(32) NOT NULL default '',
PRIMARY KEY (id)
);
DROP TABLE IF EXISTS `VideoTags`;
CREATE TABLE `VideoTags` (
`video_id` varchar(24) NOT NULL default '',
`tag_id` int(11) NOT NULL
);
INSERT INTO Videos (id,owner_id,points) VALUES ('owner-x-video-a','owner-x', 20);
INSERT INTO Videos (id,owner_id,points) VALUES ('owner-x-video-b','owner-x', 15);
INSERT INTO Videos (id,owner_id,points) VALUES ('owner-y-video-k','owner-y', 12);
INSERT INTO Videos (id,owner_id,points) VALUES ('owner-y-video-l','owner-y', 17);
INSERT INTO Videos (id,owner_id,points) VALUES ('owner-y-video-m','owner-y', 44);
INSERT INTO Tags (id, content) VALUES (111, '#topic-1');
INSERT INTO Tags (id, content) VALUES (222, '#topic-2');
INSERT INTO VideoTags (video_id,tag_id) VALUES ('owner-x-video-a',111);
INSERT INTO VideoTags (video_id,tag_id) VALUES ('owner-x-video-b',111);
INSERT INTO VideoTags (video_id,tag_id) VALUES ('owner-y-video-k',111);
INSERT INTO VideoTags (video_id,tag_id) VALUES ('owner-y-video-l',222);
INSERT INTO VideoTags (video_id,tag_id) VALUES ('owner-y-video-m',222);
What I expect to see is:
video_id         owner_id  MAX(video.points)  tag_id
owner-x-video-a  owner-x   20                 111
owner-y-video-m  owner-y   44                 222
but what I get is:
video_id         owner_id  MAX(video.points)  tag_id
owner-x-video-a  owner-x   20                 111
owner-y-video-l  owner-y   44                 222
Unfortunately the video_id in the second row is not what I expected: owner-y-video-l
does not have 44 points, it has 17, so it is not the highest-scoring video for
the tag with id 222.
Any Masters of the SQL Universe out there that can help me out? Thanks a million 🙂
You want the groupwise maximum:
See it on sqlfiddle.
Note that this query returns all videos having the maximum number of points within each tag, so more than one record will be returned for tied tags. If you wish to return only one record in such situations, please specify how to determine the video that should be returned.
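For reference, a groupwise-maximum query along these lines can be sketched as follows. This is a minimal sketch rather than a definitive implementation: it assumes the table and column names from the question's schema, with the first table named `Videos` as in the INSERT statements, and the aliases `best` and `max_points` are made up for illustration.

```sql
-- Groupwise maximum: first find the top score per tag, then join back
-- to recover the video(s) that actually hold that score.
-- Assumes tables Videos, Tags and VideoTags as defined in the question;
-- `best` and `max_points` are illustrative names.
SELECT v.id AS video_id, v.owner_id, v.points, best.tag_id
FROM (
    -- Highest points among the videos of each hash-prefixed tag
    SELECT vt.tag_id, MAX(v.points) AS max_points
    FROM VideoTags vt
    JOIN Videos v ON v.id = vt.video_id
    JOIN Tags t ON t.id = vt.tag_id
    WHERE t.content LIKE '#%'
    GROUP BY vt.tag_id
) AS best
JOIN VideoTags vt ON vt.tag_id = best.tag_id
JOIN Videos v ON v.id = vt.video_id
             AND v.points = best.max_points
ORDER BY best.tag_id;
```

Against the sample data in the question, this should return owner-x-video-a (20 points) for tag 111 and owner-y-video-m (44 points) for tag 222. The original attempt goes wrong because video.id and video.owner_id are neither aggregated nor listed in GROUP BY, so MySQL is free to pick them from any row in the group; joining back on the per-tag maximum avoids that. Tied videos are all returned, as noted above.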