Consider the following tables:
[Table: talks]
talkID | title | starred
-------+--------------+--------
1 | talk1-title | 1
2 | talk2-title | 1
3 | talk3-title | 0
4 | talk4-title | 0
5 | talk5-title | 0
[Table: talkspeaker]
talkID | speaker
-------+---------
1 | Speaker1
1 | Speaker2
2 | Speaker3
3 | Speaker4
3 | Speaker5
4 | Speaker6
5 | Speaker7
5 | Speaker8
[Table: similartalks]
talkID | similarTo
-------+----------
1 | 3
1 | 4
2 | 3
2 | 4
2 | 5
3 | 2
4 | 5
5 | 3
5 | 4
What I want to do is this: given the set of starred talks, select the top 2 unstarred talks (starred = 0) that are most similar to the starred set, together with their titles and speakers. The problem is that getting the speakers requires an aggregate function, and so does ranking the talks by similarity.
Without the speakers in the fray, I have been able to get the most similar talks using the following query:
select t2.talkID, t2.title, count(*) as count
from similarTalks s, talks t1, talks t2
where s.talkID = t1.talkID
and t1.Starred = 1
and s.similarTo = t2.TalkID
and t2.Starred = 0
group by t2.title, t2.talkID
order by count desc
limit 2
Generally, I use the following aggregate function for getting the speakers, with appropriate group by columns (assume t = talkspeaker):
group_concat(t.speaker, ', ') as Speakers
as in
select t1.title, group_concat(t2.speaker, ', ') as Speakers
from talks t1, talkspeaker t2
where t1.talkID = t2.talkID
group by t1.title
But I am not able to combine the two. It might matter that I am planning to run this query against a SQLite database (that is where the group_concat function comes from). The top 2 unstarred talks most similar to the starred talks seem to be those with talkIDs 3 and 4.
Firstly, you might want to read this article about reasons to use ANSI-92 joins instead of the older ANSI-89 syntax used above. Secondly, SQLite does support the GROUP_CONCAT function, so you can use it.
You just need to add your second query as a subquery in the first to get the desired result:
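A sketch of what that combined query might look like, assuming the table and column names from the question. The aliases `sp` and `similarity` are names chosen here for illustration, and the tie-break on `talkID` is an added assumption so that the result is deterministic when two talks have the same count.

```sql
-- Rank unstarred talks by how many starred talks list them as similar,
-- and attach the pre-aggregated speaker list for each one.
SELECT t2.talkID,
       t2.title,
       sp.Speakers,
       COUNT(*) AS similarity
FROM similartalks s
JOIN talks t1
  ON t1.talkID = s.talkID
 AND t1.starred = 1            -- the source talk must be starred
JOIN talks t2
  ON t2.talkID = s.similarTo
 AND t2.starred = 0            -- the candidate talk must be unstarred
JOIN (
    -- one row per talk, so this join does not inflate COUNT(*)
    SELECT talkID,
           GROUP_CONCAT(speaker, ', ') AS Speakers
    FROM talkspeaker
    GROUP BY talkID
) sp
  ON sp.talkID = t2.talkID
GROUP BY t2.talkID, t2.title, sp.Speakers
ORDER BY similarity DESC, t2.talkID
LIMIT 2;
```

Because the speakers are collapsed to one row per talk before the join, each similarity link is counted once. With the sample data this should pick talks 3 and 4. If a talk could have no speakers at all, a LEFT JOIN to the subquery would keep it in the result. Note that SQLite does not guarantee the order of the names inside GROUP_CONCAT.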
Example on SQL Fiddle
EDIT
You could also do this without a subquery using DISTINCT.
However, I see no benefit at all in this method, and it is likely to be less efficient (I have not tested this, so I can’t be certain).
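The DISTINCT variant mentioned above might be sketched as follows. It assumes each (talkID, similarTo) pair appears only once in similartalks, and `ts` and `similarity` are illustrative names. In SQLite, GROUP_CONCAT with DISTINCT accepts only a single argument, so the separator falls back to the default comma.

```sql
-- Joining talkspeaker directly multiplies each similarity link by the
-- number of speakers, so both aggregates have to de-duplicate.
SELECT t2.talkID,
       t2.title,
       GROUP_CONCAT(DISTINCT ts.speaker) AS Speakers,
       COUNT(DISTINCT s.talkID)          AS similarity
FROM similartalks s
JOIN talks t1
  ON t1.talkID = s.talkID
 AND t1.starred = 1            -- the source talk must be starred
JOIN talks t2
  ON t2.talkID = s.similarTo
 AND t2.starred = 0            -- the candidate talk must be unstarred
JOIN talkspeaker ts
  ON ts.talkID = t2.talkID
GROUP BY t2.talkID, t2.title
ORDER BY similarity DESC, t2.talkID
LIMIT 2;
```

One side effect of DISTINCT is that two different speakers with exactly the same name on one talk would be merged into a single entry, which the subquery version avoids.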