Let’s assume we have this very simple table:
| class  | student |
|--------|---------|
| Math   | Alice   |
| Math   | Bob     |
| Math   | Peter   |
| Math   | Anne    |
| Music  | Bob     |
| Music  | Chris   |
| Music  | Debbie  |
| Music  | Emily   |
| Music  | David   |
| Sports | Alice   |
| Sports | Chris   |
| Sports | Emily   |
| ...    | ...     |
Now I want to find out who I have the most classes in common with.
So basically I want a query that gets as input a list of classes (some subset of all classes)
and returns a list like:
| student | common classes |
|---------|----------------|
| Brad    | 6              |
| Melissa | 4              |
| Chris   | 3              |
| Bob     | 3              |
| ...     | ...            |
What I’m doing right now is a single query for every class, and the results are merged on the client side. This is very slow, because I am a very hardworking student attending around 1000 classes, and so are most of the other students. I’d like to reduce the number of transactions and do the processing on the server side using stored procedures. I have never worked with sprocs, so I’d be glad if someone could give me some hints on how to do that.
(note: I’m using a MySQL cluster, because it’s a very big school with 1 million classes and several million students)
UPDATE
Ok, it’s obvious that I’m not a DB expert 😉 Getting nearly the same answer four times means it was too easy.
Thank you anyway! I tested the following SQL statement and it’s returning what I need, although it is very slow on the cluster (but that will be another question, I guess).
SELECT student, COUNT(class) as common_classes
FROM classes_table
WHERE class in (my_subject_list)
GROUP BY student
ORDER BY common_classes DESC
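For anyone who wants to try that statement without a MySQL cluster at hand, here is a minimal sketch that runs it against the sample rows from the top of the question. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only so the example is runnable), and the helper name common_classes is made up for illustration. The class list is passed as bound parameters in place of my_subject_list.

```python
import sqlite3

# Sample rows from the table in the question.
ENROLLMENTS = [
    ("Math", "Alice"), ("Math", "Bob"), ("Math", "Peter"), ("Math", "Anne"),
    ("Music", "Bob"), ("Music", "Chris"), ("Music", "Debbie"),
    ("Music", "Emily"), ("Music", "David"),
    ("Sports", "Alice"), ("Sports", "Chris"), ("Sports", "Emily"),
]


def common_classes(conn, my_classes):
    """Return (student, common_classes) rows, most shared classes first."""
    # One "?" per class, so the list is bound as parameters, not pasted in.
    placeholders = ",".join("?" for _ in my_classes)
    sql = f"""
        SELECT student, COUNT(class) AS common_classes
        FROM classes_table
        WHERE class IN ({placeholders})
        GROUP BY student
        ORDER BY common_classes DESC
    """
    return conn.execute(sql, list(my_classes)).fetchall()


# In-memory SQLite database standing in for the MySQL cluster.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE classes_table (class TEXT, student TEXT)")
conn.executemany("INSERT INTO classes_table VALUES (?, ?)", ENROLLMENTS)

ranking = common_classes(conn, ["Math", "Sports"])
print(ranking[0])  # ('Alice', 2)
```

The whole ranking comes back in a single round trip, instead of one query per class merged on the client.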
But actually I simplified my problem a bit too much, so let’s make it a bit harder:
Some classes are more important than others, so they are weighted:
| class   | importance |
|---------|------------|
| Music   | 0.8        |
| Math    | 0.7        |
| Sports  | 0.01       |
| English | 0.5        |
| ...     | ...        |
Additionally, students can be more or less important.
(In case you’re wondering what this is all about… it’s an analogy. And it’s getting worse. So please just accept that fact. It has to do with normalizing.)
| student | importance |
|---------|------------|
| Bob     | 3.5        |
| Anne    | 4.2        |
| Chris   | 0.3        |
| ...     | ...        |
This means a simple COUNT() won’t do it anymore.
In order to find out who I have the most in common with, I want to do the following:
map<Student,float> studentRanking;
foreach (Class c in myClasses)
{
    float myScoreForClassC = getMyScoreForClass(c);
    List students = getStudentsAttendingClass(c);
    foreach (Student s in students)
    {
        float studentScoreForClassC = c.classImportance*s.Importance;
        studentRanking[s] += min(studentScoreForClassC, myScoreForClassC);
    }
}
I hope it’s not getting too confusing.
I should also mention that I myself am not in the database, so I have to tell the SELECT statement / stored procedure which classes I’m attending.
Update in response to your edited question.
Assuming there are tables class_importance and student_importance as you describe above:
The only thing this doesn’t have is the LEAST(weighted_importance, myScoreForClassC), because I don’t know how you calculate that.
Supposing you have another table, myScores:
You can combine it all like this (see the extra LEAST inside the SUM):
If your myScores didn’t have a score for a particular class and you wanted to assign some default, you could use IFNULL(m.score, defaultvalue).
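As a rough sketch of how the pieces above could fit together, the following runs a combined query of that shape. It is written under several assumptions: Python's built-in sqlite3 stands in for MySQL so the example is runnable, which means SQLite's two-argument MIN(a, b) plays the role of MySQL's LEAST(a, b); the column names (class, student, importance, score) are guesses; the myScores values and the default of 0.5 are made up; and the helper names build_demo_db and rank_students are hypothetical. The importance values are the ones given in the question.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a small slice of the example data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE classes_table (class TEXT, student TEXT);
        CREATE TABLE class_importance (class TEXT, importance REAL);
        CREATE TABLE student_importance (student TEXT, importance REAL);
        CREATE TABLE myScores (class TEXT, score REAL);
    """)
    conn.executemany("INSERT INTO classes_table VALUES (?, ?)", [
        ("Math", "Bob"), ("Math", "Anne"),
        ("Music", "Bob"), ("Music", "Chris"), ("Sports", "Chris"),
    ])
    conn.executemany("INSERT INTO class_importance VALUES (?, ?)",
                     [("Music", 0.8), ("Math", 0.7), ("Sports", 0.01)])
    conn.executemany("INSERT INTO student_importance VALUES (?, ?)",
                     [("Bob", 3.5), ("Anne", 4.2), ("Chris", 0.3)])
    # Hypothetical personal scores; Sports is left out on purpose so the
    # IFNULL default is exercised.
    conn.executemany("INSERT INTO myScores VALUES (?, ?)",
                     [("Math", 1.0), ("Music", 2.0)])
    return conn


def rank_students(conn, my_classes, default_score):
    """Sum, per student, min(class importance * student importance, my score)."""
    placeholders = ",".join("?" for _ in my_classes)
    sql = f"""
        SELECT c.student,
               -- MIN(a, b) with two arguments is SQLite's scalar minimum;
               -- in MySQL this would be LEAST(a, b).
               SUM(MIN(ci.importance * si.importance,
                       IFNULL(m.score, ?))) AS ranking
        FROM classes_table c
        JOIN class_importance ci ON ci.class = c.class
        JOIN student_importance si ON si.student = c.student
        LEFT JOIN myScores m ON m.class = c.class
        WHERE c.class IN ({placeholders})
        GROUP BY c.student
        ORDER BY ranking DESC
    """
    # The default score binds to the first "?", the classes to the rest.
    return conn.execute(sql, [default_score, *my_classes]).fetchall()


ranking = rank_students(build_demo_db(), ["Math", "Music", "Sports"], 0.5)
print([student for student, _ in ranking])  # ['Bob', 'Anne', 'Chris']
```

The LEFT JOIN keeps classes that have no row in myScores, so the IFNULL default applies to them; with an inner join those classes would silently drop out of the sum.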