I have an MS SQL Server 2008 database where I store places that serve food (cafés, restaurants, diners etc.). On a web site connected to this database people can rate the places on a scale from 1 to 3.
On the web site there’s a page where people can view a top list with the top 25 (best rated) places in a certain city. The database structure looks something like this (there is more info stored in the tables, but here’s the relevant info):

A place is situated in a city and votes are placed on a place.
Up until now I’ve just calculated an average vote score for each place where I divide the sum of all votes for a certain place with the number of votes for that place, something like this (pseudo code):
vote_count = total number of votes for the place
vote_sum = total sum of all the votes for the place
vote_score = vote_sum/vote_count
I also have to handle divide by zero if a place has no votes. All this is done inside the stored procedure that fetches the other data that I want to display in the top list. Here is the current stored procedure that fetches the top 25 places with the highest vote score:
ALTER PROCEDURE [dbo].[GetTopListByCity]
(
@city_id Int
)
AS
SELECT TOP 25 dbo.Places.place_id,
dbo.Places.city_id,
dbo.Places.place_name,
dbo.Places.place_alias,
dbo.Places.place_street_address,
dbo.Places.place_street_number,
dbo.Places.place_zip_code,
dbo.Cities.city_name,
dbo.Cities.city_alias,
dbo.Places.place_phone,
dbo.Places.place_lat,
dbo.Places.place_lng,
ISNULL(SUM(dbo.Votes.vote_score),0) AS vote_sum,
(SELECT COUNT(*) FROM dbo.Votes WHERE dbo.Votes.place_id = dbo.Places.place_id) AS vote_count,
COALESCE((CONVERT(FLOAT,SUM(dbo.Votes.vote_score))/(CONVERT(FLOAT,(SELECT COUNT(*) FROM dbo.Votes WHERE dbo.Votes.place_id = dbo.Places.place_id)))),0) AS vote_score
FROM dbo.Places INNER JOIN dbo.Cities ON dbo.Places.city_id = dbo.Cities.city_id
LEFT OUTER JOIN dbo.Votes ON dbo.Places.place_id = dbo.Votes.place_id
WHERE dbo.Places.city_id = @city_id
AND dbo.Places.hidden = 0
GROUP BY dbo.Places.place_id,
dbo.Places.city_id,
dbo.Places.place_name,
dbo.Places.place_alias,
dbo.Places.place_street_address,
dbo.Places.place_street_number,
dbo.Places.place_zip_code,
dbo.Cities.city_name,
dbo.Cities.city_alias,
dbo.Places.place_phone,
dbo.Places.place_lat,
dbo.Places.place_lng
ORDER BY vote_score DESC, vote_count DESC, place_name ASC
RETURN
As you can see it fetches more than just the vote score – I need data about the place, the city it’s situated in and so on. This works fine, but there is one big problem: the vote score is too simple because it doesn’t take in to account the number of votes. With the simple calculation method a place that has one vote with the score 3 will end up higher in the list than a place that has fourteen votes with the score 3 and one vote with the score 2:
3/1 = 3
(14*3 + 1*2) = 44/15 = 2.933333333333
To fix this I’ve been looking into using some form of weighted average/weighted index. I’ve found an example of a true bayesian estimate that looks promising. It looks like this:
weighted rating (WR) = (v ÷ (v+m)) × R + (m ÷ (v+m)) × C
where:
R = average for the place (mean) = (Rating)
v = number of votes for the place = (votes)
m = minimum number of votes required to be listed in the Top 25 (unsure how many, but somewhere between 2-5 seems realistic)
C = the mean vote across the whole database
The problems begin when I try to implement this weighted rating in a stored procedure – it quickly becomes complicated and I get tangled into parenthesis and loose track of what the stored procedure does.
Now I need some help with two questions:
Is this a suitable method for calculating a weighted index for my site?
How would this (or another suitable calculation method) look like when implemented in a stored procedure?
I cannot see any problem with you calculations. But I can see that you are doing the same thing many times. My suggestion will help you do the aggregates in one place and then the select is quite easy.
The CTE function helps us reuse the calculations. So that we don’t have to use
SUM(vote_score)andSELECT COUNT(*) FROM Votes WHERE...multiples times. So then when you are selecting the calculations is quite easy to follow.I hope this helps
Edit
You do not have to define the table columns in the CTE. This
CTE (SumOfVoteScore, CountOfVotes, place_id) ASworks as good as thisCTE AS. You need to define the columns if you are using a recursive cte. Beacuse you areunionwith the other part.For reference here and here you will find some information about CTE functions