I’m trying to build a filtering system for products. Products have many attributes including price, size(cm) and (# of) sides. I want to construct an SQL query that always returns ALL the products but orders them on how closely they meet the search criteria.
For example lets say I have the following products:
Product A
Price: 250
Size: 50
Sides: 4
Product B
Price: 300
Size: 60
Sides: 3
Product C
Price: 200
Size: 60
Sides: 5
And have a search criteria of:
Price: $210 – $260
I want it to return:
Product A, Product C, Product B
A is first because it meets all the search criteria. C is ahead of B because $200 is closest to the $210-$260 range than $300.
That’s a fairly simple case but the search could involve any attribute(s).
The algorithm I have in my head is doing some sort of standard deviation calculation for each attribute which would give you a ‘score’ of how closely that attribute meets the search criteria, these scores could be summed up across a product to give a total score of how relavant the product is. Is that along the right track?
I realize this is a fairly complicated question, I’m really just looking for some guidance more than anything.
You basically have to come up with a ‘distance’ function for each row which returns 0 if all criteria match or some positive value indicating how close otherwise. Each of the different columns will have to have some weight as being off by $1 is not as far off as being off by 1 number of sides. For the price if it’s below 210 the unweighted distance would be (210 – price), if the price is over 260 the unweighted distance for the price column would be (price – 260). Then you can add up the weighted distances w1 * price_distance + w2 * size_distance + s3 * sides_distance. Then you can do a order by distance.