I’m working on developing a web portal for a bookstore, and I want to have the ability to suggest books to a user.
I want something similar to amazon.com: when a user orders book A, the system should provide a list of other suggested books. Book B is suggested if there exists a user Bob who bought both A and B. Additionally, I want the system to return the suggested books sorted by decreasing sales count, counting only sales to users who have bought both books (like Bob).
Here are the important tables:
Book(ISBN, title, publicationYear, etc.)
Orders(orderID, loginName, date)
BooksOrdered(orderID, ISBN, count)
This query is more complex than anything I’ve previously tried.
Current thoughts:
First find all the users that have ordered the same book (ISBN)
- Join all three tables on Book.ISBN = BooksOrdered.ISBN AND Orders.orderID = BooksOrdered.orderID
- WHERE Book.ISBN = bookInQuestionISBN
- GROUP BY Orders.loginName
- Project out loginName
So something like:
SELECT Orders.loginName AS otherBuyerLoginName
FROM Book, Orders, BooksOrdered
WHERE Book.ISBN = BooksOrdered.ISBN
  AND Orders.orderID = BooksOrdered.orderID
  AND Book.ISBN = bookInQuestionISBN
GROUP BY Orders.loginName
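To sanity-check this first step, here is a minimal sketch that runs it through Python's sqlite3 module. The in-memory SQLite database is an assumption (the real DBMS may behave differently), the helper name buyers_of is hypothetical, and the rows are a subset of the sample data given in the EDIT further down. The Book table is left out because only the ISBN is needed to find the buyers.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the real DBMS.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Orders (orderID INTEGER PRIMARY KEY, loginName TEXT, date TEXT);
    CREATE TABLE BooksOrdered (orderID INTEGER, ISBN TEXT, count INTEGER);
""")

# A subset of the sample rows from the post.
conn.executemany("INSERT INTO Orders VALUES (?, ?, ?)", [
    (7, "Tester", "2012-03-16 17:31:21"),
    (9, "ni3hao3", "2012-03-17 13:12:38"),
    (10, "ni3hao3", "2012-03-17 13:13:55"),
    (11, "Bobby", "2012-03-17 13:28:14"),
])
conn.executemany("INSERT INTO BooksOrdered VALUES (?, ?, ?)", [
    (7, "FakeISBN2", 4),
    (9, "TesterISBN", 1),
    (10, "FakeISBN2", 20),
    (11, "TesterISBN", 3),
    (11, "FakeISBN2", 40),
])

def buyers_of(conn, isbn):
    """Hypothetical helper: distinct login names that ordered the given ISBN."""
    rows = conn.execute("""
        SELECT DISTINCT O.loginName
        FROM Orders O
        JOIN BooksOrdered Bo ON Bo.orderID = O.orderID
        WHERE Bo.ISBN = ?
        ORDER BY O.loginName
    """, (isbn,))
    return [row[0] for row in rows]

print(buyers_of(conn, "TesterISBN"))  # ['Bobby', 'ni3hao3']
```

Passing the ISBN as a bound parameter (the ? placeholder) rather than pasting it into the SQL string is probably the safer choice for a web portal, since the value ultimately comes from a request.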
Then I could grab all the books these loginNames have ordered, group them by ISBN, sum count, and ORDER BY SUM(BooksOrdered.count) DESC.
However, I’m thinking that the first result will most likely be the book in question. I don’t want to suggest the same book the user has just bought.
What do you suggest? Maybe I should start over from scratch?
EDIT:
Here is some data:
BooksOrdered contains:
orderID  ISBN        count
3        FakeISBN    3
7        FakeISBN    3
8        FakeISBN    100
11       FakeISBN2   40
7        FakeISBN2   4
10       FakeISBN2   20
10       FakeISBN3   34
11       TesterISBN  3
9        TesterISBN  1
Orders contains:
orderID  loginName  date
2        Tester     2012-03-15 19:43:27
3        Tester     2012-03-16 15:56:55
6        Tester2    2012-03-16 17:28:02
7        Tester     2012-03-16 17:31:21
8        ni3hao3    2012-03-16 23:18:15
9        ni3hao3    2012-03-17 13:12:38
10       ni3hao3    2012-03-17 13:13:55
11       Bobby      2012-03-17 13:28:14
Alright, now I want to know the top suggestions for the book with ISBN = “TesterISBN”.
Two people have ordered “TesterISBN”: ni3hao3 and Bobby.
ni3hao3’s total sales history:
1 copy of "TesterISBN"
100 copies of "FakeISBN"
20 copies of "FakeISBN2"
34 copies of "FakeISBN3"
Bobby’s total sales history:
3 copies of "TesterISBN"
40 copies of "FakeISBN2"
So the totals of sales for purchasers of “TesterISBN” are as follows:
4 copies of "TesterISBN"
100 copies of "FakeISBN"
60 copies of "FakeISBN2"
34 copies of "FakeISBN3"
So I’d like the results to return:
FakeISBN
FakeISBN2
FakeISBN3
In that order.
EDIT:
I believe I’ve figured it out:
SELECT Bo.ISBN, B.title, SUM(Bo.count)
FROM BooksOrdered Bo, Orders O, Book B
WHERE Bo.orderID = O.orderID
  AND Bo.ISBN = B.ISBN
  AND Bo.ISBN <> 'TesterISBN'
  AND O.loginName IN (SELECT Orders.loginName
                      FROM Orders, BooksOrdered
                      WHERE BooksOrdered.ISBN = 'TesterISBN'
                        AND Orders.orderID = BooksOrdered.orderID)
GROUP BY Bo.ISBN, B.title
ORDER BY SUM(Bo.count) DESC
This seems to do the trick.
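For anyone who wants to reproduce the result, here is a rough sketch that loads the sample data above into an in-memory SQLite database and runs essentially the same query. SQLite, the function name suggest, and the column alias copies are assumptions made for illustration. The Book join is dropped because the sample data contains no Book rows, so titles are not returned here.

```python
import sqlite3

# Assumption: SQLite in memory; the production DBMS may differ.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Orders (orderID INTEGER PRIMARY KEY, loginName TEXT, date TEXT);
    CREATE TABLE BooksOrdered (orderID INTEGER, ISBN TEXT, count INTEGER);
""")

# Sample rows exactly as listed in the post.
conn.executemany("INSERT INTO Orders VALUES (?, ?, ?)", [
    (2, "Tester", "2012-03-15 19:43:27"),
    (3, "Tester", "2012-03-16 15:56:55"),
    (6, "Tester2", "2012-03-16 17:28:02"),
    (7, "Tester", "2012-03-16 17:31:21"),
    (8, "ni3hao3", "2012-03-16 23:18:15"),
    (9, "ni3hao3", "2012-03-17 13:12:38"),
    (10, "ni3hao3", "2012-03-17 13:13:55"),
    (11, "Bobby", "2012-03-17 13:28:14"),
])
conn.executemany("INSERT INTO BooksOrdered VALUES (?, ?, ?)", [
    (3, "FakeISBN", 3), (7, "FakeISBN", 3), (8, "FakeISBN", 100),
    (11, "FakeISBN2", 40), (7, "FakeISBN2", 4), (10, "FakeISBN2", 20),
    (10, "FakeISBN3", 34), (11, "TesterISBN", 3), (9, "TesterISBN", 1),
])

def suggest(conn, isbn):
    """Hypothetical helper: (ISBN, copies) of other books bought by
    buyers of `isbn`, highest total first."""
    return conn.execute("""
        SELECT Bo.ISBN, SUM(Bo.count) AS copies
        FROM BooksOrdered Bo
        JOIN Orders O ON O.orderID = Bo.orderID
        WHERE Bo.ISBN <> ?                -- never suggest the book itself
          AND O.loginName IN (            -- only users who bought the book
              SELECT O2.loginName
              FROM Orders O2
              JOIN BooksOrdered Bo2 ON Bo2.orderID = O2.orderID
              WHERE Bo2.ISBN = ?)
        GROUP BY Bo.ISBN
        ORDER BY copies DESC
    """, (isbn, isbn)).fetchall()

print(suggest(conn, "TesterISBN"))
# [('FakeISBN', 100), ('FakeISBN2', 60), ('FakeISBN3', 34)]
```

The totals match the hand calculation above: Tester's orders (3 and 7) do not count toward the sums because Tester never ordered TesterISBN.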