My collection contains:
{ user_id : 1, product_id : 1 },
{ user_id : 1, product_id : 2 },
{ user_id : 1, product_id : 3 },
{ user_id : 2, product_id : 2 },
{ user_id : 2, product_id : 3 },
{ user_id : 3, product_id : 2 },
My collection tracks the products viewed by each user, where user_id is the ID of the user and product_id is the ID of the product.
I want to compute the similarity between two users, i.e. the number of products they have both viewed.
For example, from the collection above, the similarity between users would be:
{ user_id1 : 1, user_id2 : 2, similarity : 2 },
{ user_id1 : 1, user_id2 : 3, similarity : 1 },
{ user_id1 : 2, user_id2 : 3, similarity : 1 },
Edited
I’ve done it without map-reduce
def self.build_similarity_weight
  # All distinct user IDs that appear in the collection.
  users_id = ProductView.all.distinct(:user_id).to_a
  users_id.each do |user_id|
    this_user_products = ProductView.all.where(user_id: user_id).distinct(:product_id).to_a
    # Every user except the current one.
    other_users = users_id.map { |e| e }
    other_users.delete_if { |x| x == user_id }
    other_users.each do |other_uid|
      other_user_products = ProductView.all.where(user_id: other_uid).distinct(:product_id).to_a
      # Similarity = number of products both users viewed.
      user_sim = (other_user_products & this_user_products).length
      usw = UserSimilarityWeight.new(user_id1: user_id, user_id2: other_uid, weight: user_sim)
      usw.save
    end
  end
end
The problem is that my code is not efficient: it is O(n^2), where n is the number of users.
How can I make my code more efficient using map-reduce?
Regards,
First, you can do this with two map-reduce passes, the second working on the result of the first.
Second, you can’t be more efficient than O(n^2), because the result itself is of order O(n^2).
That is, even if you could somehow magically get the pairs and their similarities, you would still need to write on the order of n^2 pairs.
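The answer does not spell out what the two passes would be, so the following is only one possible reading, sketched in plain Ruby on in-memory data rather than with MongoDB's map-reduce API. The VIEWS constant and the method names users_by_product, pair_similarities and build_similarities are hypothetical, made up for illustration. Pass 1 groups viewers by product; pass 2 emits every pair of viewers per product and sums the counts.

```ruby
# Hypothetical in-memory stand-in for the ProductView collection
# (same rows as the example in the question).
VIEWS = [
  { user_id: 1, product_id: 1 },
  { user_id: 1, product_id: 2 },
  { user_id: 1, product_id: 3 },
  { user_id: 2, product_id: 2 },
  { user_id: 2, product_id: 3 },
  { user_id: 3, product_id: 2 }
].freeze

# Pass 1: map each view to (product_id, user_id) and reduce to the
# sorted list of distinct users who viewed each product.
def users_by_product(views)
  views.group_by { |v| v[:product_id] }
       .transform_values { |rows| rows.map { |r| r[:user_id] }.uniq.sort }
end

# Pass 2: for each product, emit every pair of its viewers with a
# count of 1, then reduce by summing the counts per pair.
def pair_similarities(by_product)
  counts = Hash.new(0)
  by_product.each_value do |users|
    users.combination(2) { |u1, u2| counts[[u1, u2]] += 1 }
  end
  counts
end

# Chain both passes and shape the output like the question's example.
def build_similarities(views)
  pair_similarities(users_by_product(views)).map do |(u1, u2), sim|
    { user_id1: u1, user_id2: u2, similarity: sim }
  end
end
```

Calling build_similarities(VIEWS) gives the three pairs from the question, with similarities 2, 1 and 1. Unlike the original loop, this sketch writes each unordered pair once and skips pairs with no product in common, so it does less work when most users share nothing; in the worst case (everyone viewed the same product) it still produces on the order of n^2 pairs, as the answer says.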