My collection contains: { user_id : 1, product_id : 1 }, { user_id :

Question

Asked: June 4, 2026 (2026-06-04T04:58:32+00:00)

My collection contains:

{ user_id : 1, product_id : 1 },
{ user_id : 1, product_id : 2 },
{ user_id : 1, product_id : 3 },
{ user_id : 2, product_id : 2 },
{ user_id : 2, product_id : 3 },
{ user_id : 3, product_id : 2 },

My collection tracks the products viewed by each user, where user_id is the ID of the user and product_id is the ID of the product.
I want to compute the similarity between two users, i.e. the number of products they have both viewed.
For example, for the collection above, the similarities between users would be:

{ user_id1 : 1, user_id2 : 2, similarity : 2 },
{ user_id1 : 1, user_id2 : 3, similarity : 1 },
{ user_id1 : 2, user_id2 : 3, similarity : 1 },

Edited

I’ve done it without map-reduce:

def self.build_similarity_weight
  users_id = ProductView.all.distinct(:user_id).to_a
  users_id.each do |user_id|
    this_user_products = ProductView.all.where(user_id: user_id).distinct(:product_id).to_a

    # Compare this user against every other user.
    other_users = users_id - [user_id]

    other_users.each do |other_uid|
      other_user_products = ProductView.all.where(user_id: other_uid).distinct(:product_id).to_a
      # Similarity = number of products both users viewed (array intersection).
      user_sim = (other_user_products & this_user_products).length
      UserSimilarityWeight.create(user_id1: user_id, user_id2: other_uid, weight: user_sim)
    end
  end
end

The problem is that my code is inefficient: it is O(n²), where n is the number of users.
How can I make it more efficient using map-reduce?

Regards,

1 Answer

Editorial Team · Answer 1 · 2026-06-04T04:58:35+00:00

First, you do two map-reduces.

- map: emit product_id as the key and user_id as the value
- reduce: iterate over the value list (the list of user IDs for each product) with a loop within a loop, and emit each pair of user IDs as the key (with the smaller user ID first) and 1 as the value
(the second map-reduce works on the result of the first)
- map: just pass the pair of users through as the key, with 1 as the value
- reduce: sum the values for each pair.
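
The two passes above can be checked on the sample data from the question with a small plain-Ruby sketch. It makes some assumptions: the views are held in an in-memory array, the helper name `pair_similarities` is hypothetical, and `group_by` and `combination` stand in for the map and reduce phases rather than MongoDB's own map-reduce API.

```ruby
# Sample data from the question, held in memory for illustration.
views = [
  { user_id: 1, product_id: 1 },
  { user_id: 1, product_id: 2 },
  { user_id: 1, product_id: 3 },
  { user_id: 2, product_id: 2 },
  { user_id: 2, product_id: 3 },
  { user_id: 3, product_id: 2 },
]

# Hypothetical helper: returns { [user_id1, user_id2] => similarity }.
def pair_similarities(views)
  # Pass 1, map: key each view by product_id; the value is the user_id.
  users_by_product = views.group_by { |v| v[:product_id] }
                          .transform_values { |vs| vs.map { |v| v[:user_id] }.uniq.sort }

  # Pass 1, reduce: for each product, emit every pair of users
  # (smaller ID first) with the value 1.
  emitted = users_by_product.values.flat_map do |user_ids|
    user_ids.combination(2).map { |pair| [pair, 1] }
  end

  # Pass 2, map and reduce: group the emitted rows by pair and sum the 1s.
  emitted.group_by(&:first).transform_values { |rows| rows.sum(&:last) }
end

similarities = pair_similarities(views)
similarities.each do |(user_id1, user_id2), similarity|
  puts "user_id1: #{user_id1}, user_id2: #{user_id2}, similarity: #{similarity}"
end
```

Note that in this sketch a pair of users with no product in common is never emitted, so no row with similarity 0 is produced, unlike the nested loop in the question.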

Second, you can’t be more efficient than O(n²), because the result itself has on the order of n² entries.
That is, even if you could somehow get the pairs and their similarities by magic, you would still need to write on the order of n² pairs.
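
To make that bound concrete: the number of unordered pairs among n users is n(n-1)/2, which grows quadratically. A small sketch, with a hypothetical helper `pair_count`, cross-checks the formula against explicit enumeration:

```ruby
# Hypothetical helper: number of unordered pairs among n users.
def pair_count(n)
  n * (n - 1) / 2
end

# Cross-check the formula against an explicit enumeration of pairs.
[3, 10, 100].each do |n|
  enumerated = (1..n).to_a.combination(2).size
  puts "n=#{n}: #{pair_count(n)} pairs (enumerated: #{enumerated})"
end
```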
