Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 6846453
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: May 27, 20262026-05-27T00:37:27+00:00 2026-05-27T00:37:27+00:00

Im pretty much new to data mining and recommendation systems, now trying to build

  • 0

Im pretty much new to data mining and recommendation systems, now trying to build some kind of rec system for users that have such parameters:

  • city
  • education
  • interest

To calculate similarity between them im gonna apply cosine similarity and discrete similarity.
For example:

  • city : if x = y then d(x,y) = 0. Otherwise, d(x,y) = 1.
  • education : here i will use cosine similarity as words appear in the name of the department or bachelors degree
  • interest : there will be hardcoded number of interest user can choose and cosine similarity will be calculated based on two vectors like this:
1 0 0 1 0 0 ... n
1 1 1 0 1 0 ... n

where 1 means the presence of the interest and n is the total number of all interests.

My question is:
How to combine those 3 similarities in appropriate order? I mean just summing them doesnt sound quite smart, does it? Also I would like to hear comments on my “newbie similarity system”, hah.

  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-05-27T00:37:28+00:00Added an answer on May 27, 2026 at 12:37 am

    There are not hard-and-fast answers, since the answers here depend greatly on your input and problem domain. A lot of the work of machine learning is the art (not science) of preparing your input, for this reason. I could give you some general ideas to think about. You have two issues: making meaningful similarities out of each of these items, and then combining them.

    The city similarity sounds reasonable but really depends on your domain. Is it really the case that being in the same city means everything, and being in neighboring cities means nothing? For example does being in similarly-sized cities count for anything? In the same state? If they do your similarity should reflect that.

    Education: I understand why you might use cosine similarity but that is not going to address the real problem here, which is handling different tokens that mean the same thing. You need “eng” and “engineering” to match, and “ba” and “bachelors”, things like that. Once you prepare the tokens that way it might give good results.

    Interest: I don’t think cosine will be the best choice here, try a simple tanimoto coefficient similarity (just size of intersection over size of union).

    You can’t just sum them, as I assume you still want a value in the range [0,1]. You could average them. That makes the assumption that the output of each of these are directly comparable, that they’re the same “units” if you will. They aren’t here; for example it’s not as if they are probabilities.

    It might still work OK in practice to average them, perhaps with weights. For example, being in the same city here is as important as having exactly the same interests. Is that true or should it be less important?

    You can try and test different variations and weights as hopefully you have some scheme for testing against historical data. I would point you at our project, Mahout, as it has a complete framework for recommenders and evaluation.

    However all these sorts of solutions are hacky and heuristic. I think you might want to take a more formal approach to feature encoding and similarities. If you’re willing to buy a book and like Mahout, Mahout in Action has good coverage in the clustering chapters on how to select and encode features and then how to make one similarity out of them.

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

I am pretty much new in this. I know Remoting, HTTPService and WebService. I
I am pretty much new to using SQL Server 2000. I have been using
Please allow me to preface this by saying I'm pretty much brand new to
I am completely new to android, and pretty much a Java newb. I have
Pretty much what the title says really. We have some code that is .NET
hi i am currently learning code igniter i am pretty much new to this
I'm trying to get some simple data into a JsonStore, but it doesn't seem
I am a fairly new to ROR. I'm trying to build my first web
I have some code that is a result of pretty much my first effort
I have started practicing TDD approach. I am pretty much new to unit testing.

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.