I have users and resources. Each resource is described by a set of features

Asked: May 24, 2026 (2026-05-24T16:49:38+00:00)

I have users and resources. Each resource is described by a set of features, and each user is related to a different set of resources. In my particular case, the resources are web pages, and the features are information about the location of the visit, the time of the visit, the number of visits, etc., all of which are tied to a specific user each time.

I want to get a similarity measure between my users based on those features, but I can’t find a way to aggregate the resource features together. I’ve done it with text features, where it is possible to concatenate the documents and then extract features (say TF-IDF), but I don’t know how to proceed with this configuration.

To be as clear as possible, here is what I have:

>>> len(user_features)
13 # that's my number of users
>>> user_features[0].shape
(2374, 17) # 2374 documents for this user, and 17 features

I’m able to get a similarity matrix of the documents using Euclidean distances, for instance:

>>> euclidean_distance(user_features[0], user_features[0])

But I don’t know how to compare the users against each other. I should somehow aggregate the features together to end up with an N_Users x N_Features matrix, but I don’t know how.

Any hints on how to proceed?

Some more information about the features I’m using:

The features I have here are not completely fixed. What I’ve got so far is 13 different features, already aggregated from “views”: standard deviation, mean, etc. for each of the views, in order to have something “flat” that can be compared. One of the features I have is: was the location changed since the last view? And what about one hour ago? Two hours ago?


1 Answer

Editorial Team · Answer 1 · 2026-05-24T16:49:38+00:00

If each user is represented as a set of document-interaction vectors, you can define the similarity of a pair of users as the similarity of the two sets of document-interaction vectors that represent them.

You say you can get a similarity matrix of the documents. Then assume that user U1 visited documents D1, D2, D3, and user U2 visited documents D1, D3, D4. You would have two sets of vectors: S1 = {U1(D1), U1(D2), U1(D3)} for user 1 and S2 = {U2(D1), U2(D3), U2(D4)} for user 2. Note that each user’s interaction with a document is different, so U1(D1) and U2(D1) are distinct vectors even though they refer to the same document. If I understand correctly, the elements of these sets correspond to the respective rows in each user’s matrix.

The similarity between these two sets can be computed in many different ways. One option is the average pairwise similarity: iterate over all pairings of one element from each set, compute the document similarity of each pair, and average over all pairs.
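
The average pairwise similarity described above could be sketched roughly as follows. This is only an illustration under some assumptions: `user_features` is taken to be a list of NumPy arrays of shape `(n_documents, n_features)` as in the question, the helper names `average_pairwise_similarity` and `user_similarity_matrix` are hypothetical, and turning a Euclidean distance `d` into a similarity with `1 / (1 + d)` is one arbitrary choice among many.

```python
import numpy as np


def average_pairwise_similarity(a, b):
    """Mean similarity over all pairings of one row from `a` and one row from `b`.

    `a` and `b` are (n_documents, n_features) arrays for two users.
    Assumption: similarity is defined as 1 / (1 + Euclidean distance).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Squared distances via ||x - y||^2 = ||x||^2 + ||y||^2 - 2 x.y,
    # which avoids building a (len(a), len(b), n_features) array.
    a_sq = (a ** 2).sum(axis=1)
    b_sq = (b ** 2).sum(axis=1)
    sq_dist = a_sq[:, None] + b_sq[None, :] - 2.0 * (a @ b.T)
    # Rounding can produce tiny negative values; clip before the square root.
    dist = np.sqrt(np.maximum(sq_dist, 0.0))
    similarity = 1.0 / (1.0 + dist)
    return float(similarity.mean())


def user_similarity_matrix(user_features):
    """Symmetric (n_users, n_users) matrix of average pairwise similarities."""
    n_users = len(user_features)
    result = np.zeros((n_users, n_users))
    for i in range(n_users):
        for j in range(i, n_users):
            s = average_pairwise_similarity(user_features[i], user_features[j])
            result[i, j] = result[j, i] = s
    return result
```

Calling `user_similarity_matrix(user_features)` on the 13 users from the question would give a 13 x 13 matrix. With this definition a user's similarity to themselves is generally not the maximum value, because it averages over pairs of different documents as well; features on very different scales would probably also need standardizing first.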
