We have a set of documents, each with a set of features. Given


Asked: May 17, 2026


We have a set of documents, each of which has a set of features.
Given feature A, we need to know the probability that feature B appears in the same document.

I thought of building a probability matrix such that:
M(i, j) = probability that feature j is in a document, given that feature i is in it (so for features A and B, M(A, B) = P(B | A)).
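One way to fill in such a matrix, assuming the probability is estimated by relative frequency over the document set, is M(i, j) = (number of documents containing both i and j) / (number of documents containing i). A minimal sketch, assuming each document is given as an iterable of hashable feature ids; the function name `conditional_matrix` is hypothetical. Only nonzero entries are stored, so memory grows with the number of co-occurring pairs rather than with N^2 in general:

```python
from collections import Counter, defaultdict
from itertools import permutations

def conditional_matrix(docs):
    """Sparse estimate of M[a][b] = P(b in doc | a in doc) (hypothetical helper).

    Only pairs that actually co-occur get an entry, so the result is a
    dict of dicts rather than a dense N x N array.
    """
    feature_count = Counter()          # documents containing each feature
    pair_count = defaultdict(Counter)  # pair_count[a][b] = documents containing a and b
    for doc in docs:
        feats = set(doc)               # ignore repeated features within one document
        feature_count.update(feats)
        for a, b in permutations(feats, 2):  # every ordered pair of distinct features
            pair_count[a][b] += 1
    return {
        a: {b: n / feature_count[a] for b, n in row.items()}
        for a, row in pair_count.items()
    }

docs = [{"x", "y"}, {"x", "z"}, {"x", "y", "z"}]
M = conditional_matrix(docs)
print(M["y"]["x"])  # "y" appears in two documents, both of which also contain "x"
```

Note that the pair loop costs time quadratic in the size of each document, which is fine for short feature lists but worth keeping in mind for very long ones.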

However, we have an additional requirement:
Given that feature A is in the document, which features have a probability greater than P of being in the same document?

So far, all I could think of is storing the probability matrix as a sparse matrix and, once it is computed, for each feature scanning its entire column, sorting the entries by probability, and keeping the result in a linked list somewhere. (So for each feature, we have a list of corresponding features.)
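The per-feature sorted lists described above could look roughly like this. A minimal sketch, assuming the sparse matrix is a dict of dicts mapping feature a to {feature b: P(b | a)}; the names `build_sorted_lists` and `features_above` are hypothetical, and a Python list stands in for the linked list:

```python
def build_sorted_lists(matrix):
    """For each feature a, a list of (P(b | a), b) sorted by decreasing probability."""
    return {
        a: sorted(((p, b) for b, p in row.items()), reverse=True)
        for a, row in matrix.items()
    }

def features_above(sorted_lists, a, threshold):
    """Features b with P(b | a) > threshold, most probable first."""
    result = []
    for p, b in sorted_lists.get(a, []):
        if p <= threshold:
            break  # the list is sorted in decreasing order, so nothing later qualifies
        result.append(b)
    return result

matrix = {"x": {"y": 2 / 3, "z": 2 / 3}, "y": {"x": 1.0, "z": 0.5}}
lists = build_sorted_lists(matrix)
print(features_above(lists, "y", 0.6))
```

Because of the early `break`, a threshold query only walks the entries that actually exceed the threshold (plus one), so its cost tracks the size of the answer rather than the full column.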

The space complexity is large (worst case O(N^2), where N is the number of features, and N is large!), and each search takes O(N) time.

Any better idea?


1 Answer

Editorial Team · Answer 1 · 2026-05-17T16:58:51+00:00

If the number of features is comparable to the number of documents, or greater, consider keeping an inverted index: for each feature, hold (e.g. as a sorted list) the documents in which it is present. You can then work out the probability of B given A by merging the sorted lists for features A and B: the number of documents common to both lists, divided by the number of documents containing A.
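A minimal sketch of the inverted index and the merge, assuming documents are numbered by their position in the input and given as iterables of hashable feature ids; `build_inverted_index`, `intersection_size` and `prob_b_given_a` are hypothetical names:

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each feature to the sorted list of ids of documents containing it."""
    index = defaultdict(list)
    for doc_id, feats in enumerate(docs):  # ids increase, so each list stays sorted
        for f in set(feats):
            index[f].append(doc_id)
    return index

def intersection_size(xs, ys):
    """Number of common elements of two sorted lists, via a linear merge."""
    i = j = common = 0
    while i < len(xs) and j < len(ys):
        if xs[i] == ys[j]:
            common += 1
            i += 1
            j += 1
        elif xs[i] < ys[j]:
            i += 1
        else:
            j += 1
    return common

def prob_b_given_a(index, a, b):
    """Estimate P(b | a); None when a never occurs (the ratio is undefined)."""
    docs_a = index.get(a, [])
    if not docs_a:
        return None
    return intersection_size(docs_a, index.get(b, [])) / len(docs_a)

index = build_inverted_index([{"x", "y"}, {"x", "z"}, {"x", "y", "z"}])
print(prob_b_given_a(index, "x", "y"))
```

The merge runs in time proportional to the combined length of the two posting lists, and the index itself needs space proportional to the total number of (feature, document) occurrences rather than to N^2.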

For the “common features expected given A” question, I can think of nothing better than pre-computing the answer for each A and hoping that the resulting list of features isn’t too long.
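Pre-computing the answer for each A could be sketched as follows, assuming a single fixed threshold P known in advance and documents given as iterables of hashable feature ids; `precompute_companions` is a hypothetical name. For each feature A it walks only the documents in A's posting list and counts the other features found there:

```python
from collections import Counter, defaultdict

def precompute_companions(docs, threshold):
    """For each feature a, the features b with P(b | a) > threshold, most likely first."""
    doc_sets = [set(d) for d in docs]
    postings = defaultdict(list)  # feature -> ids of documents containing it
    for doc_id, feats in enumerate(doc_sets):
        for f in feats:
            postings[f].append(doc_id)

    answers = {}
    for a, doc_ids in postings.items():
        co = Counter()  # co[b] = number of documents containing both a and b
        for doc_id in doc_ids:
            co.update(doc_sets[doc_id])
        del co[a]  # a trivially co-occurs with itself
        n_a = len(doc_ids)
        hits = [(n / n_a, b) for b, n in co.items() if n / n_a > threshold]
        answers[a] = [b for _, b in sorted(hits, reverse=True)]
    return answers

answers = precompute_companions([{"x", "y"}, {"x", "z"}, {"x", "y", "z"}], 0.6)
print(answers["y"])
```

The work for each A is proportional to the total size of the documents that contain A, and the stored output is exactly the lists whose length the answer hopes stays small.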
