I’m experimenting with an idea, where I have following subproblem: I have a list

Question

Editorial Team

Asked: June 14, 2026 (2026-06-14T09:23:07+00:00)

I’m experimenting with an idea that involves the following subproblem:

I have a list of size m containing tuples of fixed length n.

[(e11, e12, .., e1n), (e21, e22, .., e2n), ..., (em1, em2, .., emn)]

Now, given an arbitrary tuple (t1, t2, .., tn) that does not belong to the list, I want to find the closest tuple(s) that do belong to the list.

I use the following distance function (Hamming distance):

def distance(A, B):
    # Hamming distance: number of positions at which A and B differ
    total = 0
    for e1, e2 in zip(A, B):
        total += e1 != e2
    return total

One option is exhaustive search, but that is too slow for my problem because the lists are quite large. Another idea I have come up with is to first cluster the list with k-medoids and retrieve the K medoids (cluster centers). For a query, I can determine the closest cluster with K calls to the distance function, and then search for the closest tuple within that cluster. I think it should work, but I am not sure it returns the right answer when the query tuple lies near the edge of a cluster.
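For reference, the exhaustive-search baseline mentioned above can be sketched roughly as follows. The names `hamming_distance` and `closest_exhaustive` are made up for this illustration, and the sample data is arbitrary; it costs m distance evaluations per query, which is the cost any faster scheme has to beat.

```python
def hamming_distance(a, b):
    """Number of positions at which a and b differ (equal lengths assumed)."""
    return sum(x != y for x, y in zip(a, b))


def closest_exhaustive(tuples, query):
    """Return (best_distance, all tuples at that distance) by a full scan."""
    best, winners = None, []
    for t in tuples:
        d = hamming_distance(t, query)
        if best is None or d < best:
            best, winners = d, [t]      # strictly better: restart the list
        elif d == best:
            winners.append(t)           # tie: keep every closest tuple
    return best, winners


# Arbitrary sample data, only for illustration.
data = [(1, 2, 3), (1, 5, 3), (7, 8, 9)]
print(closest_exhaustive(data, (1, 2, 9)))
```

Any approximate scheme, such as the clustering idea, can be checked against this baseline on a small sample of queries.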

I was wondering whether you have a better idea for solving the problem, as my mind is completely blank at the moment. I have a strong feeling that there is a clever way to do it.

Solutions that require precomputing something are fine as long as they bring down the complexity of the query.

1 Answer

Editorial Team · Answer 1 · 2026-06-14T09:23:09+00:00

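You can store a hash table (dictionary/map) that maps from a (position, element) pair to the tuples that have that element at that position: hash:(position, element)->list<tuple>. The position has to be part of the key because the Hamming distance only compares elements at the same index.

Now, when you have a new “query”, iterate over hash_table[(i, element)] for each position i of the query and count the hits per stored tuple. The tuple with the maximal number of hits is the closest one, since distance = n - hits. A stored tuple that matches the query at no position never enters the histogram, so an empty histogram means every stored tuple is at distance n.

pseudo code:

findMax(tuple):
  histogram <- empty map
  for each (i, element) in enumerate(tuple):
     #assuming hash_table is the described DS from above
     for each x in hash_table[(i, element)]:
         histogram[x]++ #assuming lazy initialization to 0
  return key with highest value in histogram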
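A minimal Python sketch of this lookup, under a few assumptions that are not in the answer itself: the helper names `build_index` and `find_closest` are made up, elements are hashable, and the table is keyed on (position, element) pairs so that a match only counts at the same index, as the Hamming distance requires.

```python
from collections import Counter, defaultdict


def build_index(tuples):
    """Map (position, element) -> list of row numbers holding it there."""
    index = defaultdict(list)
    for row, t in enumerate(tuples):
        for pos, element in enumerate(t):
            index[(pos, element)].append(row)
    return index


def find_closest(tuples, index, query):
    """Return (distance, closest tuples) using match counts from the index."""
    hits = Counter()
    for pos, element in enumerate(query):
        # .get avoids creating empty entries in the defaultdict on a miss
        for row in index.get((pos, element), ()):
            hits[row] += 1
    if not hits:
        # No stored tuple agrees with the query anywhere: all tie at n.
        return len(query), list(tuples)
    best = max(hits.values())
    closest = [tuples[r] for r, c in hits.items() if c == best]
    return len(query) - best, closest


# Arbitrary sample data, only for illustration.
data = [(1, 2, 3), (1, 5, 3), (7, 8, 9)]
index = build_index(data)
print(find_closest(data, index, (1, 2, 9)))
```

The query cost is proportional to the total length of the lists it touches, so this helps most when each (position, element) pair occurs in few tuples. If the elements take only a handful of distinct values, the lists grow long and the cost approaches that of a full scan.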

An alternative that does not exactly follow the metric you want is a k-d tree. The difference is that a k-d tree also takes into account the “distance” between the elements (and not only equality/inequality).

k-d trees also require the elements to be comparable.
