Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 4321962
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: May 21, 20262026-05-21T08:48:47+00:00 2026-05-21T08:48:47+00:00

Given that I have the following in my knowledge-database: 1 0 6 20 0

  • 0

Given that I have the following in my knowledge-database:

1  0  6 20  0  0  6 20
1  0  3  6  0  0  3  6
1  0 15 45  0  0 15 45
1  0 17 44  0  0 17 44
1  0  2  5  0  0  2  5

I want to be able to find the nearest neighbors of the following vector:

1  0  5 16  0  0  5 16

according to a distance metric. So in this case, given a particular threshold, I should find that the first vector listed is a near-neighbor to the given vector. Currently, the size of my knowledge database is in the order of millions so calculating the distance metric for each and every point and then comparing is proving expensive. Are there any alternatives on how to achieve this with a significant speedup?

I am open to pretty much any approach including using spatial indexes in MySQL (except that I am not entirely sure on how this problem can be solved) or some kind of hashing (this would be great but again, I am not entirely sure).

  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-05-21T08:48:47+00:00Added an answer on May 21, 2026 at 8:48 am

    In Python (from http://www.comp.mq.edu.au/):

    def count_different_values(k_v1s, k_v2s):
        """kv1s and kv2s should be dictionaries mapping keys to 
        values.  count_different_values() returns the number of keys in 
        k_v1s and k_v2s that don't have the same value"""
        ks = set(k_v1s.iterkeys()) | set(k_v2s.iterkeys())
        return sum(1 for k in ks if k_v1s.get(k) != k_v2s.get(k))
    
    
    def sum_square_diffs(x0s, x1s):
        """x1s and x2s should be equal-lengthed sequences of numbers.
        sum_square_differences() returns the sum of the squared differences 
        of x1s and x2s."""
        sum((pow(x1-x2,2) for x1,x2 in zip(x1s,x2s)))
    
    def incr(x_c, x, inc=1):
        """increments the value associated with key x in dictionary x_c
        by inc, or sets it to inc if key x is not in dictionary x_c."""
        x_c[x] = x_c.get(x, 0) + inc
    
    def count_items(xs, x_c=None):
        """returns a dictionary x_c whose keys are the items in xs, and 
        whose values are the number of times each item occurs in xs."""
        if x_c == None:
            x_c = {}
        for x in xs:
            incr(x_c, x)
        return x_c
    
    def second(xy):
        """returns the second element in a sequence"""
        return xy[1]
    
    def most_frequent(xs):
        """returns the most frequent item in xs"""
        x_c = count_items(xs)
        return sorted(x_c.iteritems(), key=second, reverse=True)[0][0]
    
    
    class kNN_classifier:
        """This is a k-nearest-neighbour classifer."""
        def __init__(self, train_data, k, distf):
            self.train_data = train_data
            self.k = min(k, len(train_data))
            self.distf = distf
    
        def classify(self, x):
            Ns = sorted(self.train_data, 
                        key=lambda xy: self.distf(xy[0], x))
            return most_frequent((y for x,y in Ns[:self.k]))
    
        def batch_classify(self, xs):
            return [self.classify(x) for x in xs]
    
    def train(train_data, k=1, distf=count_different_values):
        """Returns a kNN_classifer that contains the data, the number of
        nearest neighbours k and the distance function"""
        return kNN_classifier(train_data, k, distf)
    

    also another implementation of http://www.umanitoba.ca/

    #!/usr/bin/env python
    # This code is part of the Biopython distribution and governed by its
    # license.  Please see the LICENSE file that should have been included
    # as part of this package.
    """
    This module provides code for doing k-nearest-neighbors classification.
    
    k Nearest Neighbors is a supervised learning algorithm that classifies
    a new observation based the classes in its surrounding neighborhood.
    
    Glossary:
    distance   The distance between two points in the feature space.
    weight     The importance given to each point for classification. 
    
    
    Classes:
    kNN           Holds information for a nearest neighbors classifier.
    
    
    Functions:
    train        Train a new kNN classifier.
    calculate    Calculate the probabilities of each class, given an observation.
    classify     Classify an observation into a class.
    
        Weighting Functions:
    equal_weight    Every example is given a weight of 1.
    
    """
    
    import numpy
    
    class kNN:
        """Holds information necessary to do nearest neighbors classification.
    
        Members:
        classes  Set of the possible classes.
        xs       List of the neighbors.
        ys       List of the classes that the neighbors belong to.
        k        Number of neighbors to look at.
    
        """
        def __init__(self):
            """kNN()"""
            self.classes = set()
            self.xs = []
            self.ys = []
            self.k = None
    
    def equal_weight(x, y):
        """equal_weight(x, y) -> 1"""
        # everything gets 1 vote
        return 1
    
    def train(xs, ys, k, typecode=None):
        """train(xs, ys, k) -> kNN
    
        Train a k nearest neighbors classifier on a training set.  xs is a
        list of observations and ys is a list of the class assignments.
        Thus, xs and ys should contain the same number of elements.  k is
        the number of neighbors that should be examined when doing the
        classification.
    
        """
        knn = kNN()
        knn.classes = set(ys)
        knn.xs = numpy.asarray(xs, typecode)
        knn.ys = ys
        knn.k = k
        return knn
    
    def calculate(knn, x, weight_fn=equal_weight, distance_fn=None):
        """calculate(knn, x[, weight_fn][, distance_fn]) -> weight dict
    
        Calculate the probability for each class.  knn is a kNN object.  x
        is the observed data.  weight_fn is an optional function that
        takes x and a training example, and returns a weight.  distance_fn
        is an optional function that takes two points and returns the
        distance between them.  If distance_fn is None (the default), the
        Euclidean distance is used.  Returns a dictionary of the class to
        the weight given to the class.
    
        """
        x = numpy.asarray(x)
    
        order = []  # list of (distance, index)
        if distance_fn:
            for i in range(len(knn.xs)):
                dist = distance_fn(x, knn.xs[i])
                order.append((dist, i))
        else:
            # Default: Use a fast implementation of the Euclidean distance
            temp = numpy.zeros(len(x))
            # Predefining temp allows reuse of this array, making this
            # function about twice as fast.
            for i in range(len(knn.xs)):
                temp[:] = x - knn.xs[i]
                dist = numpy.sqrt(numpy.dot(temp,temp))
                order.append((dist, i))
        order.sort()
    
        # first 'k' are the ones I want.
        weights = {}  # class -> number of votes
        for k in knn.classes:
            weights[k] = 0.0
        for dist, i in order[:knn.k]:
            klass = knn.ys[i]
            weights[klass] = weights[klass] + weight_fn(x, knn.xs[i])
    
        return weights
    
    def classify(knn, x, weight_fn=equal_weight, distance_fn=None):
        """classify(knn, x[, weight_fn][, distance_fn]) -> class
    
        Classify an observation into a class.  If not specified, weight_fn will
        give all neighbors equal weight.  distance_fn is an optional function
        that takes two points and returns the distance between them.  If
        distance_fn is None (the default), the Euclidean distance is used.
        """
        weights = calculate(
            knn, x, weight_fn=weight_fn, distance_fn=distance_fn)
    
        most_class = None
        most_weight = None
        for klass, weight in weights.items():
            if most_class is None or weight > most_weight:
                most_class = klass
                most_weight = weight
        return most_class
    
    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

Hi I have the following problem. Given a Tree I want to be able
Given that I have the following WCF service: class LookUpService { public List<County> GetCounties(string
I have the following bit of code, I expect that given cstdio is included
I have the following perl script that works locally given the input parameters. I
Given that I have the following widget (and a view attached to it): public
Given that we have the following View: <StackPanel> <TextBlock Text=Some text/> <Image Source={Binding vmImageProp}
I have a following problem I want to solve efficiently. I am given a
Hi I have following sql statement that gives me the the first_name and total
Imagine I have the following: inFile = /adda/adas/sdas/hello.txt # that instruction give me hello.txt
I have a function that gives me the following warning: [DCC Warning] filename.pas(6939): W1035

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.