I am interested in knowing why the triangle inequality (the "triangle law") is so important for data mining. As far as I know, it helps us define patterns and form clusters based on the distances between different objects. Does anyone have other input on the triangle inequality?
It is actually not that important. In data mining, we generally cannot assume that we have a proper “mathematical” distance function (a metric). As soon as we allow duplicates, we already lose one of the key axioms: two different objects can have distance 0. (And in classification, they may even have different classes in the worst case.)
However, the triangle inequality can allow us to prune the search space. If we have a distance function that satisfies the triangle inequality and use an appropriate index, we can skip a lot of distance computations, thus making the algorithm faster.
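To make the pruning idea concrete, here is a minimal sketch of a nearest-neighbor search with a single pivot. It is not a real index structure, just an illustration: the function names, the choice of Euclidean distance, and the single-pivot setup are all assumptions of mine. It relies on the bound |d(q, p) - d(x, p)| <= d(q, x), which follows from the triangle inequality, so it is only valid for a distance that actually satisfies it.

```python
import math


def euclidean(a, b):
    """Euclidean distance; it satisfies the triangle inequality."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_with_pivot(points, pivot_dists, pivot, query, dist=euclidean):
    """Nearest-neighbor search that skips candidates using one pivot.

    pivot_dists[i] must hold dist(points[i], pivot), computed once up front.
    Returns (best_index, best_distance, number_of_distance_calls).
    """
    d_qp = dist(query, pivot)  # one distance from the query to the pivot
    calls = 1
    best_i, best_d = -1, math.inf
    for i, x in enumerate(points):
        # Triangle inequality: d(query, x) >= |d(query, pivot) - d(x, pivot)|
        lower_bound = abs(d_qp - pivot_dists[i])
        if lower_bound >= best_d:
            continue  # x cannot beat the current best, so skip the computation
        d = dist(query, x)
        calls += 1
        if d < best_d:
            best_i, best_d = i, d
    return best_i, best_d, calls
```

The distances to the pivot are paid for once and reused for every query. How many candidates get skipped depends on the data, the pivot, and the order in which candidates are visited; real index structures (M-trees, VP-trees and similar) organize this much more carefully.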
Note that much research and many implementations do not care much about this kind of optimization. Many data miners working with R like to build a distance matrix (which is in O(n^2)!) and then try to do as much as possible with matrix operations, because that is simple to program and R is quite fast at this kind of operation (it uses highly optimized C code instead of interpreted R code). But if you need to go beyond this, a key ingredient for performance is to exploit the triangle inequality where possible.
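For comparison, the distance-matrix approach can be sketched as follows, in plain Python rather than R. The function names are my own, and the sketch assumes a symmetric distance with zero self-distance. The point is only to show where the quadratic cost comes from: every pair of objects is evaluated once, so n objects need n(n-1)/2 distance computations and an n-by-n matrix in memory.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_matrix(points, dist=euclidean):
    """Build the full pairwise distance matrix.

    Assumes dist is symmetric and dist(x, x) == 0, so only the upper
    triangle is computed and then mirrored.
    Returns (matrix, number_of_distance_evaluations).
    """
    n = len(points)
    matrix = [[0.0] * n for _ in range(n)]
    evaluations = 0
    for i in range(n):
        for j in range(i + 1, n):
            d = dist(points[i], points[j])
            matrix[i][j] = matrix[j][i] = d
            evaluations += 1
    return matrix, evaluations
```

With 10,000 objects this is already about 50 million distance computations and 100 million matrix cells, which is roughly where pruning via the triangle inequality starts to pay off.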