Assuming we have:
- a set U of n-dimensional vectors (each vector v = < x1, x2, …, xn >)
- an n-dimensional constraint vector c = < c1, …, cn >
- an n-dimensional vector of weights w = < w1, …, wn >
- an integer S
I need an algorithm that selects S vectors from U into a set R while minimizing the function cost(R):
cost(R) = sum(abs(c-sumVectors(R))*w)
(sumVectors is a function that sums vectors component-wise, like so: sumVectors({< 1, 2 >; < 3, 4 >}) = < 4, 6 >, while sum(< 1, 2, 3 >) returns the scalar 6. abs and * are applied element-wise.)
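To make the definition concrete, here is a minimal Python sketch of the cost function. The names sum_vectors and cost, and the sample values for c and w, are only illustrative assumptions.

```python
def sum_vectors(vectors):
    """Component-wise sum of a collection of equal-length vectors."""
    return [sum(components) for components in zip(*vectors)]


def cost(R, c, w):
    """Weighted absolute deviation between c and the sum of the vectors in R."""
    total = sum_vectors(R)
    return sum(abs(ci - ti) * wi for ci, ti, wi in zip(c, total, w))


print(sum_vectors([(1, 2), (3, 4)]))               # [4, 6]
print(cost([(1, 2), (3, 4)], c=(5, 5), w=(1, 2)))  # |5-4|*1 + |5-6|*2 = 3
```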
The solution does not have to be optimal. I just need the best guess I can get within a preset time limit.
Any ideas on where to start? (Preferably something faster/smarter than genetic algorithms.)
This is an optimization problem. Since you don't need the optimal solution, you can try a stochastic optimization method such as hill climbing: start with a random solution (a random subset of U of size S) and examine the neighboring solutions (those reached by adding or removing one component of the current solution) for any that are better with respect to the cost function.
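The loop described above could be sketched roughly as follows. Because the subset size is fixed at S, this sketch assumes a neighbor is produced by swapping one chosen vector for an unchosen one, and it takes the best improving swap at every step; the function names, the max_steps parameter, and the random test data are all illustrative.

```python
import random


def cost(indices, U, c, w):
    """Weighted absolute deviation between c and the sum of the chosen vectors."""
    total = [sum(U[i][d] for i in indices) for d in range(len(c))]
    return sum(abs(cd - td) * wd for cd, td, wd in zip(c, total, w))


def hill_climb(U, c, w, S, rng=None, max_steps=1000):
    """Steepest-descent hill climbing over size-S index subsets of U."""
    rng = rng or random.Random()
    current = rng.sample(range(len(U)), S)   # random starting solution
    current_cost = cost(current, U, c, w)
    for _ in range(max_steps):
        best, best_cost = None, current_cost
        chosen = set(current)
        # Neighbors: replace the vector at one position with an unchosen one.
        for pos in range(S):
            for j in range(len(U)):
                if j in chosen:
                    continue
                candidate = current[:pos] + [j] + current[pos + 1:]
                candidate_cost = cost(candidate, U, c, w)
                if candidate_cost < best_cost:
                    best, best_cost = candidate, candidate_cost
        if best is None:                     # no improving neighbor: local minimum
            break
        current, current_cost = best, best_cost
    return current, current_cost


rng = random.Random(0)
U = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(20)]
R, final_cost = hill_climb(U, c=[1.0, 0.0, -1.0], w=[1.0, 1.0, 1.0], S=5, rng=rng)
print(sorted(R), final_cost)
```

To respect a preset time budget, the max_steps check could be swapped for a wall-clock deadline, and the whole routine restarted from fresh random subsets while time remains.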
To get a better solution, you can also add simulated annealing to the hill-climbing search. The idea is that it is sometimes necessary to move to a worse solution in order to reach a better one later. Simulated annealing often works better than plain hill climbing because it accepts moves to worse solutions early in the process, which helps it escape local minima, and becomes less likely to accept them as the process goes on.
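A minimal sketch of that acceptance rule is shown here. It assumes a random swap move, a geometric cooling schedule, and the usual exp(-delta / t) acceptance probability; the parameter names and default values (t_start, t_end, cooling) are illustrative and would need tuning for real data. It also assumes U holds more than S vectors.

```python
import math
import random


def cost(indices, U, c, w):
    """Weighted absolute deviation between c and the sum of the chosen vectors."""
    total = [sum(U[i][d] for i in indices) for d in range(len(c))]
    return sum(abs(cd - td) * wd for cd, td, wd in zip(c, total, w))


def simulated_annealing(U, c, w, S, rng=None, t_start=1.0, t_end=1e-3, cooling=0.995):
    """Anneal over size-S index subsets of U; returns the best subset seen."""
    rng = rng or random.Random()
    current = rng.sample(range(len(U)), S)
    current_cost = cost(current, U, c, w)
    best, best_cost = list(current), current_cost
    t = t_start
    while t > t_end:
        # Random neighbor: swap one chosen index for an unchosen one.
        outside = [j for j in range(len(U)) if j not in current]
        candidate = list(current)
        candidate[rng.randrange(S)] = rng.choice(outside)
        candidate_cost = cost(candidate, U, c, w)
        delta = candidate_cost - current_cost
        # Always accept improvements; accept a worse move with a probability
        # that shrinks as the temperature t falls.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = list(current), current_cost
        t *= cooling
    return best, best_cost


rng = random.Random(0)
U = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(20)]
R, final_cost = simulated_annealing(U, [1.0, 0.0, -1.0], [1.0, 1.0, 1.0], S=5, rng=rng)
print(sorted(R), final_cost)
```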
I've pasted some sample hill-climbing Python code for your problem here:
https://gist.github.com/921f398d61ad351ac3d6
In my sample code, R always holds a list of indices into U, and I use Euclidean distance to measure the similarity between neighbors. You can certainly use another distance function that suits your needs. Also note that the code computes neighbors on the fly. If you have a large pool of vectors in U, you might want to cache precomputed neighbors, or even consider locality-sensitive hashing to avoid the O(n^2) comparisons. Simulated annealing can be added on top of the above code.
The result of one random run is shown below.
I use only 20 vectors in U and S = 10, so that I can compare the result with the optimal solution. The hill-climbing process stops at the 4th step, when no better solution can be reached by replacing a single vector with one of its k nearest neighbors.
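The k-nearest-neighbor lookup used for these replacement moves could be precomputed once and cached, along the lines of the sketch below. The function name knn_table is hypothetical and is not taken from the linked gist; building the table is still quadratic in the number of vectors, but it is paid only once instead of at every step.

```python
import math


def knn_table(U, k):
    """For each vector index, list the indices of its k nearest other vectors."""
    table = []
    for i, u in enumerate(U):
        others = sorted(
            (j for j in range(len(U)) if j != i),
            key=lambda j: math.dist(u, U[j]),   # Euclidean distance
        )
        table.append(others[:k])
    return table


U = [(0, 0), (1, 0), (5, 5), (0, 2)]
print(knn_table(U, 2))  # [[1, 3], [0, 3], [3, 1], [0, 1]]
```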
I also ran an exhaustive search that iterates over all possible combinations. You can see that the hill-climbing result is pretty good compared with the exhaustive approach: it takes only 4 steps to reach a relatively small cost (a local minimum, though), which the exhaustive search needs more than 82K steps to beat.
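For reference, an exhaustive baseline of this kind can be sketched with itertools.combinations. With 20 vectors and S = 10 there are C(20, 10) = 184,756 subsets to score, which is why it is only practical for small inputs. The function name exhaustive and the tiny sample data are illustrative.

```python
from itertools import combinations
from math import comb


def cost(indices, U, c, w):
    """Weighted absolute deviation between c and the sum of the chosen vectors."""
    total = [sum(U[i][d] for i in indices) for d in range(len(c))]
    return sum(abs(cd - td) * wd for cd, td, wd in zip(c, total, w))


def exhaustive(U, c, w, S):
    """Score every size-S subset of U and return the cheapest one."""
    return min(combinations(range(len(U)), S), key=lambda R: cost(R, U, c, w))


print(comb(20, 10))  # 184756 subsets for 20 vectors and S = 10

U = [(1, 0), (0, 1), (2, 2), (1, 1)]
best = exhaustive(U, c=(1, 1), w=(1, 1), S=2)
print(best, cost(best, U, (1, 1), (1, 1)))  # (0, 1) 0
```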