Which statistical method should I use to find the most-selected area of an image from N user selections?
I have a unique problem and was hoping to get some advice so that I don’t miss anything.
The Problem: Find the most favored/liked/important area of an image, based on the areas users select in different selection ratios.
Scenario: Consider an image of a dog, with hundreds of users selecting areas of this image at various resolutions. The obvious area of focus in most selections will be the area containing the dog. I can record the x1, x2, y1, y2 coordinates of each selection and store them in a database. If I then want to automatically generate versions of this image in a set of resolutions, I should be able to recognize the area that attracts users the most.
The methods I think could work are:
- Find the average center point of all selections and base the crop on that. Very simple, but probably not as accurate.
- Use a clustering algorithm such as K-Means or EM clustering, but I don’t know which one would be best suited.
Looking forward to some brilliant solution to my problem.
More info on the problem:
The actual image will most probably be a 1024×768 image, and the selections made on it will be in the most common mobile phone resolutions. The objective is to automatically generate mobile phone wallpapers by learning from user selections.
I believe that you have two distinct problems identified above:
ONE: Identification of Points
For this, you will need to develop some sort of heuristic for identifying whether a point should be considered or not.
I believe you mentioned that hundreds of users will be selecting locations on this image? Hundreds may be a lot of points to cluster. Consider excluding outliers by removing points that do not have a certain number of neighbors within a particular distance.
Anything you can do to reduce your dataset will be helpful.
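As a rough illustration of that outlier filter, here is a minimal sketch in plain Python. It assumes each selection is stored as an (x1, x2, y1, y2) tuple, as in the question, and reduces it to its center point. The function names and the `radius` and `min_neighbors` defaults are placeholders I made up; you would need to tune them for your data.

```python
from math import hypot

def selection_centers(selections):
    """Turn (x1, x2, y1, y2) rectangles into (cx, cy) center points."""
    return [((x1 + x2) / 2, (y1 + y2) / 2) for x1, x2, y1, y2 in selections]

def drop_outliers(points, radius=50.0, min_neighbors=3):
    """Keep only points with at least min_neighbors other points within radius.

    radius and min_neighbors are illustrative values, not recommendations.
    This is a simple O(n^2) scan, which is fine for a few hundred points.
    """
    kept = []
    for i, (px, py) in enumerate(points):
        neighbors = sum(
            1
            for j, (qx, qy) in enumerate(points)
            if i != j and hypot(px - qx, py - qy) <= radius
        )
        if neighbors >= min_neighbors:
            kept.append((px, py))
    return kept
```

You would call `drop_outliers(selection_centers(rows))` on the rows read from your database before clustering.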
TWO: Clustering of Points
I believe that K Means Clustering would be best suited for this particular problem.
Your particular problem seems to closely mirror the standard Cartesian coordinate clustering examples used in explaining this algorithm.
Finding the optimal clustering is NP-hard in general, but the classical approximations (such as the standard iterative K-Means heuristic) should be good enough here.
Once clustered, you can take the average of the points within the largest cluster for a rather accurate approximation of the area of focus.
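To make the clustering step concrete, here is a minimal sketch of the standard iterative K-Means heuristic in plain Python, followed by a helper that returns the centroid of the most populated cluster. The names `kmeans` and `focus_point`, the choice of `k=2`, and the fixed seed are all my own assumptions for illustration. In practice a library implementation would probably serve you better.

```python
import random
from math import dist

def _assign(points, centroids):
    """Group each point with its nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda c: dist(p, centroids[c]))
        clusters[nearest].append(p)
    return clusters

def kmeans(points, k, iterations=50, seed=0):
    """Iterative K-Means on a list of 2D points; returns (centroids, clusters)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen points
    for _ in range(iterations):
        clusters = _assign(points, centroids)
        updated = [
            (sum(x for x, _ in cl) / len(cl), sum(y for _, y in cl) / len(cl))
            if cl else centroids[i]  # leave an empty cluster's centroid in place
            for i, cl in enumerate(clusters)
        ]
        if updated == centroids:  # assignments stopped changing
            break
        centroids = updated
    return centroids, _assign(points, centroids)

def focus_point(points, k=2, seed=0):
    """Centroid of the largest cluster, taken as the area of focus."""
    centroids, clusters = kmeans(points, k, seed=seed)
    largest = max(range(k), key=lambda i: len(clusters[i]))
    return centroids[largest]
```

The point returned by `focus_point` can then serve as the center of each generated crop, clamped so the crop stays inside the 1024×768 image.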
In Addition:
Your dataset sounds like it will already be tightly clustered (i.e., most people will pick the dog’s face, not the side of its torso). You need to be aware of local minima. These can really throw a wrench into your algorithm, especially with a small number of clusters. Be aware that you may need a bit of dynamic programming to combat this. You can usually introduce some variance into your algorithm, allowing the average points to “pop out” of these local minima.
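One common way to add that variance is to rerun K-Means from several random starting points and keep the run with the lowest total squared distance to the centroids. Random restarts are my own suggestion here, not the only option, and the names `lloyd` and `best_of_restarts` and the default of 10 restarts are assumptions for this sketch.

```python
import random
from math import dist

def lloyd(points, k, rng, iterations=50):
    """One K-Means run from a random start; returns (centroids, cost)."""
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: dist(p, centroids[c]))
            clusters[nearest].append(p)
        updated = [
            (sum(x for x, _ in cl) / len(cl), sum(y for _, y in cl) / len(cl))
            if cl else centroids[i]  # leave an empty cluster's centroid in place
            for i, cl in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    # Cost: sum of squared distances from each point to its nearest centroid.
    cost = sum(min(dist(p, c) for c in centroids) ** 2 for p in points)
    return centroids, cost

def best_of_restarts(points, k, restarts=10, seed=0):
    """Run K-Means several times and keep the lowest-cost result."""
    rng = random.Random(seed)
    runs = (lloyd(points, k, rng) for _ in range(restarts))
    return min(runs, key=lambda run: run[1])
```

A run that gets stuck in a poor local minimum will show a noticeably higher cost than the others, so taking the minimum discards it.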
Hope this helps!