I have a sample image which contains an object, such as the earrings in the following image:
https://i.stack.imgur.com/N5w9a.jpg
I then have a large candidate set of images for which I need to determine which one most likely contains the object, e.g.:
https://i.stack.imgur.com/xYL90.jpg
So I need to produce a score for each image, where the highest score corresponds to the image which most likely contains the target object. In this case, I have the following conditions and constraints to work with:
1) I can obtain multiple sample images at different angles.
2) The sample images are likely to be at different resolutions, angles, and distances than the candidate images.
3) There are a LOT of candidate images (> 10,000), so it must be reasonably fast.
4) I’m willing to sacrifice some precision for speed, so if it means we have to search through the top 100 instead of just the top 10, that’s fine and can be done manually.
5) I can manipulate the sample images manually, such as outlining the object that I wish to detect; the candidate images cannot be manipulated manually as there are too many.
6) I have no real background in OpenCV or computer vision at all, so I’m starting from scratch here.
My initial thought is to start by drawing a rough outline around the object in the sample image. Then, I could identify corners in the object and corners in the candidate image. I could profile the pixels around each corner to see if they look similar and then rank by the sum of the maximum similarity scores of every corner. I’m also not sure how to quantify similar pixels. I guess just the Euclidean distance of their RGB values?
The problem there is that it kind of ignores the center of the object. In the above examples, if the corners of the earrings are all near the gold frame, then it would not consider the red, green, and blue stones inside the earring. I suppose I could improve this by then looking at all pairs of corners and determining similarity by sampling some points along the line between them.
So I have a few questions:
A) Does this line of thinking make sense in general or is there something I’m missing?
B) Which specific algorithms from OpenCV should I investigate? I’m aware that there are multiple corner detection algorithms, but I only need one, and if the differences between them are marginal then I’m fine with the fastest.
C) Is there any example code using these algorithms that would help my understanding?
My options for languages are either Python or C#.
Check out the SURF features, which are a part of OpenCV. The idea here is that you have an algorithm for finding “interest points” in two images. You also have an algorithm for computing a descriptor of the image patch around each interest point. Typically this descriptor captures the distribution of edge orientations in the patch. Then you try to find point correspondences, i.e. for each interest point in image A you try to find a corresponding interest point in image B. This is done by comparing the descriptors and looking for the closest matches. Then, if you have a set of correspondences that are consistent with a single geometric transformation, you have a detection.
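To make the matching and scoring steps a bit more concrete, here is a minimal pure-Python sketch. It assumes you have already computed a list of descriptor vectors for the sample and for each candidate (in OpenCV these would typically come from a detector's `detectAndCompute`, and the matching itself would usually be done with `cv2.BFMatcher` and `knnMatch` rather than by hand). The function names, the ratio threshold of 0.75, and the idea of scoring a candidate by its number of surviving matches are my own assumptions for illustration, and the geometric verification step is not included.

```python
import math


def l2(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def good_matches(sample_desc, candidate_desc, ratio=0.75):
    """Return (sample_index, candidate_index) pairs that pass a ratio test.

    A match is kept only if the nearest candidate descriptor is clearly
    closer than the second nearest, which filters out ambiguous matches.
    The 0.75 threshold is an assumed value, not a tuned one.
    """
    matches = []
    if len(candidate_desc) < 2:
        return matches  # the ratio test needs at least two neighbours
    for i, d in enumerate(sample_desc):
        # Brute force: distance from d to every candidate descriptor.
        dists = sorted((l2(d, c), j) for j, c in enumerate(candidate_desc))
        (best, j), (second, _) = dists[0], dists[1]
        if best < ratio * second:
            matches.append((i, j))
    return matches


def rank_candidates(sample_desc, candidates, ratio=0.75):
    """Score each candidate by its number of good matches, best first.

    `candidates` maps a (hypothetical) image name to its descriptor list.
    """
    scores = {
        name: len(good_matches(sample_desc, descs, ratio))
        for name, descs in candidates.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

With this, you would call `rank_candidates(sample_descriptors, {"img1.jpg": [...], ...})` and inspect the top of the returned list manually. Brute-force matching is slow for more than 10,000 images, so treat this only as a way to understand the idea before switching to the library routines.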
Of course, this is a very high-level explanation. The devil is in the details, and for those you should read some papers. Start with “Distinctive Image Features from Scale-Invariant Keypoints” by David Lowe (the SIFT paper), and then read the papers on SURF.
Also, consider moving this question to the Signal and Image Processing Stack Exchange.