So I figured out I can convert an image to grayscale like this:
    public static Bitmap GrayScale(this Image img)
    {
        var bmp = new Bitmap(img.Width, img.Height);
        using(var g = Graphics.FromImage(bmp))
        {
            var colorMatrix = new ColorMatrix(
                new[]
                {
                    new[] {.30f, .30f, .30f, 0, 0},
                    new[] {.59f, .59f, .59f, 0, 0},
                    new[] {.11f, .11f, .11f, 0, 0},
                    new[] {0, 0, 0, 1.0f, 0},
                    new[] {0, 0, 0, 0, 1.0f}
                });
            using(var attrs = new ImageAttributes())
            {
                attrs.SetColorMatrix(colorMatrix);
                g.DrawImage(img, new Rectangle(0, 0, img.Width, img.Height),
                    0, 0, img.Width, img.Height, GraphicsUnit.Pixel, attrs);
            }
        }
        return bmp;
    }
Now, I want to compute the average “direction” of the pixels.
What I mean by that is that I want to look at, say, a 3×3 region: if the left side is darker than the right side, the direction would be to the right; if the bottom is darker than the top, the direction would be upwards; if the bottom-left is darker than the top-right, the direction would be up-right. (Think of little vector arrows over every 3×3 region.) Perhaps a better example: if you draw a grayscale gradient in Photoshop, I want to compute the angle at which it was drawn.
I’ve done stuff like this in MATLAB, but that was years ago. I figure I could use a matrix similar to ColorMatrix to compute this, but I’m not quite sure how. It looks like this function might be what I want; could I convert it to grayscale (as above) and then do something with the grayscale matrix to compute these directions?
IIRC, what I want is quite similar to edge detection.
After I compute these direction vectors, I’m just going to loop over them and compute the average direction of the image.
The end goal is to rotate images so that their average direction is always upwards; this way, if I have two identical images except that one is rotated (by 90, 180 or 270 degrees), they will end up oriented the same way (I’m not concerned if a person ends up upside down).
*snip* Deleting some spam. You can view the revisions if you want to read the rest of my attempts.
Calculating the mean of angles is generally a bad idea:
The mean of a set of angles has no clear meaning. For example, the mean of one angle pointing up and one angle pointing down is an angle pointing right. Is that what you want? Assuming “up” is +PI, the mean of two angles almost pointing up would be an angle pointing down if one angle is PI-[some small value] and the other is -PI+[some small value]. That’s probably not what you want. Also, you’re completely ignoring the strength of the edge: most of the pixels in your real-life images aren’t edges at all, so the gradient direction is mostly noise.
If you want to calculate something like an “average direction”, you need to add up vectors instead of angles, then calculate Atan2 after the loop. Problem is: That vector sum tells you nothing about objects inside the image, as gradients pointing in opposite directions cancel each other out. It only tells you something about the difference in brightness between the first/last row and first/last column of the image. That’s probably not what you want.
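To make the wrap-around problem concrete, here is a minimal sketch that compares the plain mean of angle values with the direction of the vector sum. The class and method names (`AngleMean`, `NaiveMean`, `VectorMean`) are made up for illustration, and the sketch uses unit vectors; for real gradients you would add the gradient vectors themselves so that strong edges count more.

```csharp
using System;

public static class AngleMean
{
    // Plain arithmetic mean of the angle values (radians).
    // Breaks down when the angles straddle the -PI/+PI seam.
    public static double NaiveMean(double[] angles)
    {
        double sum = 0;
        foreach (var a in angles) sum += a;
        return sum / angles.Length;
    }

    // Adds one unit vector per angle and takes Atan2 once, after the loop.
    // The result is in [-PI, PI].
    public static double VectorMean(double[] angles)
    {
        double x = 0, y = 0;
        foreach (var a in angles)
        {
            x += Math.Cos(a);
            y += Math.Sin(a);
        }
        return Math.Atan2(y, x);
    }
}
```

For the two angles `Math.PI - 0.1` and `-Math.PI + 0.1`, `NaiveMean` returns a value near 0 (the opposite direction), while `VectorMean` returns a value near ±PI. Note that this only fixes the wrap-around; opposite gradients still cancel in the vector sum, as described above.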
I think the simplest way to orient images is to create an angle histogram: Create an array with (e.g.) 360 bins for 360° of gradient directions. Then calculate the gradient angle and magnitude for each pixel. Add each gradient magnitude to the right angle-bin. This won’t give you a single angle, but an angle-histogram, which can then be used to orient two images to each other using simple cyclic correlation.
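A rough sketch of how such a histogram could be built in C#. It assumes the grayscale image has already been copied into a `float[,]` array indexed as `gray[y, x]`, and it uses simple central differences as the gradient operator; both are choices made for this example (a Sobel or Gaussian-derivative filter would be a natural alternative), and the names `GradientHistogram` and `Compute` are hypothetical.

```csharp
using System;

public static class GradientHistogram
{
    // gray[y, x] holds brightness values. Returns a histogram of gradient
    // directions in which each pixel contributes its gradient magnitude.
    // Bin 0 corresponds to 0 degrees; angles follow image coordinates (y down).
    public static double[] Compute(float[,] gray, int bins = 360)
    {
        int height = gray.GetLength(0), width = gray.GetLength(1);
        var hist = new double[bins];
        for (int y = 1; y < height - 1; y++)
        {
            for (int x = 1; x < width - 1; x++)
            {
                // Central differences (assumed gradient operator).
                double gx = gray[y, x + 1] - gray[y, x - 1];
                double gy = gray[y + 1, x] - gray[y - 1, x];
                double magnitude = Math.Sqrt(gx * gx + gy * gy);
                if (magnitude == 0) continue; // flat region, no direction

                double degrees = Math.Atan2(gy, gx) * 180.0 / Math.PI;
                if (degrees < 0) degrees += 360.0;
                int bin = (int)(degrees / 360.0 * bins) % bins;
                hist[bin] += magnitude;
            }
        }
        return hist;
    }
}
```

Weighting by magnitude is what keeps the flat, noisy regions from dominating: a pixel with almost no gradient adds almost nothing to its bin.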
Here’s a proof-of-concept Mathematica implementation I’ve thrown together to see if this would work:
Results with sample images:
The angle histograms also show why the mean angle can’t work: the histogram is essentially a single sharp peak, and the other angles are roughly uniform. The mean of this histogram will always be dominated by the uniform “background noise”. That’s why you’ve got almost the same angle (about 180°) for each of the “real-life” images with your current algorithm.
The tree image has a single dominant angle (the horizon), so in this case, you could use the mode of the histogram (the most frequent angle). But that will not work for every image:
Here you have two peaks. Cyclic correlation should still orient two images to each other, but simply using the mode is probably not enough.
Also note that the peak in the angle histogram is not “up”: In the tree image above, the peak in the angle histogram is probably the horizon. So it’s pointing up. In the Lena image, it’s the vertical white bar in the background – so it’s pointing to the right. Simply orienting the images using the most frequent angle will not turn every image with the right side pointing up.
This image has even more peaks: Using the mode (or, probably, any single angle) would be unreliable to orient this image. But the angle histogram as a whole should still give you a reliable orientation.
Note: I didn’t pre-process the images, I didn’t try gradient operators at different scales, I didn’t post-process the resulting histogram. In a real-world application, you would tweak all these things to get the best possible algorithm for a large set of test images. This is just a quick test to see if the idea could work at all.
Add: To orient two images using this histogram, you would
For example, in C#:
(You could speed this up using FFT if your histogram length is a power of two, but the code would be a lot more complex, and for 256 bins it might not matter that much.)
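A minimal sketch of the brute-force cyclic correlation, assuming both histograms have the same number of bins. `HistogramAlignment` and `BestShift` are names invented for this example. It tries every cyclic shift of the second histogram and keeps the one with the highest correlation score.

```csharp
using System;

public static class HistogramAlignment
{
    // Returns the shift s (in bins) that maximizes
    // sum over i of a[i] * b[(i + s) % n].
    public static int BestShift(double[] a, double[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Histograms must have the same length.");

        int n = a.Length;
        int bestShift = 0;
        double bestScore = double.NegativeInfinity;
        for (int shift = 0; shift < n; shift++)
        {
            double score = 0;
            for (int i = 0; i < n; i++)
                score += a[i] * b[(i + shift) % n];

            if (score > bestScore)
            {
                bestScore = score;
                bestShift = shift;
            }
        }
        return bestShift;
    }
}
```

With 360 bins, the returned shift is the rotation between the two histograms in degrees. If only rotations by 90, 180 or 270 degrees are possible, as in the question, it should be enough to evaluate the score for shifts 0, 90, 180 and 270 and pick the best of those four.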