I take 25 sample regions from an image, compute each region’s average RGB value, and store the results in a 5×5 Color array. This is the image’s “signature”. The values in a signature look like the following:
Color signature[5][5];
-21233 -1 -323211 ... ...
-123 -12323 ...
...
I can read the red, green and blue values from each entry of the signature. I use these values to compare two images’ signatures and get a “difference” value.
signature[1][1].getBlue() = 123, signature[1][1].getRed() = 200, ...
// Sum of the Euclidean RGB distances over all 25 cells.
double difference = 0;
for (int x = 0; x < 5; x++) {
    for (int y = 0; y < 5; y++) {
        int r1 = signature[x][y].getRed();
        int g1 = signature[x][y].getGreen();
        int b1 = signature[x][y].getBlue();
        int r2 = signature2[x][y].getRed();
        int g2 = signature2[x][y].getGreen();
        int b2 = signature2[x][y].getBlue();
        // Distance between the two colors of this cell.
        double tempDiff = Math.sqrt((r1 - r2) * (r1 - r2)
                + (g1 - g2) * (g1 - g2)
                + (b1 - b2) * (b1 - b2));
        difference += tempDiff;
    }
}
I also compute a second signature for each image, taken from its edge-detected version. When comparing two images, I multiply the normal-signature difference by the edge-signature difference to get the final difference value.
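As a rough, self-contained sketch of the comparison described above: the class and method names (`SignatureDiff`, `difference`, `finalDifference`) are made up for illustration, and the code assumes both signatures have the same dimensions.

```java
import java.awt.Color;

public class SignatureDiff {

    // Sum of the per-cell Euclidean RGB distances between two signatures.
    // Assumes a and b have identical dimensions (5x5 in the question).
    static double difference(Color[][] a, Color[][] b) {
        double total = 0;
        for (int x = 0; x < a.length; x++) {
            for (int y = 0; y < a[x].length; y++) {
                int dr = a[x][y].getRed() - b[x][y].getRed();
                int dg = a[x][y].getGreen() - b[x][y].getGreen();
                int db = a[x][y].getBlue() - b[x][y].getBlue();
                total += Math.sqrt(dr * dr + dg * dg + db * db);
            }
        }
        return total;
    }

    // Final score: normal-signature difference multiplied by edge-signature difference.
    static double finalDifference(Color[][] sig1, Color[][] edge1,
                                  Color[][] sig2, Color[][] edge2) {
        return difference(sig1, sig2) * difference(edge1, edge2);
    }
}
```

One thing to keep in mind with a product: if either of the two differences is 0, the final value is 0 regardless of the other one.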
Everything works great when comparing two images. However, I have a lot of images, so I saved the signatures in a database like the following:
Table images:
name      | signature                      | edge signature
myimg.jpg | -12312 -132 -2 ... (25 values) | -123 -1 -1234 -6921 ... (25 values)
I simply concatenate the signature values with spaces between them and save the result as a String.
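A minimal sketch of that storage format, assuming the stored integers are the packed values returned by `Color.getRGB()` (which would explain negative numbers such as -1 for opaque white). The class name `SignatureCodec` and its methods are hypothetical.

```java
import java.awt.Color;
import java.util.StringJoiner;

public class SignatureCodec {

    // Joins the packed RGB ints of all cells, row by row, separated by spaces.
    static String encode(Color[][] signature) {
        StringJoiner joiner = new StringJoiner(" ");
        for (Color[] row : signature) {
            for (Color c : row) {
                joiner.add(Integer.toString(c.getRGB()));
            }
        }
        return joiner.toString();
    }

    // Parses a space-separated string back into a size x size signature.
    static Color[][] decode(String text, int size) {
        String[] parts = text.trim().split("\\s+");
        if (parts.length != size * size) {
            throw new IllegalArgumentException(
                    "expected " + (size * size) + " values, got " + parts.length);
        }
        Color[][] signature = new Color[size][size];
        for (int i = 0; i < parts.length; i++) {
            // new Color(int) keeps the RGB bits and treats the color as opaque.
            signature[i / size][i % size] = new Color(Integer.parseInt(parts[i]));
        }
        return signature;
    }
}
```

A string like this is compact, but the database cannot do arithmetic on the individual values without parsing it, which is why comparing inside a query needs separate numeric columns.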
Here’s my question: I need to find the images most similar to a given image. If I select all images from the database, things get really slow and I run out of memory. I can select the images in batches of 1000, compare them and then fetch the next 1000, but this is even slower.
I need a way to compare the image signatures inside the query. I am ready to change my table’s columns, and even to try insane tables with 100 columns holding all the RGB values of the signature. I need some way of reducing or hashing the signature. Are there any approaches, links or libraries you can suggest? Any help would be appreciated.
If needed, I use Java on NetBeans, working with MySQL.
After seeing that we need 150 columns, two approaches came to mind:
However, after an ugly and messy implementation, the code worked just fine. What I’m doing is simply performing the calculation from the question in an SQL query and getting the 50 most similar pictures from the database. After I got the results I tidied up the code a little, and it’s working well and fast.
So we saw no real need to implement the approaches above, as they reduce the accuracy of finding similarities and we don’t need more speed. We fetch only the best 50 results, so memory usage isn’t a problem either.
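The actual query isn’t shown here, so the following is only a sketch of what “doing the calculation in the query” could look like. It assumes a hypothetical layout of 150 numeric columns named `s_r0` ... `s_b24` for the normal signature and `e_r0` ... `e_b24` for the edge signature; the class `SimilarityQuery` and these column names are made up. `SQRT` and `POW` are standard MySQL functions.

```java
public class SimilarityQuery {

    // Builds the sum of per-cell distances for one signature, e.g. for prefix "s_":
    // SQRT(POW(s_r0 - ?, 2) + POW(s_g0 - ?, 2) + POW(s_b0 - ?, 2)) + ...
    static String distanceExpr(String prefix, int cells) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < cells; i++) {
            if (i > 0) {
                sb.append(" + ");
            }
            sb.append("SQRT(POW(").append(prefix).append("r").append(i).append(" - ?, 2)")
              .append(" + POW(").append(prefix).append("g").append(i).append(" - ?, 2)")
              .append(" + POW(").append(prefix).append("b").append(i).append(" - ?, 2))");
        }
        return sb.toString();
    }

    // Full query: normal difference times edge difference, smallest values first.
    // Column and table names are assumptions, not taken from a real schema.
    static String build(int cells, int limit) {
        return "SELECT name, (" + distanceExpr("s_", cells) + ") * ("
                + distanceExpr("e_", cells) + ") AS difference"
                + " FROM images ORDER BY difference ASC LIMIT " + limit;
    }
}
```

With `build(25, 50)` the resulting string has 150 `?` placeholders, meant to be bound through a `PreparedStatement` in order: red, green, blue for each cell of the query image’s normal signature, then the same for its edge signature. Only the 50 best rows come back to Java.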
For anyone who has speed or memory problems in the “Java part” (or any other “code” part) of a project, I strongly recommend moving as much work as possible to the “database part” and getting things done with queries.