I wrote a unit test in PHP to compare cropped images (produced with ImageMagick). The test works, but I've been running into problems when comparing a large number of images at a time. Each image receives a timestamp, based on when it was created, that is embedded directly in the raw data. I've been using a regular expression to pull out that timestamp right before comparing the files, but every once in a while one of the image files contains additional raw data even though the images themselves are exactly the same.
To give an example, here's the result from one of my tests (note that I'm comparing the binary data of the images as strings):
ImageTest::testAutoCrop
Failed asserting that two strings are equal.
--- Expected
+++ Actual
@@ @@
?n??m?
-?F sO=f??????????^???????w??>
?(???/o????M)???o%tEXt??%tEXt
+?F sO=f??????????^???????w??>
?(???/o????M)???o%tEXt
As you can see, the only difference between these two files is that the expected image has this additional string in it: "?%tEXt".
Can someone help me understand what this seemingly random piece of data represents? That would help me figure out how to modify my unit test so that issues like this don't happen anymore.
Thanks,
Malcolm
PS: Please let me know if I need to provide more information.
I eventually came up with a solution to this issue. A couple of things to clarify:
The reason I was writing unit tests is that our image service web application (PHP) uses ImageMagick to handle all of the image processing, manipulation, and HTML-to-image and PDF-to-image conversions (jpg, png, gif, all non-CMYK, pdf) that happen on our main website. We needed to make sure that, as we added new features to this image service application, there were enough tests in place to ensure that everything still functioned correctly.
The string data that we saw in each image ("?%tEXt") is image metadata. Strictly speaking, tEXt is the type name of a PNG text chunk rather than EXIF data (http://en.wikipedia.org/wiki/Exchangeable_image_file_format), but the effect is the same: extra bytes that vary between otherwise identical images. In order to compare pictures (a suggestion taken from David Andersson's reply, https://stackoverflow.com/users/904933/david-andersson), we needed to completely strip all comment data out of the image, along with the creation and modification timestamps. That way you're dealing with just the image and no other metadata. We did that with the following function:
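The original function is not reproduced here. As a stand-in, here is a minimal sketch of one way to do the stripping in plain PHP, assuming the images are PNG files. It walks the PNG chunk list and drops the text and timestamp chunks (tEXt, zTXt, iTXt, tIME). The name stripPngTextChunks is hypothetical and is not the function from our code base.

```php
<?php
/**
 * Hypothetical helper (an illustration, not the original function):
 * returns the raw PNG data with text and timestamp chunks removed.
 */
function stripPngTextChunks(string $png): string
{
    $signature = "\x89PNG\r\n\x1a\n";
    if (strncmp($png, $signature, 8) !== 0) {
        throw new InvalidArgumentException('Not a PNG file');
    }

    // Ancillary chunk types that carry comments or timestamps.
    $drop = ['tEXt', 'zTXt', 'iTXt', 'tIME'];

    $out    = $signature;
    $offset = 8;
    $total  = strlen($png);

    while ($offset + 8 <= $total) {
        // Chunk layout: 4-byte big-endian length, 4-byte type, data, 4-byte CRC.
        $length = unpack('N', substr($png, $offset, 4))[1];
        $type   = substr($png, $offset + 4, 4);
        $size   = 12 + $length;

        if ($offset + $size > $total) {
            throw new RuntimeException('Truncated PNG chunk');
        }
        // Copy every chunk except the ones we want to ignore.
        if (!in_array($type, $drop, true)) {
            $out .= substr($png, $offset, $size);
        }
        $offset += $size;
    }

    return $out;
}

// Possible use inside a PHPUnit test (file names are placeholders):
// $expected = stripPngTextChunks(file_get_contents('expected.png'));
// $actual   = stripPngTextChunks(file_get_contents('actual.png'));
// $this->assertSame($expected, $actual);
```

If the Imagick extension is available, Imagick::stripImage() is another option for removing profiles and comments, though whether it also removes the date chunks may depend on the ImageMagick version, so it is worth checking against your own output.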
This was run on each image before comparing them to each other (as strings).
I plan on writing a blog post about this in more detail to show how I took care of a number of other tests. When I do, I will update this question with the link, either in the comments or in this answer. Hopefully this helps someone in the future who might be doing something similar.