I’ve got a really big problem with my image storage server. There are about

Question


Asked: May 20, 2026 (2026-05-20T12:33:58+00:00)


I’ve got a really big problem with my image storage server.

There are about 2,000,000 product images on it, and the number keeps growing, but a lot of them are very similar. For example, there is an iPad photo in many nearly identical sizes: 120 * 120, 118 * 120, 131 * 125, etc. They take up a lot of unnecessary disk space and make for a bad user experience on my website (similar images in the gallery).

Those images are indexed in a database, so I can find them by certain conditions, such as product or category. I need a way to mark these similar images in the database and remove them.

What I have done:
I found a library named pHash that can calculate the similarity of two images, so I can use it to compare the images one by one. But comparing them this way will take a very long time. I don’t know how to make this process faster.

Any ideas?


1 Answer

Editorial Team · Answer 1 · 2026-05-20T12:33:59+00:00


Use pHash to calculate the perceptual hash of each image once (not of every pairwise combination of images),
then sort the hashes (keeping each hash’s reference to its image),
then choose a critical value (a threshold on how much two perceptual hashes may differ) at which you consider the pictures equivalent,
then replace references to equivalent pictures with a reference to the one picture you want to keep.
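
The steps above could be sketched roughly as follows. This is only an illustration under some assumptions: the hashes are taken to be plain integers already computed once per image (for example by pHash), the difference between two hashes is measured as the number of differing bits, and the file names, hash values, and the threshold of 2 are made up. Hashing is one pass over N images, whereas comparing every pair would need N(N-1)/2 comparisons.

```python
from typing import Dict, List, Optional, Tuple


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


def group_similar(hashes: List[Tuple[str, int]], threshold: int) -> Dict[str, str]:
    """Map every image id to the id of the image kept in its place.

    Images are visited in sorted hash order. An image whose hash is within
    `threshold` bits of the current kept image is mapped to that image;
    otherwise it becomes the next kept image.
    """
    keep: Dict[str, str] = {}
    rep_id: Optional[str] = None
    rep_hash: Optional[int] = None
    for image_id, h in sorted(hashes, key=lambda item: item[1]):
        if rep_hash is not None and hamming(rep_hash, h) <= threshold:
            keep[image_id] = rep_id  # equivalent to the kept image
        else:
            rep_id, rep_hash = image_id, h  # start a new group
            keep[image_id] = image_id
    return keep


# Hypothetical ids and 16-bit hash values, for illustration only.
sample = [
    ("ipad_120x120.jpg", 0b1011_0000_1111_0000),
    ("ipad_118x120.jpg", 0b1011_0000_1111_0001),
    ("ipad_131x125.jpg", 0b1011_0000_1111_0011),
    ("phone.jpg", 0b0100_1111_0000_1100),
]
mapping = group_similar(sample, threshold=2)
for image_id, kept in mapping.items():
    print(image_id, "->", kept)
```

The resulting mapping is what you would use to update the references in the database. One caveat with this sketch: sorting only places hashes next to each other when they agree in their leading bits, so two near-identical images whose hashes differ in a high bit can end up far apart and be missed. Treat it as a fast first pass, not an exhaustive search.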
