If a single image has been saved twice with two different filenames, is there a way to compare them to see if they’re the same?
I’m hoping a basic hash or CRC-type check could work.
File size alone might not, as there are millions of images in the pool and different images could have the same size.
Hoping there’s an easy way to do it in C#.
If the file contents are identical, then a cryptographic hash would at least give a very good indication of equality. The `SHA256` class would be a good candidate here, although it’s possibly a little over the top. For example, you could open each file as a stream and pass it to `SHA256.ComputeHash`, which returns the hash as a byte array.

The simplest way to compare the two returned byte arrays is probably to convert them both to strings using `Convert.ToBase64String` and then compare the strings. Ugly but easy 🙂 You could also use `Enumerable.SequenceEqual` to compare the arrays directly.

If you want to store the hashes as a set or dictionary, you could implement your own `IEqualityComparer<byte[]>`, but frankly it would be easiest to use a base64 string. For example, grouping the files by that string and printing every group with more than one file will list the duplicate files.

A few notes:
How you approach this depends on whether your goal is absolute speed, simplicity of code, etc. It may also depend on whether the pool will grow over time – for example, you may want to hash files as soon as you get two or more files of the same size, so that when you add another file of the same size you can hash that and add it without ever rereading the existing data.
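To make the steps above concrete, here is a minimal sketch of one way they could fit together. It is an illustration under stated assumptions, not the answer's original code: the names `DuplicateFinder`, `Sha256HashFile`, `SameHash` and `FindDuplicates` are hypothetical, and grouping by file length before hashing is just one way of applying the "only hash files that share a size" idea.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

// Hypothetical helper class; all names here are illustrative.
static class DuplicateFinder
{
    // Streams the file through SHA-256, so a large image is never
    // loaded into memory in one piece.
    public static byte[] Sha256HashFile(string path)
    {
        using (SHA256 sha256 = SHA256.Create())
        using (Stream input = File.OpenRead(path))
        {
            return sha256.ComputeHash(input);
        }
    }

    // Compares two hashes directly, without converting them to strings.
    public static bool SameHash(byte[] first, byte[] second)
    {
        return first.SequenceEqual(second);
    }

    // Returns groups of paths whose contents hash to the same value.
    // Files are grouped by length first, so a file with a unique size
    // is never opened at all.
    public static List<List<string>> FindDuplicates(IEnumerable<string> paths)
    {
        return paths
            .GroupBy(path => new FileInfo(path).Length)
            .Where(sameSize => sameSize.Count() > 1)
            .SelectMany(sameSize => sameSize
                .GroupBy(path => Convert.ToBase64String(Sha256HashFile(path))))
            .Where(sameHash => sameHash.Count() > 1)
            .Select(sameHash => sameHash.ToList())
            .ToList();
    }

    public static void Main(string[] args)
    {
        // Assumption: the directory to scan is passed as the first argument.
        string directory = args.Length > 0 ? args[0] : ".";
        foreach (List<string> group in FindDuplicates(Directory.EnumerateFiles(directory)))
        {
            Console.WriteLine(string.Join(" == ", group));
        }
    }
}
```

The base64 string is used as the grouping key because `byte[]` has reference equality by default; a custom `IEqualityComparer<byte[]>` would avoid the string conversion at the cost of a little more code. For a pool that keeps growing, the hashes would need to be stored rather than recomputed on every scan.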