I am generating the checksum (sha256) of an uploaded image in Ruby on Rails.
upload = params[:file]
data1 = upload.read
data2 = File.read(upload.tempfile)
checksum1 = Digest::SHA256.hexdigest(data1)
checksum2 = Digest::SHA256.hexdigest(data2)
puts checksum1
puts checksum2
The last two statements print different values.
checksum1 is generated by reading the data using the UploadedFile object.
checksum2 is generated by reading the temporary file from the file system.
Does an ActionDispatch::Http::UploadedFile object return anything more than the contents of the uploaded file? When I generate the checksum of the uploaded file after it has been written to the file system, it matches checksum2 (the temporary file checksum), not checksum1 (UploadedFile.read).
I am assuming that the checksum generated by reading the temporary file from the file system is more reliable, as the UploadedFile implementation might change. If needed, it will also be easier to generate checksums of existing files on the file system.
So, what is the reason for the difference between the checksums, and which one is more reliable?
Thank you.
Update 1:
As per @pablo-castellazzi's suggestion, I generated the hash using Digest::SHA256.file(upload.path).hexdigest. Let us call this checksum3.
checksum3 equals checksum1 but differs from checksum2.
Update 2: If I use binary mode to read the file, as mentioned by @Arsen7, then all the checksums are equal.
Have you compared the ‘data1’ and ‘data2’ contents? Try to save them to files and take a look.
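If the data is too large to compare by eye, a small script can point at the first place where the two reads diverge. This is only a sketch: `first_difference` is a hypothetical helper, not part of Ruby or Rails, and the two strings below are stand-ins for data1 and data2.

```ruby
# Hypothetical helper: returns the index of the first byte where the two
# strings differ, or nil when they are byte-for-byte identical.
def first_difference(a, b)
  a = a.b # compare raw bytes, ignoring any encoding label
  b = b.b
  shorter = [a.bytesize, b.bytesize].min
  index = (0...shorter).find { |i| a.getbyte(i) != b.getbyte(i) }
  return index if index
  # One string is a prefix of the other (or they are equal).
  a.bytesize == b.bytesize ? nil : shorter
end

# Stand-ins for data1 and data2: same content, different line endings.
data1 = "line one\r\nline two\r\n"
data2 = "line one\nline two\n"

offset = first_difference(data1, data2)
if offset
  puts "sizes: #{data1.bytesize} vs #{data2.bytesize}"
  puts "first difference at byte #{offset}"
  # Show a few bytes from each side, escaped so control characters are visible.
  puts data1.byteslice(offset, 8).inspect
  puts data2.byteslice(offset, 8).inspect
else
  puts "identical"
end
```

A size mismatch together with a "\r\n" on one side and a "\n" on the other is a strong hint that one of the reads went through line-ending translation.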
I suppose you may want to call upload.rewind before you do the first read, but the first thing is to take a look at the raw data read from the files.
Update:
You did not say that you are on Windows. In this case you should take care and read the files in so-called ‘binary’ mode.
Change the File.read call so that it reads the file in binary mode. (Implement Pablo Castellazzi's suggestion of using the .path method.)
I was suggesting that you open the files in some binary-safe editor (vim, for example) and compare what differs. You would notice that maybe most of the data is the same, but in one of the files the line endings are different, or maybe you would spot some other differences.
On Windows, the most common problem is binary mode.
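A minimal sketch of a binary-mode read, assuming a plain Tempfile as a stand-in for the upload (in a controller you would use upload.path instead). File.binread and Digest::SHA256.file are standard Ruby; the sample bytes and variable names are made up for illustration. The last step only simulates the CRLF-to-LF translation that a text-mode read performs on Windows, so the effect is visible on any platform.

```ruby
require "digest"
require "tempfile"

# Made-up binary payload that happens to contain CRLF pairs.
bytes = "\x89PNG\r\n\x1A\n\x00\x01\r\n\xFF".b

# Stand-in for the uploaded file's temp file, written in binary mode.
tempfile = Tempfile.new("upload")
tempfile.binmode
tempfile.write(bytes)
tempfile.flush

# Read through the open handle; rewind first so the read starts at byte 0.
tempfile.rewind
data1 = tempfile.read

# Read the same file from disk in binary mode
# (equivalent to File.open(path, "rb") { |f| f.read }).
data2 = File.binread(tempfile.path)

checksum1 = Digest::SHA256.hexdigest(data1)
checksum2 = Digest::SHA256.hexdigest(data2)
checksum3 = Digest::SHA256.file(tempfile.path).hexdigest

all_equal = checksum1 == checksum2 && checksum2 == checksum3
puts "binary reads agree: #{all_equal}"

# Simulation of a Windows text-mode read: CRLF pairs collapse to LF,
# so the bytes, and therefore the checksum, no longer match.
translated = bytes.gsub("\r\n".b, "\n".b)
text_mode_matches = Digest::SHA256.hexdigest(translated) == checksum1
puts "simulated text-mode read agrees: #{text_mode_matches}"

tempfile.close
tempfile.unlink
```

With the binary read in place, hashing the file by path and hashing the data read from the open handle should give the same result, which is consistent with what Update 2 in the question reports.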