I’m curious why the result of SHA256 can be saved in a `binary(32)` column, but needs a `varchar(64)` to hold the same result as text.
I mean, 256 bits are 32 bytes, so saving it in a `binary(32)` makes perfect sense. But why does saving it in a varchar require an extra byte for each byte?
Let’s start at the beginning and see what a cryptographic hash function is and what its output actually is: it takes input of any length and returns a fixed-size string of bits, 256 of them for SHA-256.
That means we get a sequence of 1s and 0s back. To store that sequence as-is, use MySQL’s `binary` data type: it carries no information about how to present the stored data to the user, because no encoding is associated with it. That also means that when you view the data you’ll most likely see garbled characters, since GUI programs will try to show the stored value as an ASCII-encoded string (which is wrong).
I’ll skip the reasons why the hash value is represented as a number, but the point is that it is, and it’s conventionally written in hexadecimal. Let’s take the 1st byte that you used:
`10101111`: that’s decimal `175` or hexadecimal `AF`.
Sure, you can show byte `175` as a character, but it will most likely be a weird one that depends on the code page in use. The problem with ASCII is that it only defines codes 0 to 127; what the codes above 127 mean is arbitrary, which led to inventing code pages, which led to inventing Unicode and so on, so I’ll skip that for now.
The point is, you can’t rely on `10101111` being displayed correctly as a character in every scenario.
That means that to show it as the number `175`, you need 3 bytes, not 1. Why? Because each character in `175` has to be displayed using its own byte.
So you can display your hash value as a decimal number. You can also display it as a hexadecimal number, which is shorter to represent.
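
To make the code page point concrete, here is a minimal sketch in Python (my choice for illustration, nothing in the question uses it). It decodes the same byte under two code pages I picked as examples, `latin-1` and `cp437`, and then counts the bytes needed to write the value out as digits.

```python
raw = bytes([0b10101111])          # one byte: decimal 175, hex AF

# The same byte maps to a different character in each code page.
as_latin1 = raw.decode("latin-1")  # ISO-8859-1
as_cp437 = raw.decode("cp437")     # original IBM PC code page

# Strict 7-bit ASCII has no character for values above 127.
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# Writing the value out as digits costs one byte per digit.
decimal_text = str(raw[0]).encode("ascii")        # the digits 1, 7, 5
hex_text = format(raw[0], "02X").encode("ascii")  # the digits A, F

print(repr(as_latin1), repr(as_cp437), ascii_ok)
print(len(decimal_text), len(hex_text))
```

The two decoded characters differ and the strict ASCII decode fails, which is why the raw byte can’t be trusted to look the same everywhere, while the digit strings can.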
Let’s take `10101111` again.
In decimal it’s `175`, which takes 3 bytes to show on screen: 1 for `1`, 1 for `7` and 1 for `5`.
In hexadecimal it’s `AF`, which takes 2 bytes to show on screen, noticeably shorter.
Every byte, written as a hex number with leading zeroes, has exactly 2 digits. With decimal numbers that’s not the case: a byte can take 1, 2 or 3 digits. So every time you represent 1 byte as a hex number, you know you’ll get exactly 2 digits. Ergo, your message is fixed width, and it uses only the digits 0-9 and the letters A-F, which sit at the same positions in every ASCII-based code page, so they look the same everywhere.
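
The fixed-width argument can be checked with a short Python sketch. The helper names `to_hex` and `to_decimal` are made up for this illustration. It collects the digit counts over all 256 byte values and shows why unpadded decimal can’t be split back into bytes reliably.

```python
def to_hex(data: bytes) -> str:
    # Two hex digits per byte, zero-padded.
    return "".join(format(b, "02X") for b in data)

def to_decimal(data: bytes) -> str:
    # Unpadded decimal digits per byte, simply concatenated.
    return "".join(str(b) for b in data)

# Digit counts seen across every possible byte value.
hex_widths = {len(format(b, "02X")) for b in range(256)}
dec_widths = {len(str(b)) for b in range(256)}
print(hex_widths, dec_widths)

# Two different byte sequences collapse to the same decimal text...
print(to_decimal(bytes([1, 11])), to_decimal(bytes([11, 1])))
# ...but stay distinct in fixed-width hex.
print(to_hex(bytes([1, 11])), to_hex(bytes([11, 1])))
```

Because every byte always becomes exactly two characters, the hex text can be cut back into bytes without any separators.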
So when you take `AF` and display it in ASCII, you need 1 byte for `A` and 1 byte for `F`.
The hash has 32 bytes, each written as 2 hex digits: 32 × 2 = 64 bytes.
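
The same arithmetic can be seen on a real hash. This is a small sketch using Python’s standard `hashlib` module; the input `b"hello"` is just an arbitrary sample, and the comments about column types assume the hex text is stored in a single-byte character set.

```python
import hashlib

message = b"hello"  # arbitrary sample input

h = hashlib.sha256(message)
raw = h.digest()      # the raw 256 bits: what fits in binary(32)
text = h.hexdigest()  # the same value as hex text: what needs 64 characters

print(len(raw))                    # bytes in the raw digest
print(len(text.encode("ascii")))   # bytes in the hex representation

# The hex text is only another spelling of the same bytes.
round_trip = bytes.fromhex(text)
print(round_trip == raw)
```

Note that `hexdigest()` returns lowercase letters (`af` rather than `AF`); both spellings decode to the same bytes.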
The only mistake you probably made was using `varchar(64)`. Using varchar for storing hashes is pointless if you know the hash width. Using `char` would be much better, because you wouldn’t waste the 1 byte that a `varchar` column uses to store the length.
Hopefully, this clears it up a bit. It’s actually simpler than it sounds 🙂