In MySQL, the usual index type is a B-tree, and accessing an element in a B-tree takes logarithmic time, O(log(n)).
On the other hand, accessing an element in a hash table takes O(1) time on average.
Why is a hash table not used instead of a B-tree to access data inside a database?
In a hash table, you can only access elements by their exact key.
This is faster than with a tree algorithm (O(1) instead of O(log(n))), but you cannot select ranges (everything between x and y). Tree algorithms can find the start of a range in O(log(n)) and then read the matching entries in order, whereas hash indexes can result in a full table scan, O(n). Also, the constant overhead of hash indexes is usually bigger (which is not a factor in asymptotic notation, but it still exists).
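To make the range-query point concrete, here is a minimal Python sketch. It is not how MySQL implements either index type: a plain `dict` stands in for a hash index, a sorted list searched with `bisect` stands in for the key order of a B-tree, and the sample `rows` data and function names are made up for illustration.

```python
import bisect

# Hypothetical sample data: key -> row value.
rows = {17: "q", 3: "a", 42: "z", 8: "c", 25: "m"}

# Hash-style index: supports exact-match lookups only.
hash_index = dict(rows)

# Ordered index: sorted keys stand in for the key order of a B-tree.
sorted_keys = sorted(rows)


def range_via_hash(lo, hi):
    """Hash order is unrelated to key order, so every entry is checked: O(n)."""
    return sorted(k for k in hash_index if lo <= k <= hi)


def range_via_ordered(lo, hi):
    """Binary-search both ends of the range, then slice: O(log n + matches)."""
    start = bisect.bisect_left(sorted_keys, lo)
    end = bisect.bisect_right(sorted_keys, hi)
    return sorted_keys[start:end]
```

Both functions return the same keys for a query such as `range_via_ordered(5, 30)`, but only the ordered version avoids touching every entry. An exact lookup like `hash_index[42]` remains a single probe.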
Also, tree algorithms are usually easier to maintain, grow with the data, scale well, etc.
Hash indexes work with pre-defined hash sizes, so you end up with some "buckets" in which the objects are stored. The objects in a bucket are then looped over to find the right one inside this partition.
So if the buckets are small, you have a lot of overhead for few elements; if they are big, more scanning is needed inside each bucket.
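The bucket trade-off can be sketched with a toy hash index whose bucket count is fixed at creation. This is only an illustration under stated assumptions: the class `FixedBucketIndex`, the helper `build`, and the row values are hypothetical, and the integer keys rely on CPython hashing small non-negative integers to themselves.

```python
class FixedBucketIndex:
    """Toy hash index with a bucket count fixed at creation (hypothetical)."""

    def __init__(self, num_buckets):
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))

    def get(self, key):
        # The hash only selects the bucket; the bucket is then scanned linearly.
        for k, v in self._bucket(key):
            if k == key:
                return v
        raise KeyError(key)

    def longest_bucket(self):
        return max(len(b) for b in self.buckets)

    def empty_buckets(self):
        return sum(1 for b in self.buckets if not b)


def build(num_buckets, n=1000):
    index = FixedBucketIndex(num_buckets)
    for i in range(n):
        index.put(i, f"row-{i}")
    return index


few = build(num_buckets=10)       # few, large buckets: long scans per lookup
many = build(num_buckets=10_000)  # many, tiny buckets: mostly unused space
```

With 1000 keys, `few.longest_bucket()` shows how much scanning a single lookup may need, while `many.empty_buckets()` shows how much of the pre-allocated structure sits unused.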
Today's hash table algorithms usually scale, but scaling can be inefficient.
However, there may be a point where your index exceeds a tolerable size compared to your hash sizes and your entire index needs to be rebuilt. Usually this is not a problem, but for truly huge databases, this can take days.
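The rebuild cost can be illustrated with a toy hash index that doubles its bucket array once a load threshold is crossed. The doubling strategy, the 0.75 threshold, and the class name `GrowingHashIndex` are assumptions chosen for this sketch, not a description of any particular database engine.

```python
class GrowingHashIndex:
    """Toy hash index that rebuilds itself when it gets too full (hypothetical)."""

    def __init__(self, num_buckets=8, max_load=0.75):
        self.buckets = [[] for _ in range(num_buckets)]
        self.size = 0
        self.max_load = max_load
        self.entries_moved = 0  # total entries re-inserted by all rebuilds

    def _rebuild(self):
        # Every existing entry must be re-hashed into the new bucket array.
        old = self.buckets
        self.buckets = [[] for _ in range(2 * len(old))]
        for bucket in old:
            for key, value in bucket:
                self.buckets[hash(key) % len(self.buckets)].append((key, value))
                self.entries_moved += 1

    def put(self, key, value):
        bucket = self.buckets[hash(key) % len(self.buckets)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)
                return
        bucket.append((key, value))
        self.size += 1
        if self.size > self.max_load * len(self.buckets):
            self._rebuild()

    def get(self, key):
        for k, v in self.buckets[hash(key) % len(self.buckets)]:
            if k == key:
                return v
        raise KeyError(key)


index = GrowingHashIndex()
for i in range(1000):
    index.put(i, f"row-{i}")
```

After the loop, `index.entries_moved` is larger than the number of keys stored, because each rebuild touches every entry inserted so far. On an in-memory toy this is negligible; on a very large on-disk index the same full pass is what makes a rebuild expensive.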
The trade-off for tree algorithms is small, and they are suitable for almost every use case, which is why they are the default.
However, if you have a very precise use case and you know exactly what, and only what, is going to be needed, you can take advantage of hash indexes.