Why does one language use a tree and another a hash table for seemingly similar data structures?
C++'s map vs. Python's dict
A related question is about the performance of hash tables.
Please comment on my understanding of hash tables below.
A tree is guaranteed to have O(log n) lookups.
A hash table, on the other hand, has no such guarantee unless the inputs are known in advance, because of possible collisions.
I tend to think a hash table's performance would get close to O(n) as the problem size gets bigger,
because I haven't heard of a hash table that dynamically adjusts its table size as the problem size grows.
Hence, a hash table is only useful for a certain range of problem sizes, and that's why most databases use trees rather than hash tables.
The new C++ standard (C++11) has the std::unordered_map type, which is a hash table. IIRC they wanted it to get into the previous standard as well, but there was not enough time during the discussions, so it was left out. However, most popular compilers have provided it in one way or another for years.
In other words, don't worry about it too much. Use the proper data structure for the task at hand.
As for your understanding of hash tables, it’s inaccurate:
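All serious hash table implementations dynamically adjust themselves for growing input by allocating a larger array and re-hashing all the keys. Although this operation is expensive, if it is designed properly (so that it happens rarely enough), the performance is still amortized O(1). Typically the array grows geometrically (for example, by doubling), so the total re-hashing work stays proportional to the number of insertions.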
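To make the resizing idea concrete, here is a minimal sketch in Python of a chained hash table that doubles its bucket array once the load factor passes a threshold. It is a toy for illustration only, not how CPython's dict or std::unordered_map is actually implemented. The class name ChainedHashTable, the max_load parameter, and the rehash_count / moved_entries counters are all made up for this example.

```python
class ChainedHashTable:
    """Toy hash table (separate chaining) that doubles its bucket array when full.

    Illustrative only: names and growth policy are assumptions, not a real library's API.
    """

    def __init__(self, initial_buckets=8, max_load=1.0):
        self._buckets = [[] for _ in range(initial_buckets)]
        self._size = 0
        self._max_load = max_load
        self.rehash_count = 0    # how many times the table has grown
        self.moved_entries = 0   # total entries copied across all rehashes

    @property
    def bucket_count(self):
        return len(self._buckets)

    def __len__(self):
        return self._size

    def _bucket_for(self, key):
        # Python's % always yields a non-negative index for a positive modulus.
        return self._buckets[hash(key) % len(self._buckets)]

    def _grow(self):
        # Allocate an array twice as large and re-hash every stored key into it.
        old_buckets = self._buckets
        self._buckets = [[] for _ in range(2 * len(old_buckets))]
        for bucket in old_buckets:
            for key, value in bucket:
                self._bucket_for(key).append((key, value))
                self.moved_entries += 1
        self.rehash_count += 1

    def put(self, key, value):
        bucket = self._bucket_for(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite, size unchanged
                return
        bucket.append((key, value))
        self._size += 1
        # The expensive step happens only when the load factor is exceeded.
        if self._size > self._max_load * len(self._buckets):
            self._grow()

    def get(self, key, default=None):
        for existing_key, value in self._bucket_for(key):
            if existing_key == key:
                return value
        return default
```

Inserting the integers 0 to 999 into this sketch grows the array seven times (8 buckets up to 1024) and copies 1,023 entries in total, roughly one extra copy per insertion. That is the sense in which the cost of resizing is amortized O(1) per insert, even though a single insert that triggers a rehash is O(n).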