I am writing a program that does a lot of table lookups. As such, I was perusing the Haskell documentation when I stumbled upon Data.Map (of course), but also Data.HashMap and Data.HashTable. I am no expert on hashing algorithms, and after inspecting the packages they all seem really similar. As such, I was wondering:
1: What are the major differences, if any?
2: Which would be the most performant with a high volume of lookups on maps/tables of ~4000 key-value pairs?
Data.Map.Map is a balanced binary tree internally, so its time complexity for lookups is O(log n). I believe it’s a “persistent” data structure, meaning that mutative operations yield a new copy with only the relevant parts of the structure updated.

Data.HashMap.Map is a Data.IntMap.IntMap internally, which in turn is implemented as a Patricia tree; its time complexity for lookups is O(min(n, W)), where W is the number of bits in an integer. It is also “persistent.” New versions (>= 0.2) use hash array mapped tries. According to the documentation: “Many operations have a average-case complexity of O(log n). The implementation uses a large base (i.e. 16) so in practice these operations are constant time.”

Data.HashTable.HashTable is an actual hash table, with time complexity O(1) for lookups. However, it is a mutable data structure (operations are done in place), so you’re stuck in the IO monad if you want to use it.

The best answer I can give you, unfortunately, is “it depends.” If you take the asymptotic complexities literally, you get O(log 4000) = about 12 for Data.Map, O(min(4000, 64)) = 64 for Data.HashMap and O(1) = 1 for Data.HashTable. But it doesn’t really work that way… You have to try them in the context of your code.
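
To make “try them in the context of your code” a bit more concrete, here is a minimal sketch of how one might compare lookups on two of the persistent structures. It assumes Int keys and uses only Data.Map and Data.IntMap from the containers package; the names tableSize, pairs, treeTable, trieTable, sumLookupsMap, sumLookupsIntMap and timed are all made up for illustration, and the CPU-time measurement is crude, so treat any numbers it produces as a rough hint rather than a benchmark.

```haskell
import Control.Exception (evaluate)
import Data.List (foldl')
import Data.Maybe (fromMaybe)
import System.CPUTime (getCPUTime)
import qualified Data.IntMap.Strict as IM
import qualified Data.Map.Strict as M

-- Hypothetical table size, matching the ~4000 pairs mentioned in the question.
tableSize :: Int
tableSize = 4000

-- The same (key, value) pairs are loaded into both structures.
pairs :: [(Int, Int)]
pairs = [(k, k * 2) | k <- [1 .. tableSize]]

-- Balanced binary tree: O(log n) lookups.
treeTable :: M.Map Int Int
treeTable = M.fromList pairs

-- Patricia tree keyed on Int: O(min(n, W)) lookups.
trieTable :: IM.IntMap Int
trieTable = IM.fromList pairs

-- Sum the values found for a list of keys; missing keys count as 0.
sumLookupsMap :: [Int] -> Int
sumLookupsMap = foldl' (\acc k -> acc + fromMaybe 0 (M.lookup k treeTable)) 0

sumLookupsIntMap :: [Int] -> Int
sumLookupsIntMap = foldl' (\acc k -> acc + fromMaybe 0 (IM.lookup k trieTable)) 0

-- Force the Int result of applying f to x and report the CPU time
-- spent doing so, in picoseconds.
timed :: (a -> Int) -> a -> IO (Int, Integer)
timed f x = do
  start <- getCPUTime
  result <- evaluate (f x)
  end <- getCPUTime
  pure (result, end - start)
```

From GHCi or a main function, something like timed sumLookupsMap (concat (replicate 1000 [1 .. tableSize])) followed by the same call with sumLookupsIntMap gives two timings to compare. Both tables are top-level values, so the first timed call also pays for building the table; running each measurement twice and keeping the second is one way around that. The key pattern and key type of your real program will matter more than this synthetic loop.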