I have a python program that is going to eat a lot of memory, primarily in a dict. This dict will be responsible for assigning a unique integer value to a very large set of keys. As I am working with large matrices, I need a key-to-index correspondence that can also be recovered from (i.e., once matrix computations are complete, I need to map the values back to the original keys).
I believe this amount will eventually surpass available memory. I am wondering how this will be handled with regards to swap space. Perhaps there is a better data structure for this purpose.
It will just end up in swap trashing, because a hash table has very much randomized memory access patterns.
If you know that the map exceeds the size of the physical memory, you could consider using a data structure on the disk in the first place. This especially if you don’t need the data structure during the computation. When the hash table triggers swapping, it creates problems also outside the hash table itself.