This is a bit of a general question, but I wanted to know some different approaches to sharing data between machines.
Basically, I have a process that generates a large reference table (often a Python dict of over 10 GB), and then other machines run independent processes that reference that table. The dict does not change once it's created; all the other machines simply refer to it to do their work. I'm leaning towards storing all this in a database and having all the servers query that database to get the data. I just suspect that having multiple 10 GB+ queries at the same time may not be the best way to do it. I have also thought about a flat file, or passing it over using a distribution tool.
Are there any other ways to share this Python dict among several machines? (General approaches are fine, but I'm using Python, so any library suggestions would work too.)
Well, yes, it would make sense to store it in some kind of shared datastore. Depending on your exact needs, you may find it preferable to store the data in some kind of NoSQL-style storage. For example, Redis ( http://redis.io ) is pretty reasonable, and supports various data structures, including hashes (its equivalent of a hash table).
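As a rough sketch of what that could look like with a Redis hash: the producer writes the dict once in batches, and each worker then looks up only the keys it needs rather than pulling the whole 10 GB table. This assumes the redis-py client (its `hset(name, mapping=...)` and `hget(name, key)` calls), that the values are JSON-serializable, and that the Redis server has enough memory for the table. The helper names `iter_chunks`, `publish_table` and `lookup`, and the batch size, are made up for illustration.

```python
import json
from itertools import islice


def iter_chunks(items, size):
    """Yield lists of at most `size` elements from any iterable."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def publish_table(client, name, table, batch_size=10_000):
    """Write `table` into the Redis hash `name`, one batch per round trip.

    `client` is assumed to behave like redis.Redis. Keys are stored as
    strings and values as JSON text, since Redis hash fields hold
    strings/bytes, not arbitrary Python objects.
    """
    for chunk in iter_chunks(table.items(), batch_size):
        mapping = {str(key): json.dumps(value) for key, value in chunk}
        client.hset(name, mapping=mapping)


def lookup(client, name, key):
    """Fetch a single entry from the shared table, or None if absent."""
    raw = client.hget(name, str(key))
    return None if raw is None else json.loads(raw)
```

On the producer you would create a client with something like `redis.Redis(host="table-host")` (the host name here is a placeholder) and call `publish_table(client, "reftable", big_dict)` once; each worker then calls `lookup(client, "reftable", some_key)` as it goes. Whether per-key lookups over the network are fast enough depends on how many lookups each worker makes, so it is worth measuring before committing to this.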