After having successfully built a static data structure (see here), I would like to avoid rebuilding it from scratch every time a user requests an operation on it. My naïve first idea was to dump the structure (using Python’s pickle) into a file and load that file for each query. Needless to say (as I found out), this turns out to be too time-consuming, as the file is rather large.
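For reference, the per-query approach described above might look roughly like this (the file name and the structure itself are just placeholders for illustration):

```python
import pickle

def build_structure():
    # Placeholder for the expensive one-time build step.
    return {"key%d" % i: list(range(10)) for i in range(1000)}

# Build once and dump to disk.
with open("structure.pkl", "wb") as f:
    pickle.dump(build_structure(), f)

def handle_query(key):
    # Loads and unpickles the whole file on every query --
    # this is the slow part when the file is large.
    with open("structure.pkl", "rb") as f:
        structure = pickle.load(f)
    return structure.get(key)
```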
Any ideas how I can easily speed this up? Splitting the file into multiple files? Or a program running on the server? (How difficult would that be to implement?)
Thanks for your help!
My suggestion would be not to rely on an object structure. Instead, have a byte array (or an mmap’d file, etc.) that you can do random-access operations on, and implement the cross-referencing using pointers inside that structure.
True, it will introduce the concept of pointers into your code, but it means you won’t need to unpickle the structure each time a handler process starts up, and it will also use a lot less memory (as there won’t be the per-object overhead of Python objects).
As your database is going to be fixed during the lifetime of a handler process (I imagine), you won’t need to worry about concurrent modifications, locking, etc.
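A minimal sketch of the idea, assuming fixed-width records where a "pointer" is just the byte offset of another record in the file (the record layout and file name here are made up for illustration):

```python
import mmap
import struct

# Each record: a 32-bit "next" pointer (byte offset of another record,
# or 0xFFFFFFFF for "end of chain") followed by a 32-bit payload value.
RECORD = struct.Struct("<II")
END = 0xFFFFFFFF

# Build the flat byte image once (here: three chained records).
records = [(RECORD.size, 10), (2 * RECORD.size, 20), (END, 30)]
with open("data.bin", "wb") as f:
    for nxt, val in records:
        f.write(RECORD.pack(nxt, val))

# Random access via mmap -- no unpickling, no Python-object overhead.
with open("data.bin", "rb") as f:
    buf = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

def follow_chain(offset=0):
    # Walk the pointer chain starting at a given byte offset.
    values = []
    while offset != END:
        nxt, val = RECORD.unpack_from(buf, offset)
        values.append(val)
        offset = nxt
    return values
```

Following the chain from offset 0 visits the records in pointer order without ever materialising them as Python objects.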
Even if you did what you suggest, you shouldn’t have to rebuild it on every user request: just keep an instance in memory in your worker process(es). It won’t take too long to build, because you only build it when a new worker process starts.
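The keep-one-instance-per-process idea could be sketched like this (a hypothetical worker module; the names are placeholders):

```python
# Module-level cache: built once per worker process, reused afterwards.
_structure = None

def expensive_build():
    # Placeholder for the real, slow build step.
    return {i: i * i for i in range(100)}

def get_structure():
    global _structure
    if _structure is None:          # only true once per worker process
        _structure = expensive_build()
    return _structure

def handle_request(key):
    # No file load and no rebuild on a per-request basis.
    return get_structure()[key]
```

Since module-level state persists for the lifetime of the process, every request served by the same worker shares the one built instance.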