What would be the best way to handle lightweight crash recovery for my program?
I have a Python program that runs a number of test cases and the results are stored in a dictionary which serves as a cache. If I could save (and then restore) each item that is added to the dictionary, I could simply run the program again and the caching would provide suitable crash recovery.
- You may assume that the keys and values in the dictionary are easily convertible to strings, i.e. using either str or the pickle module.
- I want this to be completely cross-platform, or at least as cross-platform as Python is.
- I don’t want to simply write out each value to a file and load it back in, because my program might crash while I am writing the file.
- UPDATE: This is intended to be a lightweight module, so a DBMS is out of the question.
- UPDATE: Alex is correct in that I don’t actually need to protect against crashes while writing, but there are circumstances where I would like to be able to terminate the program manually and leave it in a recoverable state.
- UPDATE: Added a highly limited solution using standard input below.
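A minimal sketch of the caching scheme the question describes. Tests whose results are already in the restored dictionary are skipped, and every new result is handed to a persistence hook as soon as it exists. The names run_tests, run_test and persist are hypothetical; the hook is where an append-to-file or database write would go.

```python
from typing import Any, Callable, Dict, Hashable, Iterable


def run_tests(
    test_ids: Iterable[Hashable],
    run_test: Callable[[Hashable], Any],
    cache: Dict[Hashable, Any],
    persist: Callable[[Hashable, Any], None],
) -> Dict[Hashable, Any]:
    """Run every test whose result is not already cached.

    `cache` is assumed to have been restored from disk beforehand, so after
    a crash only the tests that never finished are run again.
    """
    for test_id in test_ids:
        if test_id in cache:
            continue  # recovered from a previous run; skip it
        result = run_test(test_id)
        cache[test_id] = result
        persist(test_id, result)  # save each item as soon as it exists
    return cache


if __name__ == "__main__":
    ran = []

    def fake_test(test_id):
        ran.append(test_id)
        return len(test_id)

    saved = []
    restored = {"a": 1}  # pretend "a" finished before a crash
    run_tests(["a", "bb", "ccc"], fake_test, restored,
              lambda k, v: saved.append((k, v)))
    print(ran)    # ['bb', 'ccc']
    print(saved)  # [('bb', 2), ('ccc', 3)]
```

Because each result is persisted immediately, the worst a crash or manual termination can cost is the test that was running at that moment.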
There’s no good way to guard against “your program crashing while writing a checkpoint to a file”, but why should you worry so much about that?! What ELSE is your program doing at that time, BESIDES “saving a checkpoint to a file”, that could easily cause it to crash?!
It’s hard to beat pickle (or cPickle) for portability of serialization in Python, but that’s just about “turning your keys and values into strings”. For saving key-value pairs (once stringified), few approaches are safer than just appending to a file (don’t pickle to files if your crashes are far, far more frequent than normal, as you suggest they are).

If your environment is incredibly crash-prone for whatever reason (very cheap HW?-), just make sure you close the file after each append (and flush it and call os.fsync on it first if the OS is also crash-prone;-), then reopen it for append next time. This way, the worst that can happen is that the very latest append will be incomplete (due to a crash in the middle of things). Then you just catch the exception raised by unpickling that incomplete record and redo only the things that weren’t saved: whether they weren’t completed due to a crash, or were completed but not fully saved due to a crash, comes to much the same thing in the end.
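A minimal sketch of the append-and-recover approach, assuming keys and values are picklable and the log is only ever read by this program (unpickling untrusted data is unsafe). The helper names append_checkpoint and load_checkpoint are hypothetical. Each record is appended with the file opened, flushed, fsync'd and closed. On restart the loader stops at the first record that fails to unpickle and truncates the file back to the last complete record, so later appends are not stranded behind a damaged tail.

```python
import os
import pickle
from typing import Any, Dict, Hashable


def append_checkpoint(path: str, key: Hashable, value: Any) -> None:
    """Append one (key, value) record, forcing it to disk before closing."""
    with open(path, "ab") as f:
        pickle.dump((key, value), f)
        f.flush()             # move Python's buffer into the OS
        os.fsync(f.fileno())  # ask the OS to write it to the disk


def load_checkpoint(path: str) -> Dict[Hashable, Any]:
    """Rebuild the cache from the log, discarding an incomplete last record."""
    cache: Dict[Hashable, Any] = {}
    if not os.path.exists(path):
        return cache
    with open(path, "r+b") as f:
        good_end = 0  # byte offset just past the last complete record
        while True:
            try:
                key, value = pickle.load(f)
            except Exception:
                # EOFError at a clean end of file; a record cut short by a
                # crash can raise EOFError, pickle.UnpicklingError or others.
                break
            cache[key] = value
            good_end = f.tell()
        f.truncate(good_end)  # drop any partial tail so new appends stay readable
    return cache
```

On startup, call load_checkpoint once to rebuild the dictionary, then call append_checkpoint for each new result. Anything missing from the rebuilt dictionary is simply rerun.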
If you have the option of checkpointing to a database engine (instead of just doing so to files), consider it seriously! The DB engine will keep transaction logs and ensure ACID properties, making your application-side programming much easier IF you can count on that!-)
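One embedded engine that ships with Python is SQLite, via the standard library's sqlite3 module. It needs no server, so it may fit even though the question rules out a DBMS. The sketch below is an assumption-laden illustration, not a prescribed design: the table name cache and the helpers open_checkpoint_db, save_item and load_cache are hypothetical, and keys are stored as pickled bytes. That assumes equal keys pickle to identical bytes, which holds for plain str, int and tuple keys. Each save runs in its own transaction, so a crash leaves either the whole row or nothing.

```python
import pickle
import sqlite3
from typing import Any, Dict, Hashable


def open_checkpoint_db(path: str) -> sqlite3.Connection:
    """Open (or create) a SQLite file holding one row per cached result."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cache (key BLOB PRIMARY KEY, value BLOB NOT NULL)"
    )
    conn.commit()
    return conn


def save_item(conn: sqlite3.Connection, key: Hashable, value: Any) -> None:
    """Store one result; `with conn` commits it, or rolls back on error."""
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO cache (key, value) VALUES (?, ?)",
            # Assumes equal keys pickle to identical bytes (true for str/int/tuple).
            (pickle.dumps(key), pickle.dumps(value)),
        )


def load_cache(conn: sqlite3.Connection) -> Dict[Hashable, Any]:
    """Rebuild the dictionary from every committed row."""
    rows = conn.execute("SELECT key, value FROM cache")
    return {pickle.loads(k): pickle.loads(v) for k, v in rows}
```

Passing a file path to open_checkpoint_db gives a persistent checkpoint; the transaction log and rollback are handled by SQLite rather than by application code.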