I’m starting to learn about doing data analysis in Python.
In R, you can load data into memory, then save variables into a .rdata file.
I’m trying to create an analysis “project”, so I can load the data, store the scripts, then save the output so I can recall it should I need to.
Is there an equivalent function in Python?
Thanks
What you’re looking for is binary serialization. The most notable functionality for this in Python is pickle. If you have some standard scientific data structures, you could look at HDF5 instead. JSON works for a lot of objects as well, but it is not binary serialization; it is text-based.
If you expand your options, there are a lot of other serialization formats, too, such as Google’s Protocol Buffers (the developer of Rprotobuf is the top-ranked answerer for the r tag on SO), Avro, Thrift, and more.
Although there are generic serialization options, such as pickle and .Rdat, careful consideration of your usage will be helpful in making I/O fast and appropriate to your needs, especially if you need random access, portability, parallel access, tool re-use, etc. For instance, I now tend to avoid .Rdat for large objects.
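As a rough illustration of the pickle route, here is a minimal sketch that saves several named variables to one file and reads them back, loosely mirroring R's save() and load(). The helper names save_workspace and load_workspace, the file name, and the sample data are all made up for this example; they are not part of pickle itself.

```python
import pickle
import tempfile
from pathlib import Path


def save_workspace(path, **variables):
    """Write named objects to one file (hypothetical helper, loosely like R's save())."""
    with open(path, "wb") as f:
        pickle.dump(variables, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_workspace(path):
    """Return the saved names and objects as a dict (hypothetical helper)."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Sample values standing in for analysis results (invented for illustration).
scores = [0.91, 0.87, 0.78]
params = {"alpha": 0.05, "n_iter": 100}

# The file location is an assumption; any writable path works.
workspace_file = Path(tempfile.gettempdir()) / "analysis_workspace.pkl"
save_workspace(workspace_file, scores=scores, params=params)

# Later, or in another session, restore everything in one call.
restored = load_workspace(workspace_file)
print(restored["scores"])
print(restored["params"]["alpha"])
```

Unlike R's load(), this returns a dict rather than putting the names back into your session, so you pull out what you need by key. Only unpickle files you trust, since loading a pickle can execute arbitrary code.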