I have a 100 MB file with roughly 10 million lines that I need to parse into a dictionary every time I run my code. This process is incredibly slow, and I am hunting for ways to speed it up. One idea is to parse the file once and then use pickle to save the dictionary to disk, but I’m not sure this would result in a speedup.
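A minimal sketch of the "parse once, pickle, reload" idea follows. It assumes a hypothetical line format of `key<TAB>value` and a hypothetical cache file named `<input>.pickle`. The cache is reused only while it is newer than the source file, so editing the text file triggers a re-parse.

```python
import os
import pickle

def parse_file(path):
    """Parse the text file into a dict (the slow step).

    Assumes a hypothetical "key<TAB>value" format, one pair per line.
    """
    result = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            key, _, value = line.rstrip("\n").partition("\t")
            result[key] = value
    return result

def load_dict(path, cache_path=None):
    """Return the parsed dict, reusing a pickle cache when it is up to date."""
    if cache_path is None:
        cache_path = path + ".pickle"  # hypothetical cache location
    # Reuse the cache only if it was written after the source file changed.
    if (os.path.exists(cache_path)
            and os.path.getmtime(cache_path) >= os.path.getmtime(path)):
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    data = parse_file(path)
    with open(cache_path, "wb") as f:
        # The highest protocol is usually the fastest and most compact.
        pickle.dump(data, f, protocol=pickle.HIGHEST_PROTOCOL)
    return data
```

The first call to `load_dict("data.txt")` pays the parsing cost and writes the cache; later runs only unpickle.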
Any suggestions appreciated.
EDIT:
After doing some testing, I am worried that the slowdown happens when I create the dictionary. Pickling does seem significantly faster, though I wouldn’t mind doing even better.
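One way to check where the time goes is to time dictionary construction separately from unpickling. The sketch below uses synthetic in-memory lines in a hypothetical `key<TAB>value` format, so disk I/O is excluded and only the dict-building and deserialization steps are measured; the sizes are assumptions to scale up toward the real 10 million lines.

```python
import pickle
import time

def build_dict(lines):
    # Hypothetical "key<TAB>value" format; dict() accepts 2-item sequences.
    return dict(line.rstrip("\n").split("\t", 1) for line in lines)

def time_it(func, *args):
    """Return (result, elapsed seconds) for a single call."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start

# Synthetic stand-in for the real file; increase n to approach real sizes.
n = 200_000
lines = [f"key{i}\t{i}\n" for i in range(n)]

d, t_build = time_it(build_dict, lines)
blob = pickle.dumps(d, protocol=pickle.HIGHEST_PROTOCOL)
d2, t_load = time_it(pickle.loads, blob)

print(f"build dict from text: {t_build:.3f}s")
print(f"unpickle dict:        {t_load:.3f}s")
```

If `t_build` dominates, caching the finished dictionary (with pickle or another serializer) avoids repeating that work on every run.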
Lalit
MessagePack has, in my experience, been much faster than cPickle for dumping/loading data in Python, even when cPickle uses the highest protocol.
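A rough benchmark sketch for comparing the two on your own data is below. It assumes the third-party `msgpack` package (`pip install msgpack`) and skips that half if it is not installed. In Python 3 the standard `pickle` module already uses the C implementation that `cPickle` provided in Python 2. MessagePack only handles basic types (dicts, lists, strings, numbers, bytes, booleans, `None`), so it fits a plain string/number dictionary; the sample data here is hypothetical.

```python
import pickle
import time

try:
    import msgpack  # third-party: pip install msgpack
except ImportError:
    msgpack = None

def bench(label, dump, load, obj, repeat=3):
    """Print the best dump/load times over `repeat` runs; return the loaded object."""
    best_dump = best_load = float("inf")
    for _ in range(repeat):
        t0 = time.perf_counter()
        blob = dump(obj)
        t1 = time.perf_counter()
        out = load(blob)
        t2 = time.perf_counter()
        best_dump = min(best_dump, t1 - t0)
        best_load = min(best_load, t2 - t1)
    print(f"{label:8s} dump {best_dump:.3f}s  load {best_load:.3f}s  "
          f"size {len(blob):,} B")
    return out

# Hypothetical sample data; substitute the real dictionary.
data = {f"key{i}": i for i in range(500_000)}

pickled = bench(
    "pickle",
    lambda o: pickle.dumps(o, protocol=pickle.HIGHEST_PROTOCOL),
    pickle.loads,
    data,
)

if msgpack is not None:
    # raw=False decodes MessagePack strings back into Python str.
    packed = bench("msgpack", msgpack.packb,
                   lambda b: msgpack.unpackb(b, raw=False), data)
else:
    packed = None
    print("msgpack not installed; skipping")
```

Results depend heavily on the data's shape, so it is worth measuring on a realistic sample before switching formats.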
However, if your dictionary has 10 million entries, you might want to check that you’re not hitting the upper limit of your computer’s memory. The process will be much slower if you run out of memory and the operating system has to use swap.
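To get a rough idea of whether 10 million entries will fit in RAM, one option is to measure a smaller sample with the standard-library `tracemalloc` module and extrapolate. This is only a sketch: the key/value shapes are hypothetical, and linear extrapolation is approximate because dicts grow in steps.

```python
import tracemalloc

def measure_dict_memory(n):
    """Build a sample dict with n entries; return (dict, peak bytes traced)."""
    tracemalloc.start()
    try:
        # Hypothetical key/value shapes; adjust to match the real data.
        d = {f"key{i}": f"value{i}" for i in range(n)}
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return d, peak

sample_n = 100_000
sample, peak = measure_dict_memory(sample_n)
per_entry = peak / sample_n
estimate_gb = per_entry * 10_000_000 / 1e9

print(f"~{per_entry:.0f} bytes per entry")
print(f"rough estimate for 10 million entries: {estimate_gb:.1f} GB")
```

If the estimate approaches the machine's installed RAM, swapping is a likely cause of the slowdown.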