I have some code that does roughly this inside a GAE worker task:
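    from google.appengine.ext import db

    list_of_dicts = xmlrpc_call(...)
    objects_to_put = []
    for row in list_of_dicts:  # iterate the list directly; a list has no .items()
        entity = DatastoreModel(**row)  # avoid shadowing the builtin "object"
        entity.x = ...
        objects_to_put.append(entity)
    db.put(objects_to_put)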
I’ve also tried this:
    list_of_dicts = xmlrpc_call(...)
    objects_to_put = []
    for row in list_of_dicts:
        entity = DatastoreModel(**row)
        entity.x = ...
        objects_to_put.append(entity)
        if len(objects_to_put) >= 10:  # flush every 10 entities
            db.put(objects_to_put)
            objects_to_put = []
    if objects_to_put:  # write any remainder, skipping an empty put
        db.put(objects_to_put)
(The idea is to put every 10 objects, to avoid building one huge list.)
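The batching idea can be sketched in plain Python, independent of the datastore. This is a minimal sketch, not the asker's code: `put_in_batches` and its `put_batch` parameter are hypothetical, with `put_batch` standing in for `db.put`. It flushes exactly every `batch_size` items and skips the final call when nothing is left over.

```python
def put_in_batches(items, put_batch, batch_size=10):
    """Call put_batch on consecutive lists of at most batch_size items.

    put_batch is a hypothetical stand-in for db.put; any callable that
    accepts a list works.
    """
    batch = []
    for item in items:
        batch.append(item)
        # Flush as soon as the batch is full (>=, not >, so each batch
        # holds exactly batch_size items rather than batch_size + 1).
        if len(batch) >= batch_size:
            put_batch(batch)
            batch = []
    # Write whatever is left over, but avoid an empty put call.
    if batch:
        put_batch(batch)


# Usage with a fake writer that only records the size of each batch.
sizes = []
put_in_batches(range(25), lambda b: sizes.append(len(b)))
print(sizes)  # [10, 10, 5]
```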
The problem, invariably, is that this block of code apparently takes up vast amounts of memory, even though the list is relatively small (~100 items) and each item in the list contains just a few keys. There are no big blobs, big chunks of string, or anything but relatively small-potatoes data structures here.
What’s causing this worker to exceed its memory quota every time it runs, and how can I efficiently create a relatively large number (~100) of datastore objects?
I think the second approach, combined with `del` statements to release references to entities that have already been written, would be better.
That said, I can't be sure it will solve your problem.
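A minimal sketch of the batched approach with `del`, under stated assumptions: `create_entities`, `make_entity` and `put_batch` are hypothetical names (standing in for `DatastoreModel(**row)` and `db.put`), and the explicit `gc.collect()` call is an extra step that may or may not reduce peak memory on App Engine. `del batch[:]` empties the list in place, so this assumes `put_batch` does not keep a reference to the list it was given.

```python
import gc


def create_entities(rows, make_entity, put_batch, batch_size=10):
    """Build entities from rows and write them batch by batch.

    make_entity and put_batch are hypothetical stand-ins, e.g. for
    DatastoreModel(**row) and db.put. Returns the number of entities written.
    """
    written = 0
    batch = []
    for row in rows:
        batch.append(make_entity(row))
        if len(batch) >= batch_size:
            put_batch(batch)
            written += len(batch)
            # Empty the list in place, dropping this function's references to
            # the written entities before the next batch is built.
            del batch[:]
    if batch:
        put_batch(batch)
        written += len(batch)
        del batch[:]
    # Ask the collector to reclaim any reference cycles now rather than later.
    gc.collect()
    return written


# Usage with dict() as a fake entity constructor and a fake writer.
calls = []
n = create_entities([{"a": i} for i in range(23)], dict,
                    lambda b: calls.append(len(b)))
print(n, calls)  # 23 [10, 10, 3]
```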
There is an AppTrace tool that can trace memory usage, but it only works on the development server.
http://code.google.com/p/apptrace/wiki/UsingApptrace