in my app i for one of the handler i need to get a bunch of entities and execute a function for each one of them.
i have the keys of all the enities i need. after fetching them i need to execute 1 or 2 instance methods for each one of them and this slows my app down quite a bit. doing this for 100 entities takes around 10 seconds which is way to slow.
im trying to find a way to get the entities and execute those functions in parallel to save time but im not really sure which way is the best.
i tried the _post_get_hook but the i have a future object and need to call get_result() and execute the function in the hook which works kind of ok in the sdk but gets a lot of ‘maximum recursion depth exceeded while calling a Python objec’ but i can’t really undestand why and the error message is not really elaborate.
is the Pipeline api or ndb.Tasklets what im searching for?
atm im going by trial and error but i would be happy if someone could lead me to the right direction.
EDIT
my code is something similar to a filesystem, every folder contains other folders and files. The path of the Collections set on another entity so to serialize a collection entity i need to get the referenced entity and get the path. On a Collection the serialized_assets() function is slower the more entities it contains. If i could execute a serialize function for each contained asset side by side it would speed things up quite a bit.
class Index(ndb.Model):
path = ndb.StringProperty()
class Folder(ndb.Model):
label = ndb.StringProperty()
index = ndb.KeyProperty()
# contents is a list of keys of contaied Folders and Files
contents = ndb.StringProperty(repeated=True)
def serialized_assets(self):
assets = ndb.get_multi(self.contents)
serialized_assets = []
for a in assets:
kind = a._get_kind()
assetdict = a.to_dict()
if kind == 'Collection':
assetdict['path'] = asset.path
# other operations ...
elif kind == 'File':
assetdict['another_prop'] = asset.another_property
# ...
serialized_assets.append(assetdict)
return serialized_assets
@property
def path(self):
return self.index.get().path
class File(ndb.Model):
filename = ndb.StringProperty()
# other properties....
@property
def another_property(self):
# compute something here
return computed_property
EDIT2:
@ndb.tasklet
def serialized_assets(self, keys=None):
assets = yield ndb.get_multi_async(keys)
raise ndb.Return([asset.serialized for asset in assets])
is this tasklet code ok?
Since most of the execution time of your functions are spent waiting for RPCs, NDB’s async and tasklet support is your best bet. That’s described in some detail here. The simplest usage for your requirements is probably to use the
ndb.mapfunction, like this (from the docs):The callback function is called for each entity returned by the query, and it can do whatever operations it needs (using
_asyncmethods andyieldto do them asynchronously), returning the result when it’s done. Because the callback is a tasklet, and uses yield to make the asynchronous calls, NDB can run multiple instances of it in parallel, and even batch up some operations.