I am reaching a bottleneck on my application and having a tough time finding a solution around it. A little background:
- My app pings an API to gather information on hundreds of thousands of items and stores them in the datastore
- We need to perform simple aggregations on a mix of dimensions of these items, which we try to compute while storing the items
Current implementation:
- We kick off a download of these items manually as needed, which creates tasks on a backend dedicated to downloading these items. Each task will launch more tasks depending on the number of API calls required to paginate through and obtain every item.
- Each task will download, parse, and bulk store the items, while keeping the aggregations we want in memory by use of a dictionary.
- At the end of each task's execution, we write the dictionary of aggregates to a pull queue.
- Once we detect we are nearing the end of the API calls we kick off an aggregation task to a second backend configuration
- This “aggregation task” pulls from the pull queue (20 at a time) and merges the dictionaries found in each task (doing further in-memory aggregation) before trying to store each aggregate. This task will also launch other tasks to perform aggregations for the remaining tasks in the pull queue (hundreds)
- We use the sharded counter approach to help alleviate any contention when storing to the datastore
- Each aggregation task can try and store 500-1500 aggregations, which should all be independent of one another
There are additional checks and such in there to ensure all pull queue tasks are properly processed and all items are downloaded.
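As a minimal sketch of the in-memory merge step described above (the function name and the sample keys are hypothetical, and the real payloads come from leased pull-queue tasks rather than literal dictionaries):

```python
from collections import defaultdict


def merge_aggregates(partials):
    """Sum per-key values across several partial aggregate dictionaries."""
    merged = defaultdict(float)
    for partial in partials:
        for name, value in partial.items():
            merged[name] += value
    # Drop zero totals, mirroring the `value != 0.00` filter applied before storing.
    return {name: total for name, total in merged.items() if total != 0.0}


# Two partial dictionaries, as two download tasks might have produced them.
combined = merge_aggregates([
    {"color:red": 1.0, "size:large": 2.0},
    {"color:red": 0.5, "size:small": 0.0},
])
```

Merging before storing means each aggregate name is written once per aggregation task, however many leased dictionaries mention it.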
The Problem:
We want to download and store all the items and aggregates as fast as possible. I have 20 instances enabled for each backend configuration described (I’ll refer to them as the “aggregator” backend and the “downloader” backend). The downloader backend seems to get through the API calls fairly fast. I make heavy use of the NDB library and asynchronous URL Fetches/Datastore calls to achieve this. I’ve also enabled threadsafe:true so that no instance will be waiting for RPC calls to finish before starting the next task (all tasks can operate independently of one another and are idempotent).
The aggregator backend is where the big time sink comes into play. Storing 500-1500 of these aggregates asynchronously through transactions takes 40 seconds or more (and I don’t even think all transactions are being properly committed). I keep this backend at threadsafe:false because I use a pull queue expiration deadline of 300 seconds. If I allowed more than one task to execute on a single instance, the tasks could slow each other down and push some of them past the 300 second mark, allowing another task to lease the same queue item a second time and possibly double-count it.
The logs show "BadRequestError: Nested transactions are not supported." with a previous error (in the stack trace) of "TransactionFailedError: too much contention on these datastore entities. please try again." Another error I commonly see is "BadRequestError(The referenced transaction has expired or is no longer valid.)"
From my understanding, sometimes these errors mean that a transaction can still be committed without further interaction. How do I know if this has been properly committed? Am I doing this in a logical/efficient manner or is there more room for concurrency without risk of messing everything up?
Relevant Code:
    import random

    from google.appengine.ext import ndb


    class GeneralShardConfig(ndb.Model):
        """Tracks the number of shards for each named counter."""
        name = ndb.StringProperty(required=True)
        num_shards = ndb.IntegerProperty(default=4)


    class GeneralAggregateShard(ndb.Model):
        """Shards for each named counter"""
        name = ndb.StringProperty(name='n', required=True)
        count = ndb.FloatProperty(name='c', default=0.00)  # acts as a total now


    @ndb.tasklet
    def increment_batch(data_set):
        def run_txn(name, value):
            @ndb.tasklet
            def txn():
                to_put = []
                # Fetch (or create) the shard configuration for this counter name.
                dbkey = ndb.Key(GeneralShardConfig, name)
                config = yield dbkey.get_async(use_memcache=False)
                if not config:
                    config = GeneralShardConfig(key=dbkey, name=name)
                    to_put.append(config)
                # Pick a random shard and add the value to it.
                index = random.randint(0, config.num_shards - 1)
                shard_name = name + str(index)
                dbkey = ndb.Key(GeneralAggregateShard, shard_name)
                counter = yield dbkey.get_async()
                if not counter:
                    counter = GeneralAggregateShard(key=dbkey, name=name)
                counter.count += value
                to_put.append(counter)
                yield ndb.put_multi_async(to_put)
            return ndb.transaction_async(txn, use_memcache=False, xg=True)
        # One cross-group transaction per non-zero aggregate, all started at once.
        res = yield [run_txn(key, value) for key, value in data_set.iteritems() if value != 0.00]
        raise ndb.Return(res)
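To make the sharding scheme easier to reason about, here is a datastore-free sketch of the same idea. The class name and the `total` method are assumptions for illustration only; the real code stores each shard as a `GeneralAggregateShard` entity, and the post does not show how totals are read back.

```python
import random


class InMemoryShardedCounter(object):
    """Toy stand-in for the shard entities: one dict entry per shard."""

    def __init__(self, num_shards=4):
        self.num_shards = num_shards
        self.shards = {}

    def increment(self, name, value):
        # Same shard naming as the code above: counter name + shard index.
        index = random.randint(0, self.num_shards - 1)
        shard_name = name + str(index)
        self.shards[shard_name] = self.shards.get(shard_name, 0.0) + value

    def total(self, name):
        # The counter's value is the sum over all of its shards.
        return sum(self.shards.get(name + str(i), 0.0)
                   for i in range(self.num_shards))


counter = InMemoryShardedCounter()
for _ in range(10):
    counter.increment("color:red", 1.0)
```

With 4 shards, two writers updating the same name at the same moment still have a 1 in 4 chance of picking the same shard, so sharding reduces contention rather than removing it.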
Given the implementation, the only room for “contention” I see is if 2 or more aggregate tasks need to update the same aggregate name, which shouldn’t happen too frequently, and with sharded counters I would expect this overlap to rarely, if ever, occur. I assume the BadRequestError(The referenced transaction has expired or is no longer valid.) error appears when the event loop is checking the status of all the tasklets and hits a reference to a transaction that has finished. The problem is that it errors out, so does that mean all transactions are prematurely cut off, or can I assume all transactions went through? I further assume the line res = yield [run_txn(key, value) for key, value in data_set.iteritems() if value != 0.00] needs to be broken up so that each tasklet is wrapped in its own try/except to detect these errors.
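A minimal sketch of that per-tasklet check, assuming only that each future exposes `get_result()` the way NDB futures do. `collect_results` and `FakeFuture` are hypothetical names; `FakeFuture` exists purely so the sketch runs without App Engine.

```python
def collect_results(futures_by_name):
    """Wait on each future separately so one failure does not hide the others."""
    committed = {}
    failed = {}
    for name, future in futures_by_name.items():
        try:
            committed[name] = future.get_result()
        except Exception as exc:  # narrow this to the datastore errors you expect
            failed[name] = exc
    return committed, failed


class FakeFuture(object):
    """Stand-in for an NDB future: returns a value or raises a stored error."""

    def __init__(self, value=None, error=None):
        self.value = value
        self.error = error

    def get_result(self):
        if self.error is not None:
            raise self.error
        return self.value


ok, bad = collect_results({
    "color:red": FakeFuture(value="done"),
    "size:large": FakeFuture(error=RuntimeError("too much contention")),
})
```

An entry in `bad` only records that the call raised. It does not by itself prove that the write was not applied, so anything retried from that set would need to be safe to apply twice.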
Before I drive myself mad over this, I’d appreciate any guidance/help on how to optimize this process and do so in a reliable way.
EDIT 1:
I modified the aggregator task behavior as follows:
- If more than 1 task was leased from the queue, aggregate the tasks in memory, then store the result in another task in the pull queue, and immediately launch another “aggregator task”
- Else, if 1 task was leased, try to save the results
This has helped reduce the contention errors I’ve been seeing, but it’s still not very reliable. Most recently, I hit BadRequestError: Nested transactions are not supported. with a stack trace indicating RuntimeError: Deadlock waiting for <Future fbf0db50 created by transaction_async(model.py:3345) for tasklet transaction(context.py:806) suspended generator transaction(context.py:876); pending>
I am under the belief that this modification should optimize the process by allowing all possible overlaps in the aggregation process to be combined and tried all at once in a single instance, versus multiple instances all performing transactions that may collide. I am still having issues saving the results in a reliable manner.
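The EDIT 1 rule can be sketched as a small dispatch function. `handle_leased`, `requeue`, and `save` are hypothetical names: `requeue` stands for adding a new pull-queue task and launching another aggregator task, and `save` stands for the datastore write.

```python
def handle_leased(payloads, requeue, save):
    """Apply the EDIT 1 rule to the aggregate dictionaries from one lease call."""
    if not payloads:
        return "empty"
    if len(payloads) > 1:
        # More than one task leased: combine in memory and push the result back.
        merged = {}
        for partial in payloads:
            for name, value in partial.items():
                merged[name] = merged.get(name, 0.0) + value
        requeue(merged)
        return "requeued"
    # Exactly one task leased: this is the fully merged result, so store it.
    save(payloads[0])
    return "saved"


queued, saved = [], []
first = handle_leased([{"a": 1.0}, {"a": 2.0, "b": 1.0}], queued.append, saved.append)
second = handle_leased(queued[:], queued.append, saved.append)
```

Each pass shrinks the queue, so the number of dictionaries that ever reach the datastore write falls toward one per download run.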
By reducing the datastore I/O (leaving work to the autobatchers and disabling indexing) you can be more certain that the datastore writes complete (less contention) and it should be faster.
The config (renamed counter) gets are outside of the transaction(s) and can run concurrently whilst looping through the transactions.
Methods and a total property were added to Counter to (hopefully) make it easier to modify in future.
Created a new ndb Property for decimal support (assuming that is why you are specifying 0.00 instead of 0.0).
EDIT:
Removed the need for transactions and changed the sharding system for reliability.