I’m working on an application that does processing at what I’d call fairly high throughput (current peaks in the range of 400 Mbps, design goal of eventual 10 Gbps).
I run multiple instances of a loop which basically just cycles through reading and processing information, and uses a dictionary for holding state. However, I also need to scan the entire dictionary periodically to check for timeouts, and I’d like to solicit some ideas on what to do if this scan becomes a performance hotspot. What I’m looking for is whether there are any standard techniques for interleaving the timeout checks on the dictionary with the main processing code in the loop, so that, say, on loop 1 I check the first dictionary item, on loop 2 the second, and so on. Also, the dictionary keys change: entries are added and deleted in the main processing code, so it’s not quite as simple as taking a copy of all the dictionary keys and then checking them one by one in the main loop.
I’ll reiterate, this is not a current performance problem. So please, no comments about premature optimization. I realize it’s premature, and I am consciously choosing to treat this as a potential problem.
Edit for clarity: This is a curiosity I’m thinking about on my weekend, and I’m wondering what a best-practices approach might be for something like this. This isn’t the only problem I have, and not the only area of performance I’m looking at. However, this is one area where I’m not really aware of a clean, concise way to approach it.
I’m already exploiting parallelism and hardware on this (the next level of hardware is a 5x increase in cost, but more significantly will require a redesign of the parallelism). The parallelism is also working the way I want it to, so again, please don’t feel it’s necessary to comment on this. The dictionary is instantiated per thread, so any additional threads for running the checks would require synchronization between the threads, which is too costly.
Some pseudo code of the logic if it helps:
    Dictionary hashdb;
    while (true) {
        record = grab_record_from_buffer(); // There is a buffer in place, so some delays are tolerable
        process(record);                    // Do the main processing
        update_hashdb();                    // Add, remove, update entries in the dictionary
        if (time_since(last_scan) > 15 seconds) {
            foreach (entry in hashdb)
                periodic_check(entry);      // Check for timeouts or any other periodic checks on every db entry
            last_scan = now;
        }
    }
I do realize I may not run into an actual problem with the way I have it, so there’s a good chance whatever comes up may never need to be used. However, what I’m really looking for is whether there is any standard approach or algorithm, one that I’m just not aware of, for interleaving a scan of a changing dictionary with the main processing logic. Suggestions on an approach are also welcome (I do already have an idea how I would approach it, but it’s not as clean as I’d like).
Thank You,
Are you able to use .NET 4.0 (or at least plan to do so)? If so, ConcurrentDictionary may help you: it allows you to iterate over a dictionary while still modifying it (either in the same thread or a different one).

You need to be aware that the results may be surprising (you may see some changes but not others, for example), but if that’s acceptable, it may be a useful approach.
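
To make the "iterate while modifying" point concrete, here is a minimal sketch of a timeout sweep that removes entries during enumeration. The class name TimeoutSweep, the method RemoveExpired and the choice of a last-seen DateTime as the value are assumptions for illustration only, not anything from the question. With a plain Dictionary<,> the same loop would throw InvalidOperationException on the iteration after the first removal.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical helper, for illustration only.
public static class TimeoutSweep
{
    // Removes every entry whose last-seen time is older than `timeout`
    // relative to `now`, and returns how many entries were removed.
    public static int RemoveExpired(
        ConcurrentDictionary<string, DateTime> lastSeen,
        DateTime now,
        TimeSpan timeout)
    {
        int removed = 0;

        // Enumerating a ConcurrentDictionary tolerates concurrent writes,
        // including removals made from inside this loop.
        foreach (var entry in lastSeen)
        {
            DateTime ignored;
            if (now - entry.Value > timeout &&
                lastSeen.TryRemove(entry.Key, out ignored))
            {
                removed++;
            }
        }

        return removed;
    }
}
```

If other threads can touch the same dictionary, an entry could be refreshed between the timeout test and the TryRemove call, so a real implementation might need to re-check the value before removing it. With one dictionary per thread, as in the question, that race does not arise.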
You could then have one thread doing periodic checks for all the other dictionaries. I know you’d previously ruled this out due to synchronization requirements, but the beauty of ConcurrentDictionary is that it doesn’t require synchronization [1]. Does that change the feasibility of using a separate checking thread?

If you don’t want to use a separate thread you could use an iterator explicitly: each time you go through the loop, check another entry, and start again if you’ve reached the end. Again, this wouldn’t work with a standard dictionary, but should work for a ConcurrentDictionary, so long as you’re willing to work with the possibility of seeing a mixture of updated and stale data.

[1] … by which I mean it doesn’t require any explicit synchronization, and that the internal synchronization is significantly lighter-weight than having to take out a lock around every access.
From Stephen Toub’s post on ConcurrentDictionary:

The other big reduction in locking is the ability mentioned above: you can iterate over the dictionary in one thread while modifying it in another, so long as you can cope with seeing some changes applied since the iterator was created but not others. Compare this with normal Dictionary<,>, where for safe concurrent access you’d have to lock the dictionary for the entire time you were iterating over it.
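
The explicit-iterator idea (check a little on each pass through the main loop, and start again at the end) could be sketched roughly as follows. InterleavedScanner, Step and the budget parameter are hypothetical names introduced here, and this is only one possible shape, not a tested production design.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical helper, for illustration only: keeps an enumerator open
// across calls so each main-loop iteration checks only a few entries.
public sealed class InterleavedScanner<TKey, TValue>
{
    private readonly ConcurrentDictionary<TKey, TValue> _table;
    private IEnumerator<KeyValuePair<TKey, TValue>> _cursor;

    public InterleavedScanner(ConcurrentDictionary<TKey, TValue> table)
    {
        _table = table;
        _cursor = table.GetEnumerator();
    }

    // Visits at most `budget` entries and returns how many were visited.
    // When the current pass runs out of entries, the cursor is reset so
    // the next call starts a new pass from the beginning.
    public int Step(int budget, Action<TKey, TValue> check)
    {
        int visited = 0;
        while (visited < budget)
        {
            if (_cursor.MoveNext())
            {
                var entry = _cursor.Current;
                check(entry.Key, entry.Value); // may add or remove entries
                visited++;
            }
            else
            {
                _cursor.Dispose();
                _cursor = _table.GetEnumerator();
                break;
            }
        }
        return visited;
    }
}
```

In the question's loop this would amount to one call such as scanner.Step(1, periodic_check) after update_hashdb(), in place of the full foreach every 15 seconds. Entries added while a pass is in progress may or may not be seen until the following pass, which is the "mixture of updated and stale data" trade-off described above.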