I’m writing a threaded application that will process a list of resources and may or may not place a resulting item in a container (std::map) for each resource.
The processing of resources takes place in multiple threads.
The result container will be traversed and each item acted upon by a separate thread, which takes an item, updates a MySQL database (using the mysqlcppconn API), then removes the item from the container and continues.
For simplicity’s sake, here’s an overview of the logic:
queueWorker() - thread
    getResourcesList() - seeds the global queue
databaseWorker() - thread
    commitProcessedResources() - commits results to a database every n seconds
processResources() - thread x <# of processor cores>
    processResource()
        queueResultItem()
And the pseudo-implementation to show what I’m doing.
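/* not the actual structs, but just for simplicity's sake */
#include <map>
#include <string>
#include <pthread.h>
using std::string;

struct queue_item_t {
    int id;
    string hash;
    string text;
};
struct result_item_t {
    string hash; // hexadecimal sha1 digest
    int state;
};
std::map< string, queue_item_t > queue;
std::map< string, result_item_t > results;
pthread_mutex_t resultQueueMutex = PTHREAD_MUTEX_INITIALIZER; // guards results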
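bool processResource (queue_item_t *item)
{
    result_item_t result;
    if (some_stuff_that_doesnt_apply_to_all_resources)
    {
        result.hash = item->hash;
        result.state = 1;
        /* PROBLEM IS HERE */
        queueResultItem(result);
        return true; // assumed meaning: a result was queued
    }
    return false; // assumed meaning: nothing to queue for this resource
}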
void commitProcessedResources ()
{
    pthread_mutex_lock(&resultQueueMutex);
    // this can take a while; the mutex stays locked for the whole loop
    for (std::map< string, result_item_t >::iterator it = results.begin(); it != results.end();)
    {
        // do mysql stuff that takes a while
        results.erase(it++);
    }
    pthread_mutex_unlock(&resultQueueMutex);
}
void queueResultItem (result_item_t result)
{
    pthread_mutex_lock(&resultQueueMutex);
    results.insert(make_pair(result.hash, result));
    pthread_mutex_unlock(&resultQueueMutex);
}
As indicated in processResource(), the problem is there: while commitProcessedResources() is running and holds resultQueueMutex, every call to queueResultItem() blocks, because it tries to lock the same mutex and has to wait until the commit is done, which might take a while.
Since only a limited number of threads are running, as soon as all of them are waiting in queueResultItem(), no more work gets done until the mutex is released.
So, my question is: how do I best go about implementing this? Is there a standard container that can be inserted into and deleted from simultaneously, or does something exist that I just don’t know of?
It is not strictly necessary that each queue item has its own unique key, as is the case here with the std::map, but I would prefer it: several resources can produce the same result, and I would rather send only unique results to the database, even though it does use INSERT IGNORE to ignore any duplicates.
I’m fairly new to C++ so I’ve no idea what to look for on Google, unfortunately. 🙁
You do not have to hold the lock for the queue the whole time during processing in commitProcessedResources(). You can instead swap the queue with an empty one:
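As a rough sketch of that swap idea, here is one way it could look. It assumes C++11 and uses std::mutex with std::lock_guard instead of the pthread mutex from the question. commitToDatabase() and committedCount are hypothetical stand-ins for the slow MySQL update, not part of the original code.

```cpp
#include <map>
#include <mutex>
#include <string>

struct result_item_t {
    std::string hash;  // hexadecimal sha1 digest
    int state;
};

std::map<std::string, result_item_t> results;
std::mutex resultQueueMutex;  // assumption: std::mutex in place of pthread_mutex_t

// Hypothetical stand-in for the slow MySQL update; it only counts calls.
int committedCount = 0;
void commitToDatabase(const result_item_t& item) {
    (void)item;
    ++committedCount;
}

void queueResultItem(const result_item_t& result) {
    std::lock_guard<std::mutex> lock(resultQueueMutex);
    // insert() keeps the first entry for a given hash, so duplicates are dropped.
    results.insert(std::make_pair(result.hash, result));
}

void commitProcessedResources() {
    std::map<std::string, result_item_t> batch;
    {
        // Hold the lock only long enough to take ownership of the pending results.
        std::lock_guard<std::mutex> lock(resultQueueMutex);
        batch.swap(results);
    }
    // The slow work runs without the lock, so worker threads can keep queueing.
    for (const auto& entry : batch) {
        commitToDatabase(entry.second);
    }
}
```

With this layout the worker threads only ever wait for the duration of a swap or a single insert. One trade-off: the map removes duplicates only within a batch, so the same hash can still show up in two different batches, which the INSERT IGNORE mentioned in the question would then have to absorb.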