I am writing a program that uses a thread pool to search files with a specified extension for matches to a regular expression.
My thread pool looks like this:
for( int i = 0; i < _nThreads; ++i )
{
    _threads.push_back( thread( &ThreadPool::GrepFunc, this ) );
}
and the running function looks like this:
void ThreadPool::GrepFunc()
{
    // implement a barrier
    while( !_done )
    {
        while( !_tasks.empty() )
        {
            fs::path task;
            bool gotTask = false;
            {
                lock_guard<mutex> tl( _taskMutex );
                if( !_tasks.empty() )
                {
                    task = _tasks.front();
                    _tasks.pop();
                    gotTask = true;
                }
            }
            if( gotTask )
            {
                if( std::tr2::sys::is_directory( task ) )
                {
                    for( fs::directory_iterator dirIter( task ), endIter; dirIter != endIter; ++dirIter )
                    {
                        if( fs::is_directory( dirIter->path() ) )
                        {
                            {
                                lock_guard<mutex> tl( _taskMutex );
                                _tasks.push( dirIter->path() );
                            }
                        }
                        else
                        {
                            for( auto& e : _args.extensions() )
                            {
                                if( !dirIter->path().extension().compare( e ) )
                                {
                                    SearchFile( dirIter->path() );
                                }
                            }
                        }
                    }
                }
                else
                {
                    for( auto& e : _args.extensions() )
                    {
                        if( !task.extension().compare( e ) )
                        {
                            SearchFile( task );
                        }
                    }
                }
            }
        }
    }
}
Essentially, the program receives an initial directory from the user and recursively searches it and all of its subdirectories for files with a matching extension, then looks for regex matches in those files. I am having trouble working out the stopping condition, i.e. when _done should be set. I need to ensure that every directory and file under the initial directory has been scanned, and that every item in _tasks has been completed, before I join the threads. Any thoughts would be appreciated.
I’d suggest having one thread (possibly the same thread that spawns the file-processing threads) dedicated to the recursive filesystem search for matching files; it can add the files to a work queue from which the file-searching threads pick up work. You can use a condition variable to coordinate this.
Coordinating shutdown is a little tricky, as you’ve found. After the filesystem-search thread has completed its search, it can set a “just finish what’s queued” flag visible to the worker threads, then signal them all to wake up and try to process another file; any worker that finds the work queue empty exits. The filesystem-search thread then joins all the workers.
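As a rough illustration of that scheme, here is a minimal sketch rather than a drop-in replacement for your pool. WorkQueue, close() and process_all() are hypothetical names I made up for this example, plain strings stand in for fs::path, and the loop over paths stands in for the recursive directory walk. The worker body is where a call such as your SearchFile would go.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical work queue: one producer pushes paths, several workers pop them.
class WorkQueue {
public:
    // Producer: enqueue one item and wake a single waiting worker.
    void push(std::string item) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            assert(!closed_ && "push after close is a logic error");
            items_.push(std::move(item));
        }
        ready_.notify_one();
    }

    // Producer: set the "just finish what's queued" flag and wake every worker.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }

    // Worker: block until an item is available or the queue is closed.
    // Returns false only when the queue is closed AND empty, i.e. time to exit.
    bool pop(std::string& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return closed_ || !items_.empty(); });
        if (items_.empty()) {
            return false;
        }
        out = std::move(items_.front());
        items_.pop();
        return true;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<std::string> items_;
    bool closed_ = false;
};

// Starts nThreads workers, feeds them every path, then shuts down cleanly.
// Returns the number of items the workers processed.
std::size_t process_all(const std::vector<std::string>& paths, unsigned nThreads) {
    WorkQueue queue;
    std::atomic<std::size_t> processed{0};

    std::vector<std::thread> workers;
    for (unsigned i = 0; i < nThreads; ++i) {
        workers.emplace_back([&queue, &processed] {
            std::string path;
            while (queue.pop(path)) {
                // A real worker would search the file here.
                ++processed;
            }
        });
    }

    // Stand-in for the filesystem-search thread's directory walk.
    for (const auto& p : paths) {
        queue.push(p);
    }

    queue.close();                        // no more work will arrive
    for (auto& w : workers) w.join();     // workers drain the queue, then exit
    return processed.load();
}
```

Because the flag and the queue are protected by the same mutex, a worker can never see "closed" and miss an item that was pushed earlier, and the predicate passed to wait() also covers spurious wakeups. With only one thread producing work, "producer finished and queue empty" is a sufficient stopping condition, which is what makes this simpler than having workers push new directories themselves.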