I am quite new to multi-threading, I have a single threaded data analysis app that has a good bit of potential for parallelization and while the data sets are large it does not come close to saturating the hard-disk read/write so I figure I should take advantage of the threading support that is now in the standard and try to speed the beast up.
After some research I decided that producer consumer was a good approach for the reading of data from the disk and processing it and I started writing an object pool that would become part of the circular buffer that will be where the producers put data and the consumers get the data. As I was writing the class it felt like I was being too fine grained in how I was handling locking and releasing data members. It feels like half the code is locking and unlocking and like there are an insane number of synchronization objects floating around.
So I come to you with a class declaration and a sample function and this question: Is this too fine-grained? Not fine grained enough? Poorly thought out?
struct PoolArray
{
public:
Obj* arr;
uint32 used;
uint32 refs;
std::mutex locker;
};
class SegmentedPool
{
public: /*Construction and destruction cut out*/
void alloc(uint32 cellsNeeded, PoolPtr& ptr);
void dealloc(PoolPtr& ptr);
void clearAll();
private:
void expand();
//stores all the segments of the pool
std::vector< PoolArray<Obj> > pools;
ReadWriteLock poolLock;
//stores pools that are empty
std::queue< int > freePools;
std::mutex freeLock;
int currentPool;
ReadWriteLock currentLock;
};
void SegmentedPool::dealloc(PoolPtr& ptr)
{
//find and access the segment
poolLock.lockForRead();
PoolArray* temp = &(pools[ptr.getSeg()]);
poolLock.unlockForRead();
//reduce the count of references in the segment
temp->locker.lock();
--(temp->refs);
//if the number of references is now zero then set the segment back to unused
//and push it onto the queue of empty segments so that it can be reused
if(temp->refs==0)
{
temp->used=0;
freeLock.lock();
freePools.push(ptr.getSeg());
freeLock.unlock();
}
temp->locker.unlock();
ptr.set(NULL,-1);
}
A few explanations:
First PoolPtr is a stupid little pointer like object that stores the pointer and the segment number in the pool that the pointer came from.
Second this is all “templatized” but i took those lines out to try to reduce the length of the code block
Third ReadWriteLock is something I put together using a mutex and a pair of condition variables.
Locks are inefficient no matter how fine grained they are, so avoid at all cost.
Both queue and vector can be easyly implemented lock free using
compare-swapprimitive.there are a number of papers on the topic
Lock free queue:
Lock free vector:
Straustrup’s paper also refers to lock-free allocator, but don’t jump at it right away, standard allocators are pretty good these days.
UPD
If you don’t want to bother writing your own containers, use Intel’s Threading Building Blocks library, it provide both thread safe vector and queue. They are NOT lock free, but they are optimized to use CPU cache efficiently.
UPD
Regarding
PoolArray, you don’t need a lock there either. If you can use c++11, usestd::atomicfor atomic increments and swaps, otherwise use compiler built-ins (InterLocked* functions in MSVC and _sync* in gcc http://gcc.gnu.org/onlinedocs/gcc-4.1.1/gcc/Atomic-Builtins.html)