I’m writing a small slab allocator for my program. Instead of using lists with a locking mechanism, I decided, after reading an IBM paper on lockless heaps, to implement something like that. Which GCC intrinsic (ffs/ffz/ctz, etc.) would be the most efficient for this, and why?
My most likely targets are ARMv7 and ARMv6 processors, which have a hardware CLZ instruction.
I’ve come up with something like:
uint32_t tmp;
uint32_t new_bits;
uint32_t old_bits;
do {
old_bits = slab->bitmap;
tmp = <function>(old_bits);
new_bits = old_bits | (1u << tmp);
} while(cpu_atomic_cmpxchg(&slab->bitmap, old_bits, new_bits) != OS_OKAY);
return (void *)((char *)slab->start + (tmp * slab->blksize));
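If you have CTZ, then just invert the value and count the trailing zeroes: the lowest clear bit of the bitmap becomes the lowest set bit of the inverted value, so the count is the index of the first free block.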
E.g.
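A minimal sketch of that idea, assuming GCC's `__builtin_ctz` and `__atomic` builtins in place of the question's `cpu_atomic_cmpxchg`. The `struct slab` layout and the names `slab_alloc` and `slab_free` are made up for illustration, and a set bit is taken to mean "block in use":

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical slab descriptor: one bit per block, 1 = in use. */
struct slab {
    uint32_t bitmap;
    unsigned char *start;
    size_t blksize;
};

/* Claim the lowest free block, or return NULL if all 32 are taken. */
static void *slab_alloc(struct slab *s)
{
    uint32_t old_bits = __atomic_load_n(&s->bitmap, __ATOMIC_RELAXED);
    uint32_t new_bits;
    unsigned idx;

    do {
        uint32_t free_bits = ~old_bits;   /* free blocks are now 1 bits */
        if (free_bits == 0)               /* __builtin_ctz(0) is undefined */
            return NULL;
        idx = (unsigned)__builtin_ctz(free_bits);
        new_bits = old_bits | (UINT32_C(1) << idx);
        /* On failure the builtin reloads old_bits, so the loop retries
           against the current bitmap. */
    } while (!__atomic_compare_exchange_n(&s->bitmap, &old_bits, new_bits,
                                          1, __ATOMIC_ACQUIRE,
                                          __ATOMIC_RELAXED));

    return s->start + (size_t)idx * s->blksize;
}

/* Release a block by clearing its bit. */
static void slab_free(struct slab *s, void *p)
{
    size_t idx = (size_t)((unsigned char *)p - s->start) / s->blksize;
    assert(idx < 32);
    __atomic_fetch_and(&s->bitmap, ~(UINT32_C(1) << idx), __ATOMIC_RELEASE);
}
```

The check for a full bitmap matters because the result of `__builtin_ctz(0)` is undefined. On cores that have RBIT (ARMv6T2 and ARMv7), GCC typically lowers `__builtin_ctz` to RBIT followed by CLZ, so it stays cheap there. On plain ARMv6 it may be worth comparing against a CLZ-based search from the top bit.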