I am building a class hierarchy that uses SSE intrinsics functions and thus some of the members of the class need to be 16-byte aligned. For stack instances I can use __declspec(align(#)), like so:
typedef __declspec(align(16)) float Vector[4];
class MyClass{
...
private:
Vector v;
};
Now, since __declspec(align(#)) is a compilation directive, the following code may result in an unaligned instance of Vector on the heap:
MyClass *myclass = new MyClass;
This too, I know I can easily solve by overloading the new and delete operators to use _aligned_malloc and _aligned_free accordingly. Like so:
//inside MyClass:
public:
void* operator new (size_t size) throw (std::bad_alloc){
void * p = _aligned_malloc(size, 16);
if (p == 0) throw std::bad_alloc()
return p;
}
void operator delete (void *p){
MyClass* pc = static_cast<MyClass*>(p);
_aligned_free(p);
}
...
So far so good.. but here is my problem. Consider the following code:
class NotMyClass{ //Not my code, which I have little or no influence over
...
MyClass myclass;
...
};
int main(){
...
NotMyClass *nmc = new NotMyClass;
...
}
Since the myclass instance of MyClass is created statically on a dynamic instance of NotMyClass, myclass WILL be 16-byte aligned relatively to the beginning of nmc because of Vector’s __declspec(align(16)) directive. But this is worthless, since nmc is dynamically allocated on the heap with NotMyClass’s new operator, which doesn’t nesessarily ensure (and definitely probably NOT) 16-byte alignment.
So far, I can only think of 2 approaches on how to deal with this problem:
-
Preventing MyClass users from being able to compile the following code:
MyClass myclass;meaning, instances of MyClass can only be created dynamically, using the new operator, thus ensuring that all instances of MyClass are truly dynamically allocatted with MyClass’s overloaded new. I have consulted on another thread on how to accomplish this and got a few great answers:
C++, preventing class instance from being created on the stack (during compiltaion) -
Revert from having Vector members in my Class and only have pointers to Vector as members, which I will allocate and deallocate using
_aligned_mallocand_aligned_freein the ctor and dtor respectively. This methos seems crude and prone to error, since I am not the only programmer writing these Classes (MyClass derives from a Base class and many of these classes use SSE).
However, since both solutions have been frowned upon in my team, I come to you for suggestions of a different solution.
If you’re set against heap allocation, another idea is to over allocate on the stack and manually align (manual alignment is discussed in this SO post). The idea is to allocate byte data (
unsigned char) with a size guaranteed to contain an aligned region of the necessary size (+15), then find the aligned position by rounding down from the most-shifted region (x+15 - (x+15) % 16, orx+15 & ~0x0F). I posted a working example of this approach with vector operations on codepad (forg++ -O2 -msse2). Here are the important bits:The constructor ensures that vPtr is aligned (note the order of members in the class declaration is important).
This approach works (heap/stack allocation of containing classes is irrelevant to alignment), is portabl-ish (I think most compilers provide a pointer sized uint
uintptr_t), and will not leak memory. But its not particularly safe (being sure to keep the aligned pointer valid under copy, etc), wastes (nearly) as much memory as it uses, and some may find the reinterpret_casts distasteful.The risks of aligned operation/unaligned data problems could be mostly eliminated by encapsulating this logic in a Vector object, thereby controlling access to the aligned pointer and ensuring that it gets aligned at construction and stays valid.