I have a few related questions about managing aligned memory blocks. Cross-platform answers would be ideal. However, as I’m pretty sure a cross-platform solution does not exist, I’m mainly interested in Windows and Linux and to a (much) lesser extent Mac OS and FreeBSD.
- What’s the best way of getting a chunk of memory aligned on 16-byte boundaries? (I’m aware of the trivial method of using `malloc()`, allocating a little extra space and then bumping the pointer up to a properly aligned value. I’m hoping for something a little less kludge-y, though. Also, see below for additional issues.)
- If I use plain old `malloc()`, allocate extra space, and then move the pointer up to where it would be correctly aligned, is it necessary to keep the pointer to the beginning of the block around for freeing? (Calling `free()` on pointers to the middle of the block seems to work in practice on Windows, but I’m wondering what the standard says and, even if the standard says you can’t, whether it works in practice on all major OSes. I don’t care about obscure DS9K-like OSes.)
- This is the hard/interesting part. What’s the best way to reallocate a memory block while preserving alignment? Ideally this would be something more intelligent than calling `malloc()`, copying, and then calling `free()` on the old block. I’d like to do it in place where possible.

1. If your implementation has a standard data type that needs 16-byte alignment (`long long`, for example), `malloc` already guarantees that your returned blocks will be aligned correctly. Section 7.20.3 of C99 states: "The pointer returned if the allocation succeeds is suitably aligned so that it may be assigned to a pointer to any type of object."
2. You have to pass back the exact same address into `free` as you were given by `malloc`. No exceptions. So yes, you need to keep the original copy.
3. See (1) above if you already have a 16-byte-alignment-required type.
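
Point 1 is easy to check on a given platform. The following is only a sketch and assumes a C11 compiler (for `max_align_t` and `_Alignof`) plus `uintptr_t` from `<stdint.h>`; the helper names `is_aligned`, `malloc_is_16_aligned` and `malloc_guaranteed_alignment` are made up for illustration:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: nonzero if p is a multiple of align (a power of two). */
static int is_aligned(const void *p, size_t align)
{
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* 1 if a fresh malloc block of n bytes lands on a 16-byte boundary,
   0 if it does not, -1 if the allocation itself fails. */
static int malloc_is_16_aligned(size_t n)
{
    void *p = malloc(n);
    int ok;

    if (p == NULL)
        return -1;
    ok = is_aligned(p, 16);
    free(p);
    return ok;
}

/* Strictest alignment malloc must honour for fundamental types (C11). */
static size_t malloc_guaranteed_alignment(void)
{
    return _Alignof(max_align_t);
}
```

If `malloc_guaranteed_alignment()` returns 16 or more, plain `malloc` is already enough. A 1 from `malloc_is_16_aligned` on a platform where it returns less is only an observation about that one allocation, not a guarantee.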

Beyond that, you may well find that your `malloc` implementation gives you 16-byte-aligned addresses anyway for efficiency, although it’s not guaranteed by the standard. If you require it, you can always implement your own allocator.

Myself, I’d implement a `malloc16` layer on top of `malloc` that would use the following structure: between 1 and 16 bytes of padding, the last of which holds the padding length, followed by the 16-byte-aligned area.

Then have your `malloc16()` function call `malloc` to get a block 16 bytes larger than requested, figure out where the aligned area should be, put the padding length just before that and return the address of the aligned area.

For `free16`, you would simply look at the byte before the address given to get the padding length, work out the actual address of the malloc’ed block from that, and pass that to `free`.

This is untested but should be a good start:
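
A minimal sketch of that `malloc16`/`free16` pair, assuming C99’s `uintptr_t` and `SIZE_MAX` from `<stdint.h>`. The mask is applied to an integer copy of the address, since `&` is not defined on pointers, and the local names `porig`, `addr` and `pad` are purely illustrative:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Allocate s bytes aligned on a 16-byte boundary, or return NULL. */
void *malloc16(size_t s)
{
    unsigned char *porig, *p;
    uintptr_t addr;
    size_t pad;

    if (s > SIZE_MAX - 16)              /* s + 16 would overflow */
        return NULL;
    porig = malloc(s + 16);             /* room for 1..16 bytes of padding */
    if (porig == NULL)
        return NULL;

    /* Add 16, then clear the low 4 bits: this is the first 16-byte
       boundary strictly above porig. */
    addr = ((uintptr_t)porig + 16) & ~(uintptr_t)0xf;
    pad = (size_t)(addr - (uintptr_t)porig);
    assert(pad >= 1 && pad <= 16);

    p = porig + pad;
    p[-1] = (unsigned char)pad;         /* padding length sits just before p */
    return p;
}

/* Free a block obtained from malloc16(); NULL is ignored, as with free(). */
void free16(void *p)
{
    unsigned char *q = p;

    if (q != NULL)
        free(q - q[-1]);                /* step back to malloc's own pointer */
}
```

Every pointer handed out by `malloc16` has to go back through `free16`, never plain `free`, because only `free16` knows how to recover the address `malloc` originally returned.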

The magic line in `malloc16` is `p = (porig + 16) & (~0xf);`, which adds 16 to the address then sets the lower 4 bits to 0, in effect bringing it back to the next lowest alignment point (the `+16` guarantees it is past the actual start of the malloc’ed block). Strictly speaking, C won’t apply `&` to a pointer, so the address has to be converted to an integer type such as `uintptr_t` before masking.

Now, I don’t claim that the code above is anything but kludgey. You would have to test it on the platforms of interest to see if it’s workable. Its main advantage is that it abstracts away the ugly bit so that you never have to worry about it.