Edit: ST does not allow newbies to post more than two links. Sorry for the missing references.
I’m trying to reduce locking overhead in a C application where detecting changes to global state is performance-relevant. Even though I’ve been reading quite a lot on the topic lately (e.g. a lot from H. Sutter, and many more), I am not confident in my implementation. I would like to combine a CAS-like operation with DCL (double-checked locking) for a check on a cache-line-aligned global variable, thus avoiding false sharing, to update thread-local data from data shared among multiple threads. My lack of confidence is mainly due to:
- my failing to interpret the GNU documentation on Type-Attributes
- my not being able to find any literature or examples that I could easily translate to C, such as aligning-to-cache-line-and-knowing-the-cache-line-size on ST or 1 (although 1 seems to answer my question somewhat, I’m not confident in my implementation)
- my limited experience with C
My questions:
1. The Type-Attributes documentation states:

   "This attribute specifies a minimum alignment (in bytes) for variables of the specified type. For example, the declarations: (please see Type-Attributes documentation for declaration) force the compiler to insure (as far as it can) that each variable whose type is struct S or more_aligned_int will be allocated and aligned at least on a 8-byte boundary. On a SPARC, having all variables of type struct S aligned to 8-byte boundaries allows the compiler to use the ldd and std (doubleword load and store) instructions when copying one variable of type struct S to another, thus improving run-time efficiency."

   Does that mean that the beginning of struct S or more_aligned_int will always be aligned to 8-byte boundary? It does not mean the data will be padded to use exactly 64 bytes, right?

2. Assuming 1. is true that every instance of struct cache_line_aligned (see code Example 1 below) aligns on 64-byte boundaries and utilize exactly one cache-line (assuming cache-lines are 64 bytes in length)

3. Using typedef for the type declaration does not alter the semantics of __attribute__ ((aligned (64))) (see code Example 2 below)

4. I do not need to use aligned_malloc when instantiating the struct if struct is declared with __attribute__ ...
// Example 1
struct cache_line_aligned {
    int version;
    char padding[60];
} __attribute__ ((aligned (64)));

// Example 2
typedef struct {
    int version;
    // place '__attribute__ ((aligned (64)))' after 'int version'
    // or at the end of the declaration
    char padding[60];
} cache_line_aligned2 __attribute__ ((aligned (64)));
And finally a sketch of a function that uses the cache-line aligned approach to efficiently check if global state has been modified by some other thread:
void lazy_update_if_changed(int *t_version, char *t_data) {
    // Assuming 'g_cache_line_aligned' is an instance of
    // 'struct cache_line_aligned' or 'cache_line_aligned2'
    // and variables prefixed with 't_' being thread local
    if (g_cache_line_aligned.version == *t_version) {
        // do nothing and return
    } else {
        // enter critical section (acquire lock e.g. with pthread_mutex_lock)
        *t_version = g_cache_line_aligned.version;
        // read other data that requires locking where changes are notified
        // by modifying 'g_cache_line_aligned.version', e.g. t_data
        // leave critical section
    }
}
Sorry for the long post.
Thank you!
When you define an aligned type, say, aligned to 8-byte boundaries, the compiler should pad the type so that its size is a multiple of the alignment (here, a multiple of 8 bytes).
The rationale for that is simple. Suppose you want to define an array of that aligned type. Naturally, every element of it should be aligned as well. That’s why there may be padding.
Here’s a little demonstration:
Output (ideone):
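A minimal sketch of such a check, assuming GCC or Clang (for __attribute__((aligned))), a C11 compiler (for static_assert and _Alignof) and a 4-byte int. The type names padded_by_hand and padded_by_compiler, the array g_states and both helper functions are hypothetical and only serve the illustration. The sketch checks that a struct is padded up to its alignment even without an explicit padding member, and that every element of a statically allocated array starts on a 64-byte boundary:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Padded manually, as in Example 1 of the question (assumes 4-byte int). */
struct padded_by_hand {
    int version;
    char padding[60];
} __attribute__((aligned(64)));

/* No padding member: the compiler rounds the size up to the alignment. */
struct padded_by_compiler {
    int version;
} __attribute__((aligned(64)));

/* Compile-time checks: size is a multiple of the requested alignment. */
static_assert(sizeof(struct padded_by_hand) == 64, "one cache line");
static_assert(sizeof(struct padded_by_compiler) == 64, "padded by compiler");
static_assert(_Alignof(struct padded_by_compiler) == 64, "64-byte aligned");

/* Returns 1 if p lies on a 64-byte boundary, 0 otherwise. */
static int is_64_byte_aligned(const void *p) {
    return ((uintptr_t)p % 64u) == 0;
}

/* Statically allocated array: each element has to be aligned as well. */
static struct padded_by_compiler g_states[4];

/* Returns 1 if every element of g_states is 64-byte aligned. */
static int all_elements_aligned(void) {
    for (size_t i = 0; i < sizeof g_states / sizeof g_states[0]; ++i) {
        if (!is_64_byte_aligned(&g_states[i])) {
            return 0;
        }
    }
    return 1;
}
```

If the static_assert lines compile, the size and alignment claims hold for the compiler in use. Calling all_elements_aligned() from main() checks the array case at run time.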
If you create an instance of struct cache_line_aligned non-dynamically (IOW, not via malloc() and such), just like in the above code, it will be aligned.

The C standard (from 1999) states for malloc(), calloc() and realloc() (section 7.20.3):

"The pointer returned if the allocation succeeds is suitably aligned so that it may be assigned to a pointer to any type of object and then used to access such an object or an array of such objects in the space allocated (until the space is explicitly deallocated)."

Where "any type of object" does not include artificially aligned/padded types like the above struct, because there isn’t anything like __attribute__ ((aligned (64))) in the C standard. This is a GNU extension here. For dynamically allocated objects with arbitrary alignment you have to use the appropriate memory allocation function or do the alignment manually (by allocating more memory and then “aligning” the pointer value).
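Both options can be sketched as follows, assuming a C11 library that provides aligned_alloc() (on POSIX systems posix_memalign() is an alternative) and a 64-byte cache line. The helper names new_state_c11, manual_aligned_malloc and manual_aligned_free are hypothetical. The manual variant assumes that align is a power of two and at least sizeof(void *), and it does not guard against overflow of the size computation:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct cache_line_aligned {
    int version;
    char padding[60];
} __attribute__((aligned(64)));

/* Option 1: C11 aligned_alloc(). The size must be a multiple of the
   alignment, which holds here because sizeof is padded to 64.
   Release the result with free(). */
static struct cache_line_aligned *new_state_c11(void) {
    return aligned_alloc(64, sizeof(struct cache_line_aligned));
}

/* Option 2: over-allocate with malloc() and round the address up.
   The pointer returned by malloc() is stored directly in front of the
   aligned block so that it can be passed to free() later. */
static void *manual_aligned_malloc(size_t size, size_t align) {
    /* worst-case adjustment plus room for the saved pointer */
    void *raw = malloc(size + align - 1 + sizeof(void *));
    if (raw == NULL) {
        return NULL;
    }
    uintptr_t start = (uintptr_t)raw + sizeof(void *);
    uintptr_t aligned = (start + align - 1) & ~(uintptr_t)(align - 1);
    ((void **)aligned)[-1] = raw;   /* remember the original pointer */
    return (void *)aligned;
}

/* Frees a block obtained from manual_aligned_malloc(). */
static void manual_aligned_free(void *p) {
    if (p != NULL) {
        free(((void **)p)[-1]);
    }
}
```

A block from new_state_c11() is released with plain free(), while a block from manual_aligned_malloc() must go through manual_aligned_free(), because the aligned pointer is generally not the one malloc() returned.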