Assume I have the following code: int x[200]; void thread1() { for(int i =

Question

Asked: May 25, 2026 (2026-05-25T06:28:50+00:00)

Assume I have the following code:

int x[200];

void thread1() {
  for(int i = 0; i < 100; i++)
    x[i*2] = 1;
}

void thread2() {
  for(int i = 0; i < 100; i++)
    x[i*2 + 1] = 1;
}

Is this code correct under the x86-64 memory model (from what I understand, it is), assuming the page is configured with the default write cache policy in Linux? What is the performance impact of such code (from what I understand, none)?

P.S. As for performance, I am mostly interested in Sandy Bridge.

EDIT: As for my expectation: I want to write to aligned locations from different threads. Once both threads have finished and a barrier has been passed, I expect x in the code above to contain {1,1,1, ...} rather than {0,1,0,1,...} or {1,0,1,0,...}.

1 Answer

Editorial Team · Answer 1 · 2026-05-25T06:28:51+00:00

If I understand correctly, the writes will eventually propagate through snoop requests. Sandy Bridge has no FSB: cores on the same die communicate over the on-die ring interconnect, and sockets are linked by QuickPath, so snoops travel over a much quicker interconnect. As far as I can tell, it is not based on cache-invalidation-on-write, so it should be 'fairly' quick, although I wasn't able to find the overhead of conflict resolution (it is probably lower than an L3 write).

Source
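The expectation from the question can be checked with a small harness. This is only a sketch: the helper name `run_and_count_ones` is hypothetical, and `std::thread` with `join()` stands in for whatever threads and barrier the original program uses. Each `x[i]` is a distinct memory location, so the two writers never touch the same object, and `join()` orders the final reads after both loops.

```cpp
#include <cassert>  // only needed for the assert shown in the usage note
#include <thread>

int x[200];

// Writes 1 to the even indices, as thread1 in the question does.
void thread1() {
  for (int i = 0; i < 100; i++)
    x[i * 2] = 1;
}

// Writes 1 to the odd indices, as thread2 in the question does.
void thread2() {
  for (int i = 0; i < 100; i++)
    x[i * 2 + 1] = 1;
}

// Hypothetical helper: runs both writers, then counts the elements equal to 1.
int run_and_count_ones() {
  for (int &v : x) v = 0;  // reset so repeated calls start from zeros

  std::thread a(thread1);
  std::thread b(thread2);
  a.join();  // join() synchronizes with the end of each thread,
  b.join();  // so the reads below happen after all of the writes

  int ones = 0;
  for (int v : x) ones += (v == 1);
  return ones;
}
```

Calling `assert(run_and_count_ones() == 200);` from `main` checks that the array ends up as {1,1,1, ...}. The check says nothing about speed; neighbouring elements still share cache lines.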

EDIT: According to the Intel® 64 and IA-32 Architectures Optimization Reference Manual, a clean hit has an impact of 43 cycles and a dirty hit has an impact of 60 cycles (compared with the normal overhead of 4 cycles for L1, 12 for L2 and 26-31 for L3).
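If those cross-core hits turn out to matter, one possible way to sidestep them is to give each thread its own cache-line-aligned block instead of interleaving elements. The following is a sketch under assumptions, not a measured fix: it assumes 64-byte cache lines, it changes the layout from the question (two blocks of 100 rather than alternating indices), and the names `Block`, `fill` and `run_blocked_and_count_ones` are hypothetical.

```cpp
#include <cassert>  // only needed for the assert shown in the usage note
#include <cstddef>
#include <thread>

// Assumption: 64-byte cache lines.
constexpr std::size_t kCacheLine = 64;

// Hypothetical layout: each thread owns one contiguous, line-aligned block,
// so the two writers never store to the same cache line.
struct alignas(kCacheLine) Block {
  int v[100];
};

// alignas pads the struct so that the next array element starts on a new line.
static_assert(sizeof(Block) % kCacheLine == 0, "block must fill whole lines");

Block blocks[2];

// Writes 1 to every element of one block.
void fill(Block &b) {
  for (int i = 0; i < 100; i++)
    b.v[i] = 1;
}

// Hypothetical helper: fills both blocks in parallel and counts the ones.
int run_blocked_and_count_ones() {
  for (Block &blk : blocks)
    for (int &v : blk.v) v = 0;  // reset before each run

  std::thread t0([] { fill(blocks[0]); });
  std::thread t1([] { fill(blocks[1]); });
  t0.join();
  t1.join();

  int ones = 0;
  for (const Block &blk : blocks)
    for (int v : blk.v) ones += (v == 1);
  return ones;
}
```

As with the interleaved version, `assert(run_blocked_and_count_ones() == 200);` confirms every element was written. Whether the different layout is acceptable, and whether it is measurably faster for only 200 ints, depends on the surrounding program and would need profiling.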
