A fragment shader uses two atomic counters. It may or may not increment the first and may or may not increment the second (but never both). Before modifying either counter, however, it always reads both current values and, if a counter is then modified, uses the previously read value for some custom logic. All of this happens in a (most likely unrollable) loop.
Envision a flow roughly like this:
- in some small unrollable loop, say FOR 0-20 (compile-time resolvable const)…
  - get counter values for AC1 and AC2
  - check some value:
    - if x: set texel in uimage1D_A at index AC1, increment AC1
    - else: set texel in uimage1D_B at index (imgwidth-AC2-1), increment AC2
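To pin down what the loop above is meant to compute, here is a hypothetical CPU model in C++ of a single invocation running it. Everything in it is an assumption for illustration: the names `Outputs` and `run_one_invocation`, the `keys[i] == threshold` comparison standing in for "check some value", and writing the 1-based loop index as the texel value. Plain integers stand in for AC1 and AC2, so this reproduces the intended result only when exactly one invocation runs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical single-invocation model of the loop; all names are illustrative.
struct Outputs {
    std::vector<std::uint32_t> a;  // stands in for uimage1D_A
    std::vector<std::uint32_t> b;  // stands in for uimage1D_B
    std::uint32_t ac1 = 0;         // stands in for counter AC1 (NOT atomic here)
    std::uint32_t ac2 = 0;         // stands in for counter AC2 (NOT atomic here)
    explicit Outputs(std::size_t width) : a(width, 0), b(width, 0) {}
};

// "Check some value" is modeled as keys[i] == threshold (an assumption).
void run_one_invocation(Outputs& out, const std::vector<std::uint32_t>& keys,
                        std::uint32_t threshold) {
    const std::size_t imgwidth = out.b.size();
    for (std::size_t i = 0; i < keys.size(); ++i) {
        const std::uint32_t c1 = out.ac1;  // read both counters first...
        const std::uint32_t c2 = out.ac2;
        const auto texel = static_cast<std::uint32_t>(i + 1);  // hypothetical texel value
        if (keys[i] == threshold) {
            out.a[c1] = texel;                 // ...write at the value that was read...
            out.ac1 = c1 + 1;                  // ...and only then increment
        } else {
            out.b[imgwidth - c2 - 1] = texel;  // B is filled from the back
            out.ac2 = c2 + 1;
        }
    }
}
```

With `keys = {1, 0, 1, 0, 0}`, `threshold = 1`, and a width of 8, A receives 1 and 3 in slots 0 and 1, while B receives 2, 4, and 5 in slots 7, 6, and 5.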
Question: when the shader queries the current counter value, does it always get the “most current” value? And do I lose the massive parallelism of fragment shaders here (speaking only of current-generation and future GPUs and drivers)?
As for the branching (if x): I compare a texel in another (readonly restrict uniform) uimage1D to a (uniform) uint. So one operand is definitely a uniform scalar, but the other is an imageLoad().x, even though the image itself is uniform. Is this sort of branching still “fully parallelized”? Both branches consist of exactly two almost identical instructions. Assuming a “perfectly optimizing” GLSL compiler, is this kind of branching likely to introduce a stall?
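Atomic counters are atomic, but each atomic operation is atomic only by itself. Reading a counter and later incrementing it are two separate operations, and other invocations can read or increment the same counter in between.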
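A minimal sketch of why the read-then-increment pattern breaks, written in C++ rather than GLSL: `std::atomic<uint32_t>::load` plays the role of `atomicCounter()` and `fetch_add(1)` the role of `atomicCounterIncrement()`. The two hypothetical invocations X and Y are simulated sequentially in one bad interleaving, so the outcome is deterministic. Every individual operation is atomic, yet both invocations end up with the same index.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <utility>

// "Read now, increment later": each step is atomic, but Y's read lands
// between X's read and X's increment (counter assumed to start at 0).
std::pair<std::uint32_t, std::uint32_t> racy_interleaving(std::atomic<std::uint32_t>& counter) {
    std::uint32_t x_index = counter.load();  // X reads 0
    std::uint32_t y_index = counter.load();  // Y also reads 0, before X increments
    counter.fetch_add(1);                    // X increments: counter == 1
    counter.fetch_add(1);                    // Y increments: counter == 2
    return {x_index, y_index};               // both invocations write slot 0
}

// One read-modify-write per invocation: the value fetch_add returns IS the
// index, so no other invocation can obtain the same one.
std::pair<std::uint32_t, std::uint32_t> atomic_interleaving(std::atomic<std::uint32_t>& counter) {
    std::uint32_t x_index = counter.fetch_add(1);  // X gets 0
    std::uint32_t y_index = counter.fetch_add(1);  // Y gets 1
    return {x_index, y_index};
}
```

In the racy version the counter still ends at 2, so nothing looks wrong from the counter alone; the damage shows up as one slot written twice and one slot never written.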
So, if you want to ensure that every shader invocation gets a unique value from a counter, then every invocation must access that counter only with atomicCounterIncrement (or atomicCounterDecrement, but they must all use the same one). The correct way to do what you’re suggesting is to increment inside the branch and use the returned value:
- if x: atomicCounterIncrement(AC1), storing the value returned.
- else: atomicCounterIncrement(AC2), storing the value returned.
Your “fetch and later increment” strategy is a race condition waiting to happen. It doesn’t matter if it’s “fully parallelized” because it’s broken. You need it to work before wondering if it’s going to be fast.
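As a hedged CPU analogue of the corrected approach, the C++ sketch below lets several threads act as invocations. Each thread increments exactly one counter and uses the value `fetch_add` returns, which, like the value returned by `atomicCounterIncrement`, is the counter's value before the increment. `Compactor` and `emit` are hypothetical names, and the sketch assumes the total number of writes never exceeds the buffer width.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical CPU stand-in for the two images and two atomic counters.
struct Compactor {
    std::vector<std::uint32_t> a;  // stands in for uimage1D_A
    std::vector<std::uint32_t> b;  // stands in for uimage1D_B
    std::atomic<std::uint32_t> ac1{0};
    std::atomic<std::uint32_t> ac2{0};
    explicit Compactor(std::size_t width) : a(width, 0), b(width, 0) {}

    // One "invocation step": increment exactly one counter, use its return value.
    void emit(bool x, std::uint32_t value) {
        if (x) {
            std::uint32_t i = ac1.fetch_add(1);  // unique slot, counted from the front
            assert(i < a.size());                // assumes writes never exceed the width
            a[i] = value;
        } else {
            std::uint32_t j = ac2.fetch_add(1);  // unique slot, counted from the back
            assert(j < b.size());
            b[b.size() - j - 1] = value;         // mirrors imgwidth-AC2-1
        }
    }
};
```

Because every index comes from a single read-modify-write, no two threads can receive the same slot, however they interleave, and distinct threads never write the same element.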
I would strongly advise getting familiar with atomics and threading on CPUs before trying to tackle GPU stuff. This is a common mistake made by novices when working with atomics. You need to be a threading expert (or at least intermediate-level) if you want to use GLSL atomics and image load/store successfully.