I have some GLSL here, and it works like a charm. The only problem is that compiling it takes three minutes or so. I know this is due to ANGLE, a piece of software that translates OpenGL ES 2.0 (including its GLSL ES shaders) to Direct3D 9 (HLSL) for WebGL on Windows systems. If I disable ANGLE, the shader compiles in a second. Does anybody know why nested loops are so slow in ANGLE, and whether there is a workaround? I can't make everybody wait more than a minute per shader.
for ( int b = 0; b < numberOfSplitpoints; b++ ) {
    if ( cameraDepth > splitPoints[b] && cameraDepth < splitPoints[b+1] ) {
        // Inside this block, this float hides the integer loop bound of the same name.
        const float numberOfSplitpoints = float( NUMBER_OF_SPLIT_POINTS - 1 );
        vec4 projCoords = v_projTextureCoords[b];
        projCoords /= projCoords.w;              // perspective divide
        projCoords = 0.5 * projCoords + 0.5;     // [-1, 1] -> [0, 1]
        float shadowDepth = projCoords.z;
        // Cascades sit side by side horizontally in one shadow map: pick slice b.
        projCoords.x /= numberOfSplitpoints;
        projCoords.x += float(b) / numberOfSplitpoints;
        for( int x = 0; x < fullkernelSize; x++ ) {
            for( int y = 0; y < fullkernelSize; y++ ) {
                vec2 pointer = vec2( float(x-kernelsize) / 3072.0, float(y-kernelsize) / 1024.0 );
                float convolution = kernel[x] * kernel[y];
                vec4 color = texture2D(shadowMapSampler, projCoords.xy+pointer);
                if(encodeDepth( color ) + shadowBias > shadowDepth) {
                    light += convolution;
                } else {
                    light += convolution * 0.6;
                }
            }
        }
    }
}

vec2 random = normalize(texture2D(randomSampler, screenSize * uv / 64.0).xy * 2.0 - 1.0);
float ambiantAmount = 0.0;
const int kernel = 4;
float offset = ssoasampleRad / depth;
for(int x = 0; x<kernel; x++) {
    vec2 a = reflect(directions[x], random) * offset;
    // b is a rotated by 45 degrees (0.707 ~ cos 45 = sin 45).
    vec2 b = vec2( a.x *0.707 - a.y*0.707,
                   a.x*0.707 + a.y*0.707 );
    ambiantAmount += abientOcclusion(uv, a*0.25, position, normal);
    ambiantAmount += abientOcclusion(uv, b*0.50, position, normal);
    ambiantAmount += abientOcclusion(uv, a*0.75, position, normal);
    ambiantAmount += abientOcclusion(uv, b, position, normal);
}
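GLSL ES does not make while and do-while loops or “dynamically” bounded for loops mandatory (Appendix A of the GLSL ES 1.00 specification only requires for loops whose bounds are constant expressions).
ANGLE takes advantage of this and unrolls loops extensively: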
If you have for ( int b = 0; b < numberOfSplitpoints; b++ ), then numberOfSplitpoints has to be a constant expression, otherwise the shader won't compile. The loop unrolling is supposed to let the native shader optimizer do more optimizations and minimize divergence, but in your code, if numberOfSplitpoints and fullkernelSize are very large, the unrolled code can get really long: the code in the innermost part gets repeated numberOfSplitpoints * fullkernelSize * fullkernelSize times, which can send the optimizer and compiler into all sorts of trouble.
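One possible workaround, sketched under assumptions rather than tested against ANGLE: since at most one cascade matches a given cameraDepth, the cascade can be chosen in a short loop and the filter kernel run once afterwards. The innermost body is then unrolled fullkernelSize * fullkernelSize times instead of numberOfSplitpoints * fullkernelSize * fullkernelSize times. The values of NUMBER_OF_SPLIT_POINTS and KERNEL_SIZE, the v_cameraDepth varying, the uniform declarations, and the body of encodeDepth below are hypothetical stand-ins for parts the question does not show. With these values the inner body would appear 25 times instead of 3 × 25 = 75.

```glsl
precision mediump float;

// Hypothetical values; the question does not show the real ones.
#define NUMBER_OF_SPLIT_POINTS 4
#define KERNEL_SIZE 2                          // half-width, "kernelsize" in the question
#define FULL_KERNEL_SIZE (2 * KERNEL_SIZE + 1) // "fullkernelSize" in the question

const int NUM_CASCADES = NUMBER_OF_SPLIT_POINTS - 1;

uniform sampler2D shadowMapSampler;
uniform float splitPoints[NUMBER_OF_SPLIT_POINTS];
uniform float kernel[FULL_KERNEL_SIZE];
uniform float shadowBias;

varying vec4 v_projTextureCoords[NUM_CASCADES];
varying float v_cameraDepth; // hypothetical: depth of the fragment from the camera

// Hypothetical stand-in for the question's encodeDepth(): unpacks a depth stored in RGBA8.
float encodeDepth(vec4 color) {
    return dot(color, vec4(1.0, 1.0 / 255.0, 1.0 / 65025.0, 1.0 / 16581375.0));
}

float shadowLight(float cameraDepth) {
    // Step 1: choose the cascade. Only this short loop is unrolled NUM_CASCADES times.
    vec4 projCoords = vec4(0.0);
    float cascade = -1.0;
    for (int b = 0; b < NUM_CASCADES; b++) {
        if (cameraDepth > splitPoints[b] && cameraDepth < splitPoints[b + 1]) {
            projCoords = v_projTextureCoords[b];
            cascade = float(b);
        }
    }

    float light = 0.0;
    if (cascade < 0.0) {
        return light; // no cascade matched: the original loop adds nothing either
    }

    // Step 2: filter once. The kernel loops are unrolled FULL_KERNEL_SIZE^2 times in total.
    projCoords /= projCoords.w;
    projCoords = 0.5 * projCoords + 0.5;
    float shadowDepth = projCoords.z;
    projCoords.x = (projCoords.x + cascade) / float(NUM_CASCADES); // same as /= n then += b/n

    for (int x = 0; x < FULL_KERNEL_SIZE; x++) {
        for (int y = 0; y < FULL_KERNEL_SIZE; y++) {
            vec2 pointer = vec2(float(x - KERNEL_SIZE) / 3072.0,
                                float(y - KERNEL_SIZE) / 1024.0);
            float convolution = kernel[x] * kernel[y];
            vec4 color = texture2D(shadowMapSampler, projCoords.xy + pointer);
            if (encodeDepth(color) + shadowBias > shadowDepth) {
                light += convolution;
            } else {
                light += convolution * 0.6;
            }
        }
    }
    return light;
}

void main() {
    gl_FragColor = vec4(vec3(shadowLight(v_cameraDepth)), 1.0);
}
```

The selection loop indexes splitPoints and v_projTextureCoords only with the loop index (and b + 1), which keeps it within the indexing rules of GLSL ES Appendix A. In the question's shader, the value returned by shadowLight would presumably be added to the existing light accumulator rather than written straight to gl_FragColor.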