I have a very large nested for loop in which some multiplications and additions are performed on floating point numbers.
    for (int i = 0; i < length1; i++)
    {
        double aa = 0;
        for (int h = 0; h < 10; h++)
        {
            aa += omega[i][outsideGeneratedAddress[h]];
        }
        double alphaOld = alpha;
        alpha = Math.Sqrt(alpha * alpha + aa * aa);
        s = -aa / alpha;
        c = alphaOld / alpha;
        for (int j = 0; j <= i; j++)
        {
            double oldU = u[j];
            u[j] = c * oldU + s * omega[i][j];
            omega[i][j] = c * omega[i][j] - s * oldU;
        }
    }
This loop is taking up the majority of my processing time and is a bottleneck.
Would I be likely to see any speed improvements if I rewrite this loop in C and interface to it from C#?
EDIT: I updated the code to show how s and c are generated. Also, the inner loop actually goes from 0 to i, though it probably doesn’t make much difference to the question.
EDIT2: I implemented the algorithm in VC++ and linked it with C# through a DLL, and saw a 28% speed boost over C# with all optimisations enabled. The option that enables SSE2 works particularly well. Compiling with MinGW and GCC 4.4 only gave a 15% speed boost. I just tried the Intel compiler and saw a 49% speed boost for this code.
Most of the other answers suggest looking into C# solutions, but they miss one point: C code for this method will be faster, provided that you use a good optimizing compiler (I’d suggest Intel’s, which works great for this kind of code).
Compiling ahead of time also takes some work off the JIT and yields much better compiled output (even the MSVC compiler can generate SSE2 instructions). Array bounds won’t be checked by default, there will probably be some loop unrolling and, all in all, you’re likely to see a significant performance boost.
As has been rightly pointed out, calling into native code carries a bit of overhead; this should, however, be insignificant compared to the speedup if length1 is big enough.
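To make this concrete, here is a minimal sketch of the inner rotation loop as a C function. The name apply_rotation and its parameters are hypothetical (they are not from the question), and the restrict qualifiers encode my assumption that u and the current row of omega never overlap in memory.

```c
#include <stddef.h>

/* Hypothetical kernel for the inner j-loop of the question.
 * u     : the u vector
 * row   : the current row of omega, i.e. omega[i]
 * count : number of elements to rotate (i + 1 in the question's loop)
 * c, s  : rotation coefficients computed by the caller
 * Assumption: u and row do not alias. That is what `restrict` promises
 * the compiler, leaving it free to unroll and vectorize the loop. */
void apply_rotation(double *restrict u, double *restrict row,
                    size_t count, double c, double s)
{
    for (size_t j = 0; j < count; j++) {
        double old_u = u[j];              /* old value is needed for the second update */
        u[j]   = c * old_u + s * row[j];
        row[j] = c * row[j] - s * old_u;
    }
}
```

In a real port you would probably move the outer loop over i into the native function as well, so that the managed-to-native transition is paid once per call rather than once per row. Whether the compiler actually emits SSE2 code for this loop depends on the compiler and the flags you pass it.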
You can certainly keep this code in C#, but please remember that, compared to several C compilers, the CLR (like all other VMs I know) does little to optimize the generated code.