I have a program that is running slower than I’d like it to. I’ve

Question

Editorial Team

Asked: May 16, 2026 (2026-05-16T21:38:59+00:00)

I have a program that is running slower than I’d like it to.

I’ve done some profiling and found the section that takes up the vast majority of the processing time:

        DO K = 0, K_MAX
            WRITE(EIGENVALUES_IO, *) K * 0.001 * PI, (W_UP(J), J=1, ATOM_COUNT)
            DCMPLXW_UP(:) = DCMPLX(W_UP(:))
            DO E = 1, ENERGY_STEPS
                ENERGY = MIN_ENERGY + ENERGY_STEP * REAL(E, DP)
                ZV = DCMPLX(ENERGY, DELTA)
                ON_SITE_SINGLE = DCMPLX(0.0_DP)
                DO Q = 1, ATOM_COUNT
                    DO J = 1, ATOM_COUNT
                        ON_SITE_SINGLE(J) = ON_SITE_SINGLE(J) + (MATRIX_UP(J, Q) * MATRIX_UP_CONJG(J, Q)) / (ZV - DCMPLXW_UP(Q))
                    END DO
                END DO
                DOS_DOWN(E) = DOS_DOWN(E) - WEIGHTS(K) * SUM(IMAG(ON_SITE_SINGLE))
            END DO
        END DO

The line

ON_SITE_SINGLE(J) = ON_SITE_SINGLE(J) + (MATRIX_UP(J, Q) * MATRIX_UP_CONJG(J, Q)) / (ZV - DCMPLXW_UP(Q))

is the one that is doing the damage.

I’m fairly new to this. Is there some way of speeding this up? As far as I know, the same principles apply in C, so any help from you C guys would be nice too.

The arrays are all COMPLEX

K_MAX is 1000

ENERGY_STEPS is 1000

ATOM_COUNT is low ( < 50)

1 Answer

Editorial Team · Answer 1 · 2026-05-16T21:38:59+00:00

All my programs run slower than I’d like. In all (OK, not all, but many of) my scientific programs there is a deep loop nest in which the innermost statement(s) take up most of the computation time. Typically I expect 90%+ of my computation time to be taken up by those statements. That innermost statement of yours is being executed up to about 2.5×10^9 times (roughly 1000 values of K × 1000 energy steps × up to 50 × 50 pairs of J and Q), so you should expect it to take a significant fraction of the total time.

Bearing this in mind I suggest that you:

a) Take @Alexandre’s advice to use BLAS rather than your home-brewed matrix-vector multiplication.

b) Ignore @Yuval’s advice about lifting operations out of the loop: a good Fortran compiler will do this for you if you turn optimisation up high (WARNING: this is a self-fulfilling prophecy, in as much as if the compiler doesn’t, it is not a good one). There are a lot of other optimisations I expect from a good Fortran compiler these days, see (d). (I don’t expect optimisation of memory access by the compiler; I expect that from BLAS.)

c) Form a realistic expectation of how much performance you should be able to get from your program. If you get a sustained FLOPS rate in excess of 10% of the CPU’s rated peak performance, you are doing very well and should spend your time on other things rather than optimisation.

d) Read your compiler documentation very carefully. Make sure that you understand what the optimisation flags actually do. Make sure that you are generating code for the CPUs you are using, not some older variant. Switch in fast vector operations if they are available. All that sort of thing.

e) Start parallelising. OpenMP is a good place to start and, as @Nicolas indicates, the learning curve is quite gentle at first.
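
To make (a), (b) and (e) a bit more concrete, here is a minimal sketch of how the work for a single K might be restructured. It relies on one observation about the code as posted: the product MATRIX_UP(J, Q) * MATRIX_UP_CONJG(J, Q) does not depend on the energy, and the divisor does not depend on J. The module name dos_sketch, the subroutine accumulate_dos and its argument list are made up for illustration. The intrinsic MATMUL stands in for a BLAS routine such as ZGEMV, and the OpenMP directive only takes effect if OpenMP is enabled at compile time. Treat it as a starting point to measure, not as a guaranteed win.

```fortran
! Sketch only: dos_sketch and accumulate_dos are hypothetical names.
module dos_sketch
    implicit none
    integer, parameter :: dp = kind(1.0d0)
contains

    ! Adds the contribution of one k-point to dos(:), one entry per energy step.
    subroutine accumulate_dos(matrix_up, matrix_up_conjg, w_c, weight, &
                              min_energy, energy_step, delta, dos)
        complex(dp), intent(in)    :: matrix_up(:, :)        ! as MATRIX_UP in the question
        complex(dp), intent(in)    :: matrix_up_conjg(:, :)  ! as MATRIX_UP_CONJG
        complex(dp), intent(in)    :: w_c(:)                 ! as DCMPLXW_UP
        real(dp),    intent(in)    :: weight                 ! as WEIGHTS(K)
        real(dp),    intent(in)    :: min_energy, energy_step, delta
        real(dp),    intent(inout) :: dos(:)

        complex(dp) :: prod(size(matrix_up, 1), size(matrix_up, 2))
        complex(dp) :: recip(size(w_c))
        complex(dp) :: on_site(size(matrix_up, 1))
        complex(dp) :: zv
        integer     :: e

        ! Energy-independent part, formed once per k-point
        ! instead of once per (energy, j, q) triple.
        prod = matrix_up * matrix_up_conjg

        ! Each energy step writes only dos(e), so the iterations are independent.
        !$omp parallel do default(shared) private(zv, recip, on_site)
        do e = 1, size(dos)
            zv = cmplx(min_energy + energy_step * real(e, dp), delta, dp)
            ! One complex division per q rather than one per (j, q) pair.
            recip = 1.0_dp / (zv - w_c)
            ! Matrix-vector product; a BLAS ZGEMV call could replace MATMUL here.
            on_site = matmul(prod, recip)
            dos(e) = dos(e) - weight * sum(aimag(on_site))
        end do
        !$omp end parallel do
    end subroutine accumulate_dos

end module dos_sketch
```

The idea would be to call this once per K in place of the loop over E. The results should agree with the original loop up to rounding, since only the order of the arithmetic changes. If ON_SITE_SINGLE is not needed anywhere else, the sum over J could also be applied to the columns of the product matrix once per K, which would leave only a loop over Q inside the loop over E.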

Oh, and advice 0, which you seem to have followed, is to measure the code’s performance and measure the impact of any changes you make.
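
Measuring the impact of a change needs little more than a wall-clock timer around the loop nest. The following is a small sketch built on the standard SYSTEM_CLOCK intrinsic; the module timing_sketch and the functions wall_time and updates_per_second are hypothetical helpers, not part of any library.

```fortran
! Sketch only: timing_sketch, wall_time and updates_per_second are hypothetical names.
module timing_sketch
    implicit none
    integer, parameter :: dp = kind(1.0d0)
    integer, parameter :: i8 = selected_int_kind(18)
contains

    ! Wall-clock time in seconds measured from an arbitrary origin.
    ! Returns -1 if the processor reports no clock.
    function wall_time() result(t)
        real(dp)    :: t
        integer(i8) :: ticks, rate
        call system_clock(ticks, rate)
        if (rate > 0) then
            t = real(ticks, dp) / real(rate, dp)
        else
            t = -1.0_dp
        end if
    end function wall_time

    ! Executions of the innermost statement per second of elapsed time.
    function updates_per_second(n_updates, elapsed) result(rate)
        real(dp), intent(in) :: n_updates, elapsed
        real(dp)             :: rate
        rate = n_updates / elapsed
    end function updates_per_second

end module timing_sketch

! Typical use around the loop nest (t0 and t1 are illustrative names):
!     t0 = wall_time()
!     ... the DO K = 0, K_MAX loop nest ...
!     t1 = wall_time()
!     print *, 'seconds:', t1 - t0, &
!              ' updates/s:', updates_per_second(2.5e9_dp, t1 - t0)
```

Multiplying the update rate by the number of floating-point operations you count in the innermost statement gives a rough sustained rate to compare against the CPU’s rated figure from (c).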
