The Archive Base Latest Questions

Editorial Team
Asked: May 16, 2026

I have a program that is running slower than I’d like it to.

I’ve done some profiling, and I’ve found the section that is taking up the vast majority of processing time

        DO K = 0, K_MAX
            WRITE(EIGENVALUES_IO, *) K * 0.001 * PI, (W_UP(J), J=1, ATOM_COUNT)
            DCMPLXW_UP(:) = DCMPLX(W_UP(:))
            DO E = 1, ENERGY_STEPS
                ENERGY = MIN_ENERGY + ENERGY_STEP * REAL(E, DP)
                ZV = DCMPLX(ENERGY, DELTA)
                ON_SITE_SINGLE = DCMPLX(0.0_DP)
                DO Q = 1, ATOM_COUNT
                    DO J = 1, ATOM_COUNT
                        ON_SITE_SINGLE(J) = ON_SITE_SINGLE(J) + (MATRIX_UP(J, Q) * MATRIX_UP_CONJG(J, Q)) / (ZV - DCMPLXW_UP(Q))
                    END DO
                END DO
                DOS_DOWN(E) = DOS_DOWN(E) - WEIGHTS(K) * SUM(IMAG(ON_SITE_SINGLE))
            END DO
        END DO

The line

ON_SITE_SINGLE(J) = ON_SITE_SINGLE(J) + (MATRIX_UP(J, Q) * MATRIX_UP_CONJG(J, Q)) / (ZV - DCMPLXW_UP(Q))

is the one that is doing the damage.

I’m fairly new to this. Is there some way of speeding this up? AFAIK the same principles apply in C, so any help from C programmers would be welcome too.

The arrays are all COMPLEX

K_MAX is 1000

ENERGY_STEPS is 1000

ATOM_COUNT is low ( < 50)

1 Answer

  1. Editorial Team
    Added an answer on May 16, 2026 at 9:38 pm

    All my programs run slower than I’d like. In all (OK, not all, but many of) my scientific programs there is a deep loop nest in which the innermost statement(s) take up most of the computation time. Typically I expect 90%+ of the run time to be spent in those statements. That innermost statement of yours is executed roughly K_MAX × ENERGY_STEPS × ATOM_COUNT² times, which is about 1000 × 1000 × 50² = 2.5×10^9 if ATOM_COUNT is near its upper bound of 50, so you should expect it to take a significant fraction of the total time.

    Bearing this in mind I suggest that you:

    a) Take @Alexandre’s advice to use BLAS rather than your home-brewed matrix-vector multiplication.

    b) Ignore @Yuval’s advice about lifting operations out of the loop: a good Fortran compiler will do this for you if you turn optimisation up high (WARNING: this is a self-fulfilling prophecy, in as much as if the compiler doesn’t do it, it is not a good one). There are a lot of other optimisations I expect from a good Fortran compiler these days, see (d). (I don’t expect optimisation of memory access from the compiler; I expect that from BLAS.)

    c) Form a realistic expectation of how much performance you should be able to get from your program. If you get a sustained FLOPS rate in excess of 10% of the CPU’s rated peak performance, you are doing very well and should spend your time on other things rather than optimisation.

    d) Read your compiler documentation very carefully. Make sure that you understand what the optimisation flags actually do. Make sure that you are generating code for the CPUs you are using, not some older variant. Switch in fast vector operations if they are available. All that sort of thing.

    e) Start parallelising. OpenMP is a good place to start and, as @Nicolas indicates, the learning curve is quite gentle at first.
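As a rough illustration of (e), here is a minimal sketch of how the energy loop from the question might be parallelised with OpenMP. It is a stand-alone toy for a single k-point, not the asker's actual program: the array sizes, the input values, and the assumption that MATRIX_UP_CONJG is simply the conjugate of MATRIX_UP are all made up for the example. Each value of E writes only its own DOS element, so the iterations are independent and the E loop is a natural candidate for a `parallel do`.

```fortran
program dos_openmp_sketch
    ! Sketch only: one k-point of the question's loop nest, with made-up data.
    implicit none
    integer, parameter :: dp = kind(1.0d0)
    integer, parameter :: atom_count = 40, energy_steps = 1000
    real(dp), parameter :: min_energy = -5.0_dp, energy_step = 0.01_dp
    real(dp), parameter :: delta = 0.05_dp, weight = 1.0_dp
    complex(dp) :: matrix_up(atom_count, atom_count)
    complex(dp) :: matrix_up_conjg(atom_count, atom_count)
    complex(dp) :: w_up(atom_count), on_site_single(atom_count), zv
    real(dp) :: re(atom_count, atom_count), im(atom_count, atom_count)
    real(dp) :: dos_serial(energy_steps), dos_parallel(energy_steps)
    integer :: e, j, q

    ! Made-up inputs. ASSUMPTION: the conjugate array is conjg(matrix_up).
    call random_number(re)
    call random_number(im)
    matrix_up = cmplx(re, im, kind=dp)
    matrix_up_conjg = conjg(matrix_up)
    do q = 1, atom_count
        w_up(q) = cmplx(0.1_dp * real(q, dp), 0.0_dp, kind=dp)
    end do

    ! Serial reference: the same arithmetic as the loop in the question.
    do e = 1, energy_steps
        zv = cmplx(min_energy + energy_step * real(e, dp), delta, kind=dp)
        on_site_single = (0.0_dp, 0.0_dp)
        do q = 1, atom_count
            do j = 1, atom_count
                on_site_single(j) = on_site_single(j) &
                    + (matrix_up(j, q) * matrix_up_conjg(j, q)) / (zv - w_up(q))
            end do
        end do
        dos_serial(e) = -weight * sum(aimag(on_site_single))
    end do

    ! Parallel version: every thread needs its own zv and scratch array.
    ! Without OpenMP enabled the directives are plain comments and the
    ! loop simply runs serially.
    !$omp parallel do private(q, j, zv, on_site_single)
    do e = 1, energy_steps
        zv = cmplx(min_energy + energy_step * real(e, dp), delta, kind=dp)
        on_site_single = (0.0_dp, 0.0_dp)
        do q = 1, atom_count
            do j = 1, atom_count
                on_site_single(j) = on_site_single(j) &
                    + (matrix_up(j, q) * matrix_up_conjg(j, q)) / (zv - w_up(q))
            end do
        end do
        dos_parallel(e) = -weight * sum(aimag(on_site_single))
    end do
    !$omp end parallel do

    ! Stop with an error if the two versions disagree.
    if (maxval(abs(dos_serial - dos_parallel)) > &
        1.0e-9_dp * maxval(abs(dos_serial))) then
        error stop "serial and OpenMP results differ"
    end if
    print *, "max |difference| =", maxval(abs(dos_serial - dos_parallel))
end program dos_openmp_sketch
```

With gfortran, for example, something like `gfortran -O3 -march=native -fopenmp` would enable both the optimisation mentioned in (d) and the OpenMP directives. In the full program the outer K loop accumulates into the same DOS array, so parallelising the inner E loop (as here) avoids any need for a reduction.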

    Oh, and advice 0, which you seem to have followed, is to measure the code’s performance and measure the impact of any changes you make.
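To make advice 0 and (a) concrete, the sketch below times the original double loop against a matrix-vector formulation and checks that both give the same answer. The key observation is that the product MATRIX_UP(J, Q) * MATRIX_UP_CONJG(J, Q) does not depend on the energy, so it can be formed once per k-point; the inner two loops then become a matrix-vector product with the vector 1 / (ZV - W_UP(Q)). The sketch uses the intrinsic MATMUL so that it builds without extra libraries; with a BLAS library linked, the ZGEMV call shown in the comment would take its place. The sizes, input data and the name `coeff` are hypothetical, and the timings will vary with compiler and flags.

```fortran
program dos_matvec_sketch
    ! Sketch only: compares the question's loop with a matrix-vector form.
    use, intrinsic :: iso_fortran_env, only: int64
    implicit none
    integer, parameter :: dp = kind(1.0d0)
    integer, parameter :: n = 40, energy_steps = 1000
    real(dp), parameter :: min_energy = -5.0_dp, energy_step = 0.01_dp
    real(dp), parameter :: delta = 0.05_dp, weight = 1.0_dp
    complex(dp) :: matrix_up(n, n), matrix_up_conjg(n, n), coeff(n, n)
    complex(dp) :: w_up(n), v(n), on_site_single(n), zv
    real(dp) :: re(n, n), im(n, n)
    real(dp) :: dos_loop(energy_steps), dos_matvec(energy_steps)
    integer(int64) :: t0, t1, t2, rate
    integer :: e, j, q

    ! Made-up inputs. ASSUMPTION: the conjugate array is conjg(matrix_up).
    call random_number(re)
    call random_number(im)
    matrix_up = cmplx(re, im, kind=dp)
    matrix_up_conjg = conjg(matrix_up)
    do q = 1, n
        w_up(q) = cmplx(0.1_dp * real(q, dp), 0.0_dp, kind=dp)
    end do

    call system_clock(t0, rate)

    ! Version 1: the loop nest as written in the question.
    do e = 1, energy_steps
        zv = cmplx(min_energy + energy_step * real(e, dp), delta, kind=dp)
        on_site_single = (0.0_dp, 0.0_dp)
        do q = 1, n
            do j = 1, n
                on_site_single(j) = on_site_single(j) &
                    + (matrix_up(j, q) * matrix_up_conjg(j, q)) / (zv - w_up(q))
            end do
        end do
        dos_loop(e) = -weight * sum(aimag(on_site_single))
    end do

    call system_clock(t1)

    ! Version 2: hoist the energy-independent product, then one
    ! matrix-vector product per energy step.
    coeff = matrix_up * matrix_up_conjg          ! element-wise, once per k-point
    do e = 1, energy_steps
        zv = cmplx(min_energy + energy_step * real(e, dp), delta, kind=dp)
        v = 1.0_dp / (zv - w_up)
        on_site_single = matmul(coeff, v)
        ! With BLAS linked, the line above could instead be:
        !   call zgemv('N', n, n, (1.0_dp, 0.0_dp), coeff, n, v, 1, &
        !              (0.0_dp, 0.0_dp), on_site_single, 1)
        dos_matvec(e) = -weight * sum(aimag(on_site_single))
    end do

    call system_clock(t2)

    ! Measure the impact, and make sure the change did not alter the result.
    if (maxval(abs(dos_loop - dos_matvec)) > &
        1.0e-8_dp * maxval(abs(dos_loop))) then
        error stop "loop and matrix-vector results differ"
    end if
    print *, "loop nest   (s):", real(t1 - t0, dp) / real(rate, dp)
    print *, "matrix-vec  (s):", real(t2 - t1, dp) / real(rate, dp)
end program dos_matvec_sketch
```

The two versions round slightly differently, which is why the check uses a relative tolerance rather than exact equality. For problem sizes this small the timings are short, so repeating the measurement several times gives a more trustworthy comparison.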
