
Editorial Team
Asked: June 2, 2026


Typically, each BLAS subroutine performs one specific operation. For instance,

DAXPY computes y <- ax + y

DSCAL computes x <- ax.

What I want to compute is:

z = ax + by and y = ax.

How do I “extend” the BLAS subroutines so that I can do the above?
(These two operations do not necessarily follow each other.)

I have tried:

  • Declaring a dummy vector and then DCOPYing the dummy into the desired vector, e.g. DCOPY(dummy,x); DSCAL(a,dummy); DCOPY(y,dummy)

  • Writing my own OpenMP implementation

  • Using tricks like DCOPY(y,a*x) for y = ax

But the problem is that none of these methods gives me a conclusive answer about which is the best way around this problem. I know I should “Profile, Profile, Profile” rather than ask, but I have tried that, and every time I change the vector a little, what was the best method suddenly becomes the worst, or vice versa.

Also,

  • My intention is to get the best performance possible.
  • I know that optimizing these operations probably won’t give me much of a performance boost, but I’m trying to save every picosecond I can.
  • FWIW, I am linking to Intel MKL.
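When the "best" variant flips as the vector changes, it can help to time each candidate over a whole range of lengths instead of a single one. One plausible cause, offered here only as an assumption, is the working set moving from one cache level to the next. The sketch below is a minimal, hypothetical harness in standard C: `kernel` and `ns_per_element` are illustrative names, timing uses plain `clock()`, and it keeps the best of several trials to reduce noise.

```c
#include <stdlib.h>
#include <time.h>

/* Kernel being timed (hypothetical): z <- a*x + b*y in a single pass.
 * Swap in whichever variant you want to compare. */
static void kernel(size_t n, double a, const double *x,
                   double b, const double *y, double *z)
{
    for (size_t i = 0; i < n; ++i)
        z[i] = a * x[i] + b * y[i];
}

/* Written after each trial so the compiler cannot discard the timed work. */
static volatile double sink;

/* Best-of-`trials` time in nanoseconds per element for length-n vectors
 * (n >= 1). Each trial runs the kernel `reps` times so that clock() can
 * resolve it. Returns -1.0 if allocation fails or trials < 1. */
static double ns_per_element(size_t n, int trials, int reps)
{
    double *x = malloc(n * sizeof *x);
    double *y = malloc(n * sizeof *y);
    double *z = malloc(n * sizeof *z);
    double best = -1.0;

    if (x && y && z) {
        for (size_t i = 0; i < n; ++i) {
            x[i] = 1.0;
            y[i] = 2.0;
        }
        for (int t = 0; t < trials; ++t) {
            clock_t start = clock();
            for (int r = 0; r < reps; ++r)
                kernel(n, 2.0, x, 0.5, y, z);
            clock_t stop = clock();
            sink = z[n - 1];
            double ns = (double)(stop - start) * 1e9
                        / ((double)CLOCKS_PER_SEC * (double)n * (double)reps);
            if (best < 0.0 || ns < best)
                best = ns;
        }
    }
    free(x);
    free(y);
    free(z);
    return best;
}
```

In practice you would call `ns_per_element` for each variant over, say, powers of two from 2^10 to 2^24 elements and look at where the curves cross, rather than comparing a single vector length.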

1 Answer

  1. Editorial Team
    Added an answer on June 2, 2026 at 5:13 am

    First of all, for y <- ax you can remove one unnecessary copy by writing DCOPY(y,x); DSCAL(a,y), i.e. copy x straight into y and then scale y in place.
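A minimal sketch of that two-step y <- ax, written with hypothetical unit-stride stand-ins (`ref_dcopy`, `ref_dscal`, `scale_into`) so it compiles without MKL. The comments show the corresponding CBLAS calls, which take (n, source, incx, destination, incy) for copy and (n, alpha, x, incx) for scale.

```c
#include <stddef.h>

/* Hypothetical stand-in for DCOPY (unit stride): y <- x.
 * CBLAS equivalent: cblas_dcopy(n, x, 1, y, 1); */
static void ref_dcopy(size_t n, const double *x, double *y)
{
    for (size_t i = 0; i < n; ++i)
        y[i] = x[i];
}

/* Hypothetical stand-in for DSCAL (unit stride): x <- a*x, in place.
 * CBLAS equivalent: cblas_dscal(n, a, x, 1); */
static void ref_dscal(size_t n, double a, double *x)
{
    for (size_t i = 0; i < n; ++i)
        x[i] *= a;
}

/* y <- a*x with no temporary vector: copy x into y, then scale y. */
static void scale_into(size_t n, double a, const double *x, double *y)
{
    ref_dcopy(n, x, y);
    ref_dscal(n, a, y);
}
```

Compared with the dummy-vector approach, this drops one full pass over memory (the second DCOPY), and x is left unchanged.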

    Second, IMHO OpenMP is not a solution for this kind of problem, because these operations are “memory bound”: each element is loaded and stored for only one or two floating-point operations, so the run time is limited by memory bandwidth rather than by arithmetic. What matters is overlapping (pipelining) memory accesses with computation, and vectorization, which makes better use of the bandwidth through vector loads and stores. Hand-tuned code for this would be very complex, because it depends on branch prediction, cache policies, register-file configuration, etc. You need something like the ATLAS library of R. Clint Whaley, which automatically generates optimized implementations of operations for a particular platform. AFAIK, there is also the BLAST standard (2001), which defines variants close to the operations you’ve presented (for example WAXPBY, w <- ax + by). You could also e-mail the ATLAS developers and ask them to add these operations to their autotuner.
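A rough count of memory traffic per element makes the "memory bound" point concrete. It assumes 8-byte doubles, vectors much larger than the caches, and ignores write-allocate traffic. It compares a single fused pass for z = ax + by with the three-call sequence DCOPY(z,y); DSCAL(b,z); DAXPY(a,x,z). Both perform the same 3 floating-point operations per element (two multiplies, one add); only the bytes moved differ:

```latex
\begin{aligned}
\text{fused: } z_i = a x_i + b y_i &:\; \underbrace{8}_{x_i} + \underbrace{8}_{y_i} + \underbrace{8}_{z_i} = 24 \text{ bytes/element} \\
\mathrm{DCOPY}(z,y) &:\; 8 + 8 = 16 \\
\mathrm{DSCAL}(b,z) &:\; 8 + 8 = 16 \\
\mathrm{DAXPY}(a,x,z) &:\; 8 + 8 + 8 = 24 \\
\text{composed total} &:\; 16 + 16 + 24 = 56 \text{ bytes/element} \\
\text{intensity} &:\; \tfrac{3}{24} = 0.125 \;\text{vs.}\; \tfrac{3}{56} \approx 0.054 \text{ flop/byte}
\end{aligned}
```

Under these assumptions the composed version moves about 56/24 ≈ 2.3 times as many bytes for the same arithmetic. For a bandwidth-bound loop, that ratio, not the flop count, tends to set the run time. Adding threads helps only until the memory bus is saturated.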

    As a starting point, I would recommend the following implementation of z = ax + by.
    Since z has to be written anyway, and provided x and y are read-only, you could use (destination first, as in your notation):
    DCOPY(z,y); DSCAL(b,z); DAXPY(a,x,z);
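The three-call sequence can be sketched in C as follows. Again, `ref_dcopy`, `ref_dscal`, `ref_daxpy` and `axpby_composed` are hypothetical unit-stride stand-ins so the example builds without MKL. Each comment gives the CBLAS call it replaces.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the BLAS level-1 routines (unit stride only). */
static void ref_dcopy(size_t n, const double *x, double *y)
{
    for (size_t i = 0; i < n; ++i) y[i] = x[i];        /* y <- x */
}

static void ref_dscal(size_t n, double a, double *x)
{
    for (size_t i = 0; i < n; ++i) x[i] *= a;          /* x <- a*x */
}

static void ref_daxpy(size_t n, double a, const double *x, double *y)
{
    for (size_t i = 0; i < n; ++i) y[i] += a * x[i];   /* y <- a*x + y */
}

/* z <- a*x + b*y using only BLAS-style calls; x and y are read-only. */
static void axpby_composed(size_t n, double a, const double *x,
                           double b, const double *y, double *z)
{
    ref_dcopy(n, y, z);      /* cblas_dcopy(n, y, 1, z, 1);    z <- y       */
    ref_dscal(n, b, z);      /* cblas_dscal(n, b, z, 1);       z <- b*z     */
    ref_daxpy(n, a, x, z);   /* cblas_daxpy(n, a, x, 1, z, 1); z <- a*x + z */
}
```

The sequence is correct and uses only standard routines, but it makes three passes over z, which is why it is only a starting point.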

    You could also read the papers about the ATLAS project, which cover the key aspects of code optimization (whether a fused multiply-add instruction is available, cache characteristics, register-file configuration, instruction latencies, etc.). Then try writing something like a code generator for your operations that pipelines the execution of the different operations and searches among the candidate variants.
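The simplest hand-written variant such a search would start from is a fused loop that streams x and y once and writes the output once. The sketch below uses hypothetical names (`fused_axpby`, `fused_scale`). It assumes the output never overlaps the inputs; `restrict` promises exactly that to the compiler, which leaves it free to vectorize, and calling these with aliased arrays is undefined behaviour. Separately, Intel MKL documents a BLAS-like extension, `cblas_daxpby` (y := a*x + b*y, in place). Combined with one copy of y into z, it is another candidate worth timing, if your MKL version provides it.

```c
#include <stddef.h>

/* z <- a*x + b*y in one pass: each x[i] and y[i] is read once and each
 * z[i] written once. z must not overlap x or y. */
void fused_axpby(size_t n, double a, const double *restrict x,
                 double b, const double *restrict y, double *restrict z)
{
    for (size_t i = 0; i < n; ++i)
        z[i] = a * x[i] + b * y[i];
}

/* y <- a*x in one pass, with no separate copy. y must not overlap x. */
void fused_scale(size_t n, double a, const double *restrict x,
                 double *restrict y)
{
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i];
}
```

A compiler will usually vectorize these loops at -O2/-O3 on its own. Whether they beat the BLAS sequences on your machine is exactly what the profiling has to decide.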

    It’s an interesting topic; I’ve been implementing BLAS on heterogeneous multicore architectures with explicitly managed memory hierarchies, such as the Cell processor. Good luck! I hope my answer is useful.


