The Archive Base


Editorial Team
Asked: June 13, 2026

I profiled my MATLAB code to identify the most time-consuming functions; they are mostly gradient and kron

I profiled my MATLAB code to identify the most time-consuming functions. They are mostly the gradient and kron MATLAB functions in this file, and I want to rewrite them as CUDA kernels, compile them to PTX, and call them from MATLAB. Any ideas or articles would be welcome. Also, the calculations of m and b seem to be separable, which makes them good candidates to assign to different blocks. Here is a snippet of the code from the file:

i2w=g0*aff(i2,a0);
[ix,iy]=grad(i2w);

ix=ix.*region;iy=iy.*region;
ix2=ix.^2;iy2=iy.^2;ixiy=ix.*iy;
it=i1-i2w;

m1=sum(sum(kron(ones(1,limy)',(1-centx:limx-centx).^2).*ix2));
m2=sum(sum(kron((1-centy:limy-centy)',(1-centx:limx-centx)).*ix2));
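The two kron calls in this snippet only build outer products: kron(ones(1,limy)',(1-centx:limx-centx).^2) is a limy-by-limx matrix whose entry (y,x) is (x-centx)^2, and kron((1-centy:limy-centy)',(1-centx:limx-centx)) has entry (y,x) equal to (y-centy)*(x-centx). So m1 and m2 can be accumulated in a single pass without allocating those matrices. The C sketch below assumes ix2 is a limy-by-limx double array in MATLAB's column-major layout; the function name moments_m1_m2 is hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper replacing the two kron-based lines:
 *   m1 = sum over (y,x) of (x - centx)^2             * ix2(y,x)
 *   m2 = sum over (y,x) of (y - centy) * (x - centx) * ix2(y,x)
 * x and y are 1-based, as in MATLAB. ix2 is limy-by-limx and column-major,
 * so ix2(y,x) is stored at ix2[(x - 1) * limy + (y - 1)]. */
void moments_m1_m2(const double *ix2, size_t limy, size_t limx,
                   double centy, double centx, double *m1, double *m2)
{
    double s1 = 0.0, s2 = 0.0;
    for (size_t x = 1; x <= limx; ++x) {
        const double dx = (double)x - centx;       /* (1-centx:limx-centx)(x) */
        const double *col = ix2 + (x - 1) * limy;  /* column x is contiguous */
        for (size_t y = 1; y <= limy; ++y) {
            const double dy = (double)y - centy;   /* (1-centy:limy-centy)(y) */
            s1 += dx * dx * col[y - 1];
            s2 += dy * dx * col[y - 1];
        }
    }
    *m1 = s1;
    *m2 = s2;
}
```

For example, with ix2 = [1 2 3; 4 5 6] (stored as {1, 4, 2, 5, 3, 6}), centy = 1 and centx = 2, this gives m1 = 14 and m2 = 2, the same values as the kron expressions. Note that each m is a sum over all elements, so on a GPU it becomes a parallel reduction rather than a pure per-element map.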

PS: I recently read about NVMEX (or something similar); a little help on using this option for code like the snippet above would be appreciated.

1 Answer

    Editorial Team
    Added an answer on June 13, 2026 at 1:55 pm

    This question is too broad to answer fully in a single post, but I'll give you two hints.

    If the performance of this code matters enough for you to spend about two weeks writing and testing CUDA code, let me describe my approach to accelerating MATLAB code:

    Hint 1:

    Start by rewriting the function in question (in MATLAB) so that it uses only loops, memory accesses, and basic operations that you can find in the CUDA manual, such as addition and multiplication. For example, in pseudo-MATLAB code:

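        function result_array = MyFunctionToParallelise(constants, source_arrays)
        % Every output element depends only on its own (x_idx, y_idx) input.
        [nx, ny] = size(source_arrays);
        result_array = zeros(nx, ny);
        for x_idx = 1:nx
            for y_idx = 1:ny
                % inner_function stands for your per-element computation
                result_array(x_idx, y_idx) = inner_function(x_idx, y_idx, constants, source_arrays(x_idx, y_idx));
            end
        end
        end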
    

    If you do that, and your “inner_function” is parallelisable (it is independent of the other local results and can be computed in any order of x_idx, y_idx, etc.), you are home free!
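A C counterpart of that skeleton, applied to the element-wise lines of the question's snippet (ix.*region, ix.^2, iy.^2, ix.*iy), might look like the sketch below. The names inner_function and elementwise_terms are hypothetical, and all arrays are assumed to be rows-by-cols doubles in MATLAB's column-major order. Because every element is independent, the loops are ordered so that the inner one walks contiguous memory.

```c
#include <stddef.h>

/* Hypothetical per-element "inner_function" for:
 *   ix = ix.*region; iy = iy.*region;
 *   ix2 = ix.^2; iy2 = iy.^2; ixiy = ix.*iy;
 * Each output depends only on the inputs at the same index. */
static void inner_function(double ix, double iy, double region,
                           double *ix2, double *iy2, double *ixiy)
{
    const double mx = ix * region;
    const double my = iy * region;
    *ix2 = mx * mx;
    *iy2 = my * my;
    *ixiy = mx * my;
}

/* Loop skeleton matching MyFunctionToParallelise: the outer loop runs over
 * columns and the inner loop over rows, so element (row, col) lives at
 * index col * rows + row. Any visiting order gives the same result. */
void elementwise_terms(const double *ix, const double *iy, const double *region,
                       double *ix2, double *iy2, double *ixiy,
                       size_t rows, size_t cols)
{
    for (size_t col = 0; col < cols; ++col)
        for (size_t row = 0; row < rows; ++row) {
            const size_t k = col * rows + row;
            inner_function(ix[k], iy[k], region[k], &ix2[k], &iy2[k], &ixiy[k]);
        }
}
```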

    1. Write your “inner_function” in C (you do know C and MEX, right?) and make sure it compiles, returns correct results, and works in a MEX file with a regular loop for the inner y_idx loop and an OpenMP-parallelised outer x_idx loop. This alone often gives a speed-up of up to about 4x on a 4-core CPU, thanks to OpenMP. No toolboxes or other paid add-ons are needed; MATLAB and MEX give you this by default.

    2. Write a CUDA launcher for “inner_function”. Again, no commercial toolboxes are needed. This is the easy part: simply replace the for loops with threads and blocks, and call the launcher from your MEX file where the regular function used to be. Expect a 10x to 100x speed-up over C at this step.
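As an illustration of step 1, here is a sketch of a C/OpenMP replacement for the gradient call, ASSUMING the question's grad() behaves like MATLAB's gradient() with unit spacing (central differences inside, one-sided differences at the borders). The function name gradient2d is hypothetical. The outer loop over columns is parallelised and each iteration writes only its own column, so there are no data races; without OpenMP enabled the pragma is ignored and the code runs serially. The MEX gateway is compiled only by the mex command, which defines MATLAB_MEX_FILE. On Linux with GCC, OpenMP is typically enabled with something like mex CFLAGS='$CFLAGS -fopenmp' LDFLAGS='$LDFLAGS -fopenmp' gradient2d.c, but the exact flags depend on your compiler and MATLAB version.

```c
#include <stddef.h>

/* [ix, iy] = gradient(img) with unit spacing (assumed to match grad()).
 * img, ix, iy are rows-by-cols, column-major (MATLAB layout).
 * ix: derivative along columns (x); iy: derivative along rows (y).
 * Returns 0 on success, -1 if the image is smaller than 2-by-2. */
int gradient2d(const double *img, double *ix, double *iy, long rows, long cols)
{
    if (rows < 2 || cols < 2)
        return -1;

    /* Signed loop index keeps this valid for older OpenMP 2.0 compilers. */
    #pragma omp parallel for
    for (long x = 0; x < cols; ++x) {
        const double *c  = img + x * rows;                           /* this column */
        const double *cl = img + (x > 0 ? x - 1 : x) * rows;         /* left or self */
        const double *cr = img + (x < cols - 1 ? x + 1 : x) * rows;  /* right or self */
        const double hx  = (x > 0 && x < cols - 1) ? 0.5 : 1.0;      /* central vs one-sided */
        for (long y = 0; y < rows; ++y) {                            /* contiguous in memory */
            ix[x * rows + y] = hx * (cr[y] - cl[y]);
            if (y == 0)
                iy[x * rows + y] = c[1] - c[0];
            else if (y == rows - 1)
                iy[x * rows + y] = c[rows - 1] - c[rows - 2];
            else
                iy[x * rows + y] = 0.5 * (c[y + 1] - c[y - 1]);
        }
    }
    return 0;
}

#ifdef MATLAB_MEX_FILE
#include "mex.h"

/* MEX gateway. Call from MATLAB as: [ix, iy] = gradient2d(img) */
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    if (nrhs != 1 || nlhs != 2 || !mxIsDouble(prhs[0]) || mxIsComplex(prhs[0]))
        mexErrMsgTxt("Usage: [ix, iy] = gradient2d(img), img a real double matrix.");
    const long rows = (long)mxGetM(prhs[0]);
    const long cols = (long)mxGetN(prhs[0]);
    plhs[0] = mxCreateDoubleMatrix(rows, cols, mxREAL);
    plhs[1] = mxCreateDoubleMatrix(rows, cols, mxREAL);
    if (gradient2d(mxGetPr(prhs[0]), mxGetPr(plhs[0]), mxGetPr(plhs[1]), rows, cols) != 0)
        mexErrMsgTxt("img must be at least 2-by-2.");
}
#endif
```

Before trusting it, compare its output against the original grad() on a few test images from your data.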

    Following this approach, you can debug and verify correctness at every small step. In my experience, typos in the code that manages buffer pointers and buffer sizes are the main source of crashes and wrong results. There is no point in obtaining the WRONG result really fast!
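In CUDA, the two for loops become a grid: each thread computes its global index from blockIdx, blockDim and threadIdx, and it must skip indices past the end of the array because the grid is rounded up to whole blocks. Forgetting that bounds check is a typical buffer-size bug of the kind described above. The C sketch below emulates that index arithmetic on the CPU (the loops merely stand in for CUDA's built-in variables; this is not CUDA code) and compares it with the plain loop. The per-element function uses it = i1 - i2w from the question, and all function names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical per-element body: it = i1 - i2w from the question. */
static double inner_function(double i1, double i2w) { return i1 - i2w; }

/* Reference version: the plain loop. */
void it_reference(const double *i1, const double *i2w, double *it, size_t n)
{
    for (size_t k = 0; k < n; ++k)
        it[k] = inner_function(i1[k], i2w[k]);
}

/* CPU emulation of a 1-D launch with grid_dim blocks of block_dim threads.
 * "block" plays the role of blockIdx.x and "thread" of threadIdx.x; on a GPU
 * all (block, thread) pairs would run concurrently. */
void it_grid_emulated(const double *i1, const double *i2w, double *it,
                      size_t n, size_t block_dim)
{
    const size_t grid_dim = (n + block_dim - 1) / block_dim;  /* round up */
    for (size_t block = 0; block < grid_dim; ++block)
        for (size_t thread = 0; thread < block_dim; ++thread) {
            const size_t k = block * block_dim + thread;      /* global index */
            if (k < n)              /* the last block may be partly empty */
                it[k] = inner_function(i1[k], i2w[k]);
        }
}

/* Largest absolute difference between two buffers of length n. */
double max_abs_diff(const double *a, const double *b, size_t n)
{
    double worst = 0.0;
    for (size_t k = 0; k < n; ++k) {
        double d = a[k] - b[k];
        if (d < 0.0)
            d = -d;
        if (d > worst)
            worst = d;
    }
    return worst;
}
```

When the real CUDA launcher replaces the emulation, the same max_abs_diff check can be reused, with a small tolerance instead of exact equality, since GPU floating-point results may differ from the CPU in the last bits.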

    Hint 2: For some complex functions (like kron), if your input and output are of fixed size, it may be possible to obtain register-level optimised, linear, non-iterative, branch-free code using a computer algebra system such as Wolfram Mathematica. Such code executes very fast on a GPU. For an example, see “Example use of Mathematica’s formula optimising compiler”.
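To show what such fixed-size, branch-free code looks like, here is kron of two 2-by-2 matrices spelled out as 16 independent products with no loops or branches. It is written by hand for this sketch, not generated by Mathematica; the function names and the row-major layout are assumptions, and a looped reference is included to check the unrolled version. In a CUDA kernel, each thread could evaluate straight-line code like this entirely in registers.

```c
/* kron(A, B) for 2-by-2 A and B; all matrices row-major.
 * K is 4-by-4 with K(2*i + k, 2*j + l) = A(i,j) * B(k,l), 0-based. */
void kron2x2_unrolled(const double A[4], const double B[4], double K[16])
{
    const double a00 = A[0], a01 = A[1], a10 = A[2], a11 = A[3];
    const double b00 = B[0], b01 = B[1], b10 = B[2], b11 = B[3];
    /* row 0 */
    K[0]  = a00 * b00; K[1]  = a00 * b01; K[2]  = a01 * b00; K[3]  = a01 * b01;
    /* row 1 */
    K[4]  = a00 * b10; K[5]  = a00 * b11; K[6]  = a01 * b10; K[7]  = a01 * b11;
    /* row 2 */
    K[8]  = a10 * b00; K[9]  = a10 * b01; K[10] = a11 * b00; K[11] = a11 * b01;
    /* row 3 */
    K[12] = a10 * b10; K[13] = a10 * b11; K[14] = a11 * b10; K[15] = a11 * b11;
}

/* Looped reference implementation used to check the unrolled version. */
void kron2x2_loop(const double A[4], const double B[4], double K[16])
{
    for (int i = 0; i < 2; ++i)
        for (int j = 0; j < 2; ++j)
            for (int k = 0; k < 2; ++k)
                for (int l = 0; l < 2; ++l)
                    K[(2 * i + k) * 4 + (2 * j + l)] = A[2 * i + j] * B[2 * k + l];
}
```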

Related Questions

I profiled my code using gprof and from the report, most, if not all
I have a MATLAB routine with one rather obvious bottleneck. I've profiled the function,
Here is the result of a profiled simulation run of my MATLAB program. I
How to profile c++ code to get the call times and cost time of
I've profiled some legacy code I've inherited with cProfile. There were a bunch of
I profiled my code with both JProfiler and YourKit. However, I haven't been able
I have profiled my application and found out that not my functions are causing
I've profiled my application, and it spends 90% of its time in plus_minus_variations .
I have profiled my code on AppEngine and some of the code says it
I recently profiled some code using JVisualVM, and found that one particular method was

© 2021 The Archive Base. All Rights Reserved