Asked by Editorial Team on June 16, 2026


I am trying to calculate a matrix using C++ AMP. I use a 3000 x 3000 array and repeat the calculation 20000 times:

    //_height=_width=3000
    extent<2> ext(_height,_width);
    array<int, 2> GPU_main(ext,gpuDevice.default_view);
    array<int, 2> GPU_res(ext,gpuDevice.default_view);
    copy(_main, GPU_main);
    array_view<int,2> main(GPU_main);
    array_view<int,2> res(GPU_res);
    res.discard_data();
    number=20000;
    for(int i=0;i<number;i++)
    {
        parallel_for_each(e,[=](index<2> idx)restrict(amp)
        {
           res(idx)=main(idx)+idx[0]; // does not depend on the calculation type
        }
        array_view<TYPE, 2>  temp=res;
        res=main;
        main=temp;
    }
    copy(main, _main);

Before the calculation, I copy my matrix from host memory to GPU memory and create array_views over the GPU arrays (everything up to and including res.discard_data()).

After that, I start a loop that repeats the operation 20000 times. In every iteration I launch a parallel_for_each that does the calculation with C++ AMP, and then I swap the main and res views.

The GPU calculation appears very fast, but copying the result back to the host array _main takes a lot of time. I also found that if I decrease number from 20000 to 2000, the copy time decreases as well.

Why does this happen? Is it some synchronization issue?


1 Answer

  1. Editorial Team (answered June 16, 2026 at 7:44 am)

    Your code (as is) doesn’t compile; below is a fixed version that I think has the same intent. If you want to separate the time spent copying from the compute time, the simplest thing to do is to use array<> and explicit copies.

            #include <amp>
            #include <vector>

            int main()
            {
                int _height, _width;
                _height = _width = 3000;
                std::vector<int> _main(_height * _width); // host data.
                concurrency::extent<2> ext(_height, _width);
                // Start timing data copy
                concurrency::array<int, 2> GPU_main(ext /* default accelerator */);
                concurrency::array<int, 2> GPU_res(ext);
                concurrency::array<int, 2> GPU_temp(ext);
                concurrency::copy(begin(_main), end(_main), GPU_main);
                // Finish timing data copy
                int number = 20000;
                // Start timing compute
                for(int i=0; i < number; ++i)
                {
                    // array<> objects must be captured by reference in a restrict(amp) lambda
                    concurrency::parallel_for_each(ext,
                        [=, &GPU_res, &GPU_main](concurrency::index<2> idx) restrict(amp)
                    {
                       GPU_res(idx) = GPU_main(idx) + idx[0];
                    });
                    concurrency::copy(GPU_res, GPU_temp);       // Swap arrays on GPU
                    concurrency::copy(GPU_main, GPU_res);
                    concurrency::copy(GPU_temp, GPU_main);
                }
                GPU_main.accelerator_view.wait(); // Wait for compute
                // Finish timing compute
                // Start timing data copy
                concurrency::copy(GPU_main, begin(_main));
                // Finish timing data copy
            }
    

    Note the wait() call, which forces the compute to finish. Remember that C++ AMP commands usually queue work on the GPU, and that work is only guaranteed to have executed once you wait for it, either explicitly with wait() or implicitly by calling (for example) synchronize() on an array_view<>. This is why the final copy in your code looks slow: it cannot complete until all of the queued parallel_for_each calls have run, so its duration grows with number. To get a good idea of timing you should really time the compute and data copies separately (as shown above). You can find some basic timing code in Timer.h at http://ampbook.codeplex.com/SourceControl/changeset/view/100791#1983676 and there are examples of its use in the same folder.

    However, I’m not sure I would really write the code this way unless I wanted to break out the copy and compute times. It is far simpler to use array<> for data that lives purely on the GPU and array_view<> for data that is copied to and from the GPU.

    This would look like the code below.

            #include <amp>
            #include <vector>

            int main()
            {
                int _height, _width;
                _height = _width = 3000;
                std::vector<int> _main(_height * _width); // host data.
                concurrency::extent<2> ext(_height, _width);
                // The array_view wraps the host vector directly, so no explicit copy-in is needed.
                concurrency::array_view<int, 2> _main_av(ext, _main);
                concurrency::array<int, 2> GPU_res(ext);
                concurrency::array<int, 2> GPU_temp(ext);
                int number = 20000;
                // Start timing compute and possibly copy
                for(int i=0; i < number; ++i)
                {
                    // array_view<> is captured by value; array<> must be captured by reference
                    concurrency::parallel_for_each(ext,
                        [=, &GPU_res](concurrency::index<2> idx) restrict(amp)
                    {
                       GPU_res(idx) = _main_av(idx) + idx[0];
                    });
                    concurrency::copy(GPU_res, GPU_temp);  // Swap arrays on GPU
                    concurrency::copy(_main_av, GPU_res);
                    concurrency::copy(GPU_temp, _main_av);
                }
                _main_av.synchronize();  // Will wait for all work to finish
                // Finish timing compute & copy
            }
            // Finish timing compute & copy
    

    Now the data that is only required on the GPU is declared as an array<> on the GPU, and the data that needs to be synchronized with the host is declared as an array_view<>. The result is clearer and needs less code.

    You can find out more about this by reading my book on C++ AMP 🙂



Related Questions

I'm trying to calculate elapsed hours between two times. When calculating PM to AM
I'm trying to calculate a lookat matrix myself, instead of using gluLookAt(). My problem
I have a 2d array (matrix) where I am trying to calculate the biggest
am trying to calculate mean and variance using 3X3 window over image(hXw) in opencv...here
I'm trying to calculate the inverse matrix in Java. I'm following the adjoint method
I'm trying to use the OpenCV 2.3 Python wrapper to calculate the DCT for
I am trying to calculate the transpose of a matrix that is not n
I am trying to use spectral clustering on an image. I first calculate the
I'm trying to implement Ford-Fulkerson/Edmonds-Karp only using adjacency matrix'. The only thing I'm not
I'm trying to write a matrix-vector multiplication program using MPI. I'm trying to send


© 2021 The Archive Base. All Rights Reserved