The Archive Base

Editorial Team
Asked: May 22, 2026

I learned that std::vector is a nice wrapper around raw arrays in C++ so


I learned that std::vector is a nice wrapper around raw arrays in C++, so I started to use it for managing host data in my CUDA app [1]. Since allocating and copying things by hand makes the code more complex and less readable, I thought about extending std::vector. Since I'm not very experienced, I would like to know what you think about it, especially whether it's done correctly (e.g. the destructor of std::vector is called implicitly, right?) and whether you consider it a good idea.

I wrote a small example illustrating this:

#include <vector>
#include <cuda_runtime.h>

#include <cstdio>
#include <cstdlib>

void checkCUDAError(const char *msg)
{
    cudaError_t err = cudaGetLastError();
    if( cudaSuccess != err) {
        fprintf(stderr, "Cuda error: %s: %s.\n", msg, cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }
}

// Wrapper around CUDA memory
template<class T>
class UniversalVector: public std::vector<T>
{
    T* devicePtr_;
    bool allocated;

public:

    // Constructor
    UniversalVector(unsigned int length)
        :std::vector<T>(length),
         devicePtr_(nullptr),
         allocated(false)
    {}

    // Destructor
    ~UniversalVector()
     {
        if(allocated)
            cudaFree(devicePtr_);
     }

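    cudaError_t allocateDevice()
    {
        // Release any previous device allocation with cudaFree, not free().
        if(allocated) {
            cudaFree(devicePtr_);
            allocated = false;
        }
        cudaError_t err =
            cudaMalloc((void**)&devicePtr_, sizeof(T) * this->size());
        // Only mark the buffer as allocated if cudaMalloc succeeded.
        allocated = (err == cudaSuccess);
        return err;
    }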

    cudaError_t loadToDevice()
    {
        return cudaMemcpy(devicePtr_, &(*this)[0], sizeof(T) * this->size(),
            cudaMemcpyHostToDevice);
    }

    cudaError_t loadFromDevice()
    {
        return cudaMemcpy(&(*this)[0], devicePtr_, sizeof(T) * this->size(),
            cudaMemcpyDeviceToHost);
    }

    // Accessors

    inline T* devicePtr() {
        return devicePtr_;
    }

};

__global__ void kernel(int* a)
{
    int i = threadIdx.x;
    printf("%i\n", a[i]);
}

int main()
{
    UniversalVector<int> vec(3);
    vec.at(0) = 1;
    vec.at(1) = 2;
    vec.at(2) = 3;

    vec.allocateDevice();
    vec.loadToDevice();

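    kernel<<<1, 3>>>(vec.devicePtr());

    // Wait for the kernel so its printf output is flushed and errors surface.
    cudaDeviceSynchronize();
    checkCUDAError("Error when doing something");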

    return 0;
}

[1] CUDA distinguishes between host and device memory: host memory is the memory accessible by the CPU, and device memory is the memory on the GPU. The programmer has to move data from the host to the GPU and back.


1 Answer

    Editorial Team
    Added an answer on May 22, 2026 at 5:59 pm

    The biggest problem I see with this is that it doesn't really help manage the GPU side of things very much, and it obfuscates a number of very important pieces of information in the process.

    While the container class records whether the device pointer has been allocated, there is no way of knowing whether the contents of the host container have been copied to the GPU memory it holds, or whether the GPU memory has been copied back to the host. As a result, you will have to call the loadToDevice() and loadFromDevice() methods every time you wish to use the container in either host or device code. That probably means unnecessary PCI-e memory transfers at least some of the time. And because you have chosen to wrap only the synchronous CUDA memory copy routines, the host will block every time you do this.
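To make the missing piece concrete, here is a minimal sketch of the kind of bookkeeping the wrapper lacks: a small tracker that remembers which side holds the newest data and only triggers a transfer when one is needed. It is an illustration, not something from the question's code. `SyncTracker`, `Freshest` and `countTransfers` are hypothetical names, and the two callbacks stand in for whatever copy routine you use (for example a `cudaMemcpy` wrapper), so the sketch runs without a GPU.

```cpp
#include <functional>
#include <utility>

// Which side currently holds the newest copy of the data.
enum class Freshest { Host, Device, Both };

// Hypothetical helper (not part of CUDA): remembers where the newest data
// lives so that redundant host<->device copies can be skipped.
class SyncTracker {
public:
    // The callbacks perform the actual transfers; in a CUDA program they
    // would wrap cudaMemcpy or an asynchronous variant (an assumption here).
    SyncTracker(std::function<void()> toDevice, std::function<void()> toHost)
        : toDevice_(std::move(toDevice)), toHost_(std::move(toHost)) {}

    void hostWritten()   { state_ = Freshest::Host; }    // host modified the data
    void deviceWritten() { state_ = Freshest::Device; }  // a kernel modified the data

    // Bring the device copy up to date; copies only if the host copy is newer.
    void ensureOnDevice() {
        if (state_ == Freshest::Host) { toDevice_(); state_ = Freshest::Both; }
    }

    // Bring the host copy up to date; copies only if the device copy is newer.
    void ensureOnHost() {
        if (state_ == Freshest::Device) { toHost_(); state_ = Freshest::Both; }
    }

private:
    std::function<void()> toDevice_;
    std::function<void()> toHost_;
    Freshest state_ = Freshest::Host;  // data starts out on the host
};

// Counts how many transfers a typical call sequence actually triggers.
inline int countTransfers() {
    int transfers = 0;
    SyncTracker t([&] { ++transfers; }, [&] { ++transfers; });
    t.ensureOnDevice();  // host is newer: one copy
    t.ensureOnDevice();  // already in sync: no copy
    t.deviceWritten();   // pretend a kernel changed the data
    t.ensureOnHost();    // device is newer: one copy
    t.ensureOnHost();    // already in sync: no copy
    return transfers;
}
```

With this kind of state, five "make sure the data is here" requests cost two transfers instead of five. The catch is that every write has to be reported through `hostWritten()` or `deviceWritten()`, which a class that publicly inherits from std::vector cannot enforce.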

    Ultimately I don't see much net gain in this idea over a well-designed set of helper routines which abstract away the ugliest bits of the CUDA APIs and operate on standard STL types.
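As a rough sketch of what such helper routines could look like: a small RAII owner for the device allocation plus free functions that take a plain std::vector. All names here (`DeviceBuffer`, `upload`, `download`, `roundTrip`, `fake_device`) are hypothetical. To keep the sketch runnable without a GPU, the three functions in `fake_device` are stand-ins built on malloc/memcpy/free; in a real CUDA build they would be replaced by `cudaMalloc`, `cudaMemcpy` and `cudaFree`, with their error codes checked.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <vector>

// Stand-ins so the sketch runs on the host only (assumption for illustration).
// In CUDA code: alloc -> cudaMalloc, copy -> cudaMemcpy, release -> cudaFree.
namespace fake_device {
inline void* alloc(std::size_t bytes) { return std::malloc(bytes); }
inline void copy(void* dst, const void* src, std::size_t bytes) {
    if (bytes != 0) std::memcpy(dst, src, bytes);
}
inline void release(void* p) { std::free(p); }
}  // namespace fake_device

// Hypothetical RAII owner of one device allocation. Copying is disabled so
// the same allocation can never be released twice.
template <class T>
class DeviceBuffer {
public:
    explicit DeviceBuffer(std::size_t count)
        : ptr_(static_cast<T*>(fake_device::alloc(count * sizeof(T)))),
          count_(count) {}
    ~DeviceBuffer() { fake_device::release(ptr_); }
    DeviceBuffer(const DeviceBuffer&) = delete;
    DeviceBuffer& operator=(const DeviceBuffer&) = delete;

    T* get() const { return ptr_; }
    std::size_t size() const { return count_; }

private:
    T* ptr_;
    std::size_t count_;
};

// Free functions operating on a plain std::vector; nothing is inherited.
template <class T>
void upload(const std::vector<T>& host, DeviceBuffer<T>& dev) {
    assert(host.size() == dev.size());
    fake_device::copy(dev.get(), host.data(), host.size() * sizeof(T));
}

template <class T>
void download(const DeviceBuffer<T>& dev, std::vector<T>& host) {
    host.resize(dev.size());
    fake_device::copy(host.data(), dev.get(), dev.size() * sizeof(T));
}

// Round trip: host vector -> "device" buffer -> fresh host vector.
inline std::vector<int> roundTrip(const std::vector<int>& in) {
    DeviceBuffer<int> dev(in.size());
    upload(in, dev);
    std::vector<int> out;
    download(dev, out);
    return out;
}
```

The host data stays an ordinary std::vector, the device allocation has a single owner that releases it exactly once, and each transfer is an explicit call at the point where it happens.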
