Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 8799801
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: June 14, 20262026-06-14T00:25:13+00:00 2026-06-14T00:25:13+00:00

I am teaching myself CUDA with pyCUDA. In this exercise, I want to send

  • 0

I am teaching myself CUDA with pyCUDA. In this exercise, I want to send over a simply array of 1024 floats to the GPU and store it in shared memory. As I specify below in my arguments, I run this kernel on just a single block with 1024 threads.

import pycuda.driver as cuda
from pycuda.compiler import SourceModule
import pycuda.autoinit
import numpy as np
import matplotlib.pyplot as plt

arrayOfFloats = np.float64(np.random.sample(1024))
mod = SourceModule("""
  __global__ void myVeryFirstKernel(float* arrayOfFloats) {
    extern __shared__ float sharedData[];

    // Copy data to shared memory.
    sharedData[threadIdx.x] = arrayOfFloats[threadIdx.x];
  }
""")
func = mod.get_function('myVeryFirstKernel')
func(cuda.InOut(arrayOfFloats), block=(1024, 1, 1), grid=(1, 1))
print str(arrayOfFloats)

Strangely, I am getting this error.

[dfaux@harbinger CUDA_tutorials]$ python sharedMemoryExercise.py 
Traceback (most recent call last):
  File "sharedMemoryExercise.py", line 17, in <module>
    func(cuda.InOut(arrayOfFloats), block=(1024, 1, 1), grid=(1, 1))
  File "/software/linux/x86_64/epd-7.3-1-pycuda/lib/python2.7/site-packages/pycuda-2012.1-py2.7-linux-x86_64.egg/pycuda/driver.py", line 377, in function_call
    Context.synchronize()
pycuda._driver.LaunchError: cuCtxSynchronize failed: launch failed
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: launch failed
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuModuleUnload failed: launch failed

I have tried to debug this error by changing the type of elements I am sending to my GPU (instead of float64, I use float32 for instance). I have also tried altering my block and grid sizes to no avail.

What could be wrong? What is a dead context? Any advice or ideas appreciated.

  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-06-14T00:25:14+00:00Added an answer on June 14, 2026 at 12:25 am

    One problem i see with your code is that you use extern __shared__ .. which means that you need to submit the size of the shared memory when you launch the kernel.

    In pycuda this is done by:
    func(cuda.InOut(arrayOfFloats), block=(1024, 1, 1), grid=(1, 1),shared=smem_size)
    where smem_size is the size of the shared memory in bytes.

    In your case smem_size = 1024*sizeof(float).

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

I am teaching myself CUDA from the ground up. I made this simple kernel
I am teaching myself jquery it seems while working on this project. I assumed
I am teaching myself CUDA with the programming guide that CUDA offers. For practice,
I have started teaching myself about the Android NDK and I have followed this
I’m teaching myself php and I’m stuck with representing a data array coming from
I am teaching myself Java and am getting this error when I run code
I'm teaching myself about structures in C, and am having trouble compiling this code:
I am teaching myself C# (I don't know much yet). In this simple example:
I am teaching myself python from this site . On Chapter 3 , when
So I'm teaching myself Python, and I'm having an issue with lists. I want

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.