The Archive Base

Editorial Team
Asked: June 11, 2026

I ran SimpleSpeedTest.py from the PyCuda examples, producing the following output:

Using nbr_values == 8192
Calculating 100000 iterations
SourceModule time and first three results:
0.058294s, [ 0.005477  0.005477  0.005477]
Elementwise time and first three results:
0.102527s, [ 0.005477  0.005477  0.005477]
Elementwise Python looping time and first three results:
2.398071s, [ 0.005477  0.005477  0.005477]
GPUArray time and first three results:
8.207257s, [ 0.005477  0.005477  0.005477]
CPU time measured using :
0.000002s, [ 0.005477  0.005477  0.005477]

The first four time measurements are reasonable, but the last one (0.000002s) is way off. The CPU result should be the slowest, yet it is orders of magnitude faster than the fastest GPU method, so the measured time must be wrong. This is strange, since the same timing method seems to work fine for the first four results.

So I took some code from SimpleSpeedTest.py and made a small test file [2], which produced:

time measured using option 1:
0.000002s 
time measured using option 2:
5.989620s 

Option 1 measures the duration using pycuda.driver.Event.record() (as in SimpleSpeedTest.py), and option 2 uses time.clock(). Again, option 1 is off, while option 2 gives a reasonable result (running the test file takes around 6s).
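For reference on option 2: time.clock() was deprecated in Python 3.3 and removed in Python 3.8, and it measured processor time on Unix but wall-clock time on Windows. A minimal sketch of an equivalent host-side measurement on Python 3 could use time.perf_counter() for wall-clock time and time.process_time() for CPU time. The names busy_work and time_host are hypothetical, and busy_work applies math.sin to a plain list as a stand-in for the numpy loop in the test file, so the sketch needs no third-party packages.

```python
import math
import time


def busy_work(values, n_iter):
    """Apply sin() to every value n_iter times (stand-in for the numpy loop)."""
    for _ in range(n_iter):
        values = [math.sin(v) for v in values]
    return values


def time_host(fn, *args):
    """Run fn(*args) and return (result, wall_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()  # high-resolution wall clock
    cpu_start = time.process_time()   # CPU time used by this process only
    result = fn(*args)
    wall_secs = time.perf_counter() - wall_start
    cpu_secs = time.process_time() - cpu_start
    return result, wall_secs, cpu_secs


if __name__ == "__main__":
    result, wall_secs, cpu_secs = time_host(busy_work, [1.0] * 64, 1000)
    print("wall-clock time: %fs" % wall_secs)
    print("CPU time:        %fs" % cpu_secs)
    print(result[:3])
```

Reporting both clocks makes it easy to see whether a loop is CPU-bound (the two values are close) or spends time waiting.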

Does anyone have an idea as to why this is happening?

Since option 1 is the method used in SimpleSpeedTest.py, could my setup be causing the problem? I am running a GTX 470, display driver 301.42, CUDA 4.2, Python 2.7 (64-bit), PyCuda 2012.1 and a Xeon X5650.

[2] Test file:

import numpy
import time
import pycuda.driver as drv
import pycuda.autoinit  # initializes CUDA and creates a context on the default device

n_iter = 100000
nbr_values = 8192  # = 64 * 128 (values as used in SimpleSpeedTest.py)

start = drv.Event()  # option 1 uses pycuda.driver.Event
end = drv.Event()

a = numpy.ones(nbr_values, dtype=numpy.float32)  # test data

start.record()  # start option 1 (inserts a recording point into the GPU stream)
tic = time.clock()  # start option 2 (host clock)

for i in range(n_iter):
    a = numpy.sin(a)  # do some work (on the CPU)

end.record()  # end option 1
toc = time.clock()  # end option 2

end.synchronize()  # wait until the end event has actually been recorded

events_secs = start.time_till(end) * 1e-3  # time_till() returns milliseconds
time_secs = toc - tic

print("time measured using option 1:")
print("%fs " % events_secs)
print("time measured using option 2:")
print("%fs " % time_secs)
1 Answer

    Editorial Team, added an answer on June 11, 2026 at 4:20 pm

    I contacted Andreas Klöckner, and he suggested synchronizing on the start event as well:

    ...
    start.record()
    start.synchronize()
    ...
    

    And this seems to solve the issue!

    time measured using option 1:
    5.944461s
    time measured using option 2:
    5.944314s 
    

    Apparently CUDA’s behaviour has changed over the last two years. I have updated SimpleSpeedTest.py accordingly.
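CUDA events are timestamped when the device processes them, not when the host calls record(). If the start event is held back instead of being submitted right away, it can end up stamped at about the same moment as the end event, which would match the near-zero reading in the question. The fix above can be wrapped in a small helper. This sketch relies only on the pycuda.driver.Event methods already used in the test file (record(), synchronize() and time_till(), which returns milliseconds); time_with_events and time_without_start_sync are hypothetical names. To run without a GPU, it uses a hypothetical FakeEvent/DeferredQueue pair that models a driver which batches commands and only timestamps recorded events when a synchronize() flushes the queue. That model is an assumption for illustration, not a description of what CUDA 4.2 actually does internally.

```python
import time


def time_with_events(start, end, work, *args):
    """Time work(*args) between two events, synchronizing on BOTH of them.

    start and end are assumed to behave like pycuda.driver.Event:
    record(), synchronize(), and time_till() returning milliseconds.
    """
    start.record()
    start.synchronize()  # make sure the start event is recorded before the work
    result = work(*args)
    end.record()
    end.synchronize()    # wait until the end event has been recorded
    return result, start.time_till(end) * 1e-3  # milliseconds -> seconds


def time_without_start_sync(start, end, work, *args):
    """The pattern from the question: only the end event is synchronized."""
    start.record()
    result = work(*args)
    end.record()
    end.synchronize()
    return result, start.time_till(end) * 1e-3


class DeferredQueue:
    """Hypothetical model of a batching driver: recorded events get a
    timestamp only when synchronize() flushes the pending queue."""

    def __init__(self):
        self.pending = []

    def flush(self):
        now = time.perf_counter()
        for event in self.pending:
            event.timestamp = now  # every queued event gets the same stamp
        self.pending = []


class FakeEvent:
    """Hypothetical stand-in for pycuda.driver.Event, backed by DeferredQueue."""

    def __init__(self, queue):
        self.queue = queue
        self.timestamp = None

    def record(self):
        self.queue.pending.append(self)  # queued, not yet timestamped

    def synchronize(self):
        self.queue.flush()  # submit everything queued so far

    def time_till(self, other):
        return (other.timestamp - self.timestamp) * 1e3  # seconds -> ms


if __name__ == "__main__":
    q = DeferredQueue()
    _, secs = time_without_start_sync(FakeEvent(q), FakeEvent(q), time.sleep, 0.2)
    print("end-only synchronize:      %fs" % secs)
    q = DeferredQueue()
    _, secs = time_with_events(FakeEvent(q), FakeEvent(q), time.sleep, 0.2)
    print("start and end synchronize: %fs" % secs)
```

In this model the end-only version reports 0s because both events are stamped by the same flush, while the version that also synchronizes on the start event reports roughly the 0.2s spent in time.sleep. With real PyCUDA events, time_with_events(drv.Event(), drv.Event(), work) would follow the same record/synchronize order as the corrected test file.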

© 2021 The Archive Base. All Rights Reserved