The Archive Base


Editorial Team
Asked: June 1, 2026

When writing PTX in a separate file, a kernel parameter can be loaded into a register with:

.reg .u32 test;
ld.param.u32 test, [test_param];

However, for inline PTX, the "Using Inline PTX Assembly in CUDA" application note (version 01) describes a syntax in which loading a parameter is tied to another operation. It provides this example:

asm("add.s32 %0, %1, %2;" : "=r"(i) : "r"(j), "r"(k));

According to the note, this generates:

ld.s32 r1, [j];
ld.s32 r2, [k];
add.s32 r3, r1, r2;
st.s32 [i], r3;
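
For context, here is a minimal sketch of how that statement might sit inside a complete kernel. The kernel name `add_kernel` and the host helper `run_add` are made up for illustration and do not come from the application note; error checking is left out to keep it short.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel wrapping the application note's add.s32 example.
// The "r" constraints ask the compiler to supply j and k in 32-bit registers;
// "=r" asks it to store the result register back into i.
__global__ void add_kernel(int j, int k, int *out)
{
    int i;
    asm("add.s32 %0, %1, %2;" : "=r"(i) : "r"(j), "r"(k));
    *out = i;
}

// Hypothetical host helper: launches a single thread and returns j + k.
int run_add(int j, int k)
{
    int *d_out = nullptr;
    int h_out = 0;
    cudaMalloc(&d_out, sizeof(int));
    add_kernel<<<1, 1>>>(j, k, d_out);
    // cudaMemcpy blocks until the kernel has finished.
    cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return h_out;
}
```

Calling `run_add(2, 3)` from a `main` function should return 5 on a machine with a working CUDA device.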

In many cases the two operations need to be separated. For instance, one might want to load the parameter into a register outside a loop and then reuse and modify that register inside the loop. The only way I have found to do this is an extra mov instruction that copies the parameter from the register it was implicitly loaded into to another register I can use later.

Is there a way to avoid this additional mov instruction when moving from PTX in a separate file to inline PTX?
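
To make the loop case concrete, here is a rough sketch of one way it could be written without spelling out a mov: the parameter is copied into a local C variable before the loop, and the asm statement uses the read-write `"+r"` constraint on that variable. The names `accumulate_kernel`, `run_accumulate`, `start`, `step` and `n` are assumptions for illustration, and which registers end up being used is left to the compiler.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: start and step are kernel parameters, n is the loop count.
__global__ void accumulate_kernel(int start, int step, int n, int *out)
{
    // Copy the parameter into a local variable once, before the loop.
    // No explicit mov is written; the compiler picks the register.
    int acc = start;

    for (int it = 0; it < n; ++it) {
        // "+r" marks acc as both read and written, so every iteration
        // updates the same C variable.
        asm("add.s32 %0, %0, %1;" : "+r"(acc) : "r"(step));
    }
    *out = acc;
}

// Hypothetical host helper: runs one thread and returns start + n * step.
int run_accumulate(int start, int step, int n)
{
    int *d_out = nullptr;
    int h_out = 0;
    cudaMalloc(&d_out, sizeof(int));
    accumulate_kernel<<<1, 1>>>(start, step, n, d_out);
    // cudaMemcpy blocks until the kernel has finished.
    cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return h_out;
}
```

With these definitions, `run_accumulate(10, 3, 4)` adds 3 to 10 four times and should return 22.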


1 Answer

  1. Editorial Team
    Added an answer on June 1, 2026 at 10:24 am

    If I were you, I wouldn't worry too much about those mov operations.

    Keep in mind that PTX is not the final machine code.
    PTX is compiled further into a CUBIN before the kernel launches. Among other things, this last step performs register allocation and will typically remove unnecessary mov operations.

    In particular, if you move from %r1 to %r2 and then never use %r1 again, the register allocator is likely to assign %r1 and %r2 to the same hardware register and remove the move.
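
One way to check this on your own setup is to write the extra mov explicitly and compare the PTX with the final machine code. The sketch below is only an illustration: the names `mov_kernel` and `run_mov`, the file name, and the `sm_80` architecture in the comments are placeholders. Whether the mov actually disappears depends on the toolchain version and optimization settings, so inspect the output rather than assume it.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel with the explicit mov described in the question.
__global__ void mov_kernel(int start, int step, int n, int *out)
{
    int acc;
    // Explicit copy of the parameter value into a second register.
    asm("mov.s32 %0, %1;" : "=r"(acc) : "r"(start));

    for (int it = 0; it < n; ++it) {
        // Reuse and modify the copied register inside the loop.
        asm("add.s32 %0, %0, %1;" : "+r"(acc) : "r"(step));
    }
    *out = acc;
}

// Hypothetical host helper: runs one thread and returns start + n * step.
int run_mov(int start, int step, int n)
{
    int *d_out = nullptr;
    int h_out = 0;
    cudaMalloc(&d_out, sizeof(int));
    mov_kernel<<<1, 1>>>(start, step, n, d_out);
    // cudaMemcpy blocks until the kernel has finished.
    cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return h_out;
}

// To compare the two stages (file name and architecture are placeholders):
//   nvcc -ptx mov_kernel.cu -o mov_kernel.ptx                  // intermediate PTX
//   nvcc -cubin -arch=sm_80 mov_kernel.cu -o mov_kernel.cubin  // final binary
//   cuobjdump -sass mov_kernel.cubin                           // disassembled machine code
```

The result is the same with or without the mov (`run_mov(10, 3, 4)` should return 22); the interesting part is comparing the PTX listing with the disassembly.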
