The Archive Base

Editorial Team
Asked: May 30, 2026


I am new to assembler and NEON programming.
My task is to convert part of an algorithm from C to ARM assembler using NEON instructions.
The algorithm takes an int32 array, loads different values from this array, does some bit shifting and XOR, and writes the result to another array.
Later I will use an array with 64-bit values, but for now I am just trying to rewrite the code.

C Pseudo code:

out_array[index] = shiftSome( in_array[index] ) ^ shiftSome( in_array[index] );

So here are my questions regarding NEON Instructions:

1.) If I load a register like this:

vld1.32 d0, [r1]

will it load only 32 bits from memory, or 2 x 32 bits to fill the 64-bit NEON D register?

2.) How can I access the 2/4/8 (i32, i16, i8) parts of the D register?

3.) I am trying to load different values from the array with an offset, but it doesn’t
seem to work. What am I doing wrong? Here is my code:
(It is an integer array, so I am trying to load, for example, the third element, which should have an offset of 64 bits = 8 bytes.)

asm volatile(
"vld1.32 d0, [%0], #8 \n"     
"vst1.32 d0, [%1]" : : "r" (a), "r" (out): "d0", "r5");

where “a” is the array and “out” is a pointer to an integer (for debugging).

4.) After I load a value from the array I need to shift it to the right but it doesn’t seem to work:

vshr.u32 d0, d0, #24     // C code:   x >> 24;

5.) Is it possible to load only 1 byte into a NEON register, so that I don’t have to shift or mask anything to get the one byte I need?

6.) I need to use inline assembler, but I am not sure what the last line is for:

input list : output list : what is this for?

7.) Do you know any good NEON references with code examples?

The program should run on a Samsung Galaxy S2 (Cortex-A9 processor), if that makes any difference. Thanks for the help.

—————-edit——————-

This is what I found out:

  1. It will always load the full register (64 bits).
  2. You can use the “vmov” instruction to transfer part of a NEON register to an ARM register.
  3. The offset should be in an ARM register and will be added to the
    base address after the memory access.
  4. (Question 6) It is the “clobbered register list”. Every register that is used and
    is in neither the input nor the output list should be written here.
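The findings above can be put together in a small sketch. It assumes GCC-style extended inline assembly, whose general shape is `asm(template : outputs : inputs : clobbers)`. The function name `copy_two_u32` and the operand names `src` and `dst` are made up for illustration, and the non-NEON fallback is only there so the sketch also builds on other targets.

```c
#include <stdint.h>
#include <string.h>

/* Copies two 32-bit values (one 64-bit D register's worth) from src to dst.
   Hypothetical helper, not taken from the original post. */
void copy_two_u32(const uint32_t *src, uint32_t *dst)
{
#if defined(__arm__) && (defined(__ARM_NEON__) || defined(__ARM_NEON))
    __asm__ volatile(
        "vld1.32 {d0}, [%[src]]\n\t"  /* loads 2 x 32 bits into d0 */
        "vst1.32 {d0}, [%[dst]]\n\t"  /* stores both lanes of d0 */
        : /* no register outputs */
        : [src] "r"(src), [dst] "r"(dst)  /* inputs: the two pointers */
        : "d0", "memory");  /* clobbers: d0 is overwritten, memory is written */
#else
    /* Portable fallback with the same effect, used when NEON is unavailable. */
    memcpy(dst, src, 2 * sizeof(uint32_t));
#endif
}
```

Only `d0` appears in the clobber list because it is the only register the template touches that is not an operand. `"memory"` is listed because the store writes through a pointer that the compiler cannot see as an output.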
1 Answer

  1. Editorial Team
    Added an answer on May 30, 2026 at 6:40 am

    I can answer most of your questions (update: clarified the “lane” issue):

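    1) vld1.32 d0, [r1] fills the whole 64-bit D register, so it loads 2 x 32 bits. NEON loads and stores normally move entire registers (64-bit D or 128-bit Q) to and from memory. VLD1 and VST1 also have single-lane forms, such as vld1.32 {d0[0]}, [r1], which transfer one element and leave the other lanes untouched. There is also a VMOV variant that moves single “lanes” to or from ARM registers.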

    2) You can use the NEON VMOV instruction to read or write single lanes, for example vmov.32 r0, d0[1]. Performance will suffer if you do too many single-element operations. NEON improves application performance by doing parallel operations on vectors (groups of floats or ints).

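    3) Immediate offsets in ARM assembly language are counted in bytes, not in elements or registers. NEON load/store instructions do not accept an immediate post-increment such as #8. The address operand is [r1], [r1]! (the pointer advances by the number of bytes transferred), or [r1], r2 (the pointer advances by the value in r2). In each case the update happens after the access, so the load still reads from the original address. To read the third element, add 8 to the pointer before the load. For normal ARM instructions, a post-increment of 8 adds 8 bytes to the source pointer.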

    4) Shifts in NEON apply to every element of a vector. A right shift by 24 using vshr.u32 shifts both unsigned 32-bit lanes of d0 by 24 bits and discards the bits that are shifted out, leaving the top byte of each element in its low 8 bits.

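    5) Yes. vld1.8 {d0[0]}, [r1] loads a single byte into lane 0 and leaves the other lanes unchanged, and vld1.8 {d0[]}, [r1] loads one byte and copies it into all eight lanes. NEON instructions also allow moving single elements in and out of normal ARM registers.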

    6) ?

    7) Start here: http://blogs.arm.com/software-enablement/161-coding-for-neon-part-1-load-and-stores/ (the ARM site has a good tutorial on NEON).
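Points 3 and 4 can be checked on any machine with a scalar model in plain C. This is only a sketch of the semantics, not NEON code. The helper names `byte_offset_u32` and `vshr_u32_model` are invented for illustration, and the two-element array stands in for the two 32-bit lanes of one D register.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte offset of element `index` in an array of 32-bit integers.
   Addresses are counted in bytes, so each element adds 4. */
size_t byte_offset_u32(size_t index)
{
    return index * sizeof(uint32_t);
}

/* Scalar model of "vshr.u32 dN, dN, #n": both 32-bit lanes are shifted
   right independently and the bits shifted out are dropped.
   A shift by 32 (the largest the instruction accepts) yields 0. */
void vshr_u32_model(uint32_t lanes[2], unsigned n)
{
    for (int i = 0; i < 2; ++i) {
        lanes[i] = (n >= 32u) ? 0u : (lanes[i] >> n);
    }
}
```

With this model, the third element (index 2) sits at byte offset 8, and shifting the lanes 0x12345678 and 0xAB000000 right by 24 leaves 0x12 and 0xAB, matching the C expression `x >> 24` applied to each element.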
