The Archive Base


Editorial Team
Asked: May 10, 2026 (2026-05-10T23:34:01+00:00)


I’m working on an embedded device that does not support unaligned memory accesses.

For a video decoder I have to process pixels (one byte per pixel) in 8×8 pixel blocks. The device has some SIMD processing capabilities that allow me to work on 4 bytes in parallel.

The problem is that the 8×8 pixel blocks aren’t guaranteed to start on an aligned address, and the functions need to read/write up to three of these 8×8 blocks.

How would you approach this if you want very good performance? After a bit of thinking I came up with the following three ideas:

  1. Do all memory accesses as bytes. This is the easiest way to do it, but it is slow and does not work well with the SIMD capabilities (it’s what I currently do in my reference C code).

  2. Write four copy functions (one for each alignment case) that load the pixel data via two 32-bit reads, shift the bits into the correct position and write the data to an aligned chunk of scratch memory. The video processing functions can then use 32-bit accesses and SIMD. Drawback: the CPU will have no chance to hide the memory latency behind the processing.

  3. Same idea as above, but instead of writing the pixels to scratch memory do the video-processing in place. This may be the fastest way, but the number of functions that I have to write for this approach is high (around 60 I guess).
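For illustration, the load-and-shift copy from idea 2 might look roughly like the sketch below. It is only a portable C stand-in for what would be written in assembler on the real device. It assumes a little-endian target and that the frame lives in word-aligned storage; the names `load4_shifted` and `fetch_block_shifted` are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read 4 pixels starting at an arbitrary byte address
 * using aligned 32-bit loads only. Assumes a little-endian target. */
static uint32_t load4_shifted(const uint8_t *p)
{
    uintptr_t addr = (uintptr_t)p;
    /* Round the address down to the enclosing aligned word. */
    const uint32_t *w = (const uint32_t *)(addr & ~(uintptr_t)3);
    unsigned shift = (unsigned)(addr & 3u) * 8u;

    if (shift == 0)
        return w[0];            /* already aligned, one load is enough */

    /* Low bytes come from the first word, high bytes from the next one. */
    return (w[0] >> shift) | (w[1] << (32u - shift));
}

/* Copy one 8x8 block (one byte per pixel) into 16 aligned scratch words. */
static void fetch_block_shifted(const uint8_t *src, size_t stride,
                                uint32_t scratch[16])
{
    assert(src != NULL && scratch != NULL);
    for (int row = 0; row < 8; ++row) {
        const uint8_t *line = src + (size_t)row * stride;
        scratch[2 * row]     = load4_shifted(line);
        scratch[2 * row + 1] = load4_shifted(line + 4);
    }
}
```

This version issues up to four loads per row for clarity. A tuned version with one function per alignment case would load three words per row and reuse the middle one, and would use constant shift amounts.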

Btw: I will have to write all functions in assembler because the compiler generates horrible code when it comes to the SIMD extension.

Which road would you take, or do you have another idea how to approach this?


1 Answer

  1. Added an answer on May 10, 2026 at 11:34 pm

    You should first break your code into fetch/processing sections.

    The fetch code should copy into a working buffer and handle two cases: aligned memory, where you should be able to copy using the SIMD registers, and non-aligned memory, where you need to copy byte by byte. If your platform can’t do unaligned accesses and your source and destination have different alignments, this is the best you can do.
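A minimal sketch of such a fetch routine, assuming an 8×8 block of one-byte pixels, a row stride in bytes, and a 16-word working buffer. The plain 32-bit copies stand in for whatever SIMD load/store the target offers, and the names `is_word_aligned` and `fetch_block` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A block is word-aligned only if its start address and its row stride
 * are both multiples of 4; otherwise some rows start unaligned. */
static int is_word_aligned(const uint8_t *p, size_t stride)
{
    return (((uintptr_t)p | (uintptr_t)stride) & 3u) == 0;
}

/* Copy one 8x8 block into an aligned working buffer of 16 words. */
static void fetch_block(const uint8_t *src, size_t stride, uint32_t work[16])
{
    assert(src != NULL && work != NULL);

    if (is_word_aligned(src, stride)) {
        /* Aligned case: two word copies per row. Assumes the frame
         * itself lives in word-aligned storage. */
        for (int row = 0; row < 8; ++row) {
            const uint32_t *line =
                (const uint32_t *)(const void *)(src + (size_t)row * stride);
            work[2 * row]     = line[0];
            work[2 * row + 1] = line[1];
        }
    } else {
        /* Unaligned case: fall back to byte copies. */
        uint8_t *dst = (uint8_t *)work;
        for (int row = 0; row < 8; ++row)
            for (int col = 0; col < 8; ++col)
                dst[row * 8 + col] = src[(size_t)row * stride + col];
    }
}
```

Either way the processing code receives the same packed 8×8 layout in `work`, so it does not need to know which path was taken.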

    Your processing code can then be SIMD with the guarantee of working on aligned data. For any non-trivial amount of processing, doing a copy + process will be faster than non-SIMD operations on unaligned data.

    Assuming your source & dest are the same, a further optimization would be to only use the working buffer if the source is unaligned, and do the processing in-place if the memory’s aligned. The benefits of this will depend upon the characteristics of your data.
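As a rough sketch of that dispatch: process in place when the block is aligned, otherwise go through a working buffer and write the result back. Everything here is an assumption made for illustration. `process_row` is a placeholder that inverts four pixels per word, `memcpy` stands in for the fetch and store routines, and the return value exists only to show which path ran.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Placeholder processing step: invert 8 pixels, 4 per 32-bit word. */
static void process_row(uint32_t *row_words)
{
    row_words[0] = ~row_words[0];
    row_words[1] = ~row_words[1];
}

/* Process one 8x8 block. Returns 1 if done in place, 0 if the working
 * buffer was used. Assumes the frame lives in word-aligned storage. */
static int process_block(uint8_t *block, size_t stride)
{
    assert(block != NULL);

    if ((((uintptr_t)block | (uintptr_t)stride) & 3u) == 0) {
        /* Aligned: operate directly on the frame memory. */
        for (int row = 0; row < 8; ++row)
            process_row((uint32_t *)(void *)(block + (size_t)row * stride));
        return 1;
    }

    /* Unaligned: fetch into the buffer, process, then store back. */
    uint32_t work[16];
    for (int row = 0; row < 8; ++row)
        memcpy(&work[2 * row], block + (size_t)row * stride, 8);
    for (int row = 0; row < 8; ++row)
        process_row(&work[2 * row]);
    for (int row = 0; row < 8; ++row)
        memcpy(block + (size_t)row * stride, &work[2 * row], 8);
    return 0;
}
```

How much this saves depends on how often blocks happen to be aligned in your data.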

    Depending on your architecture, you may get further benefits by prefetching data before processing. Prefetching means issuing instructions that fetch areas of memory into the cache before they’re needed, so you would issue a fetch for the next block before processing the current one.
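A sketch of the prefetch pattern, assuming a GCC-compatible compiler that provides `__builtin_prefetch` and, purely for illustration, blocks stored back to back as 16 aligned words each. On the actual device the prefetch would be whatever instruction the architecture provides, if any. `PREFETCH_READ`, `invert_block` and `process_all_blocks` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__GNUC__) || defined(__clang__)
/* Hint: read access, keep in cache. This is only a hint and may be ignored. */
#define PREFETCH_READ(p) __builtin_prefetch((p), 0, 3)
#else
#define PREFETCH_READ(p) ((void)(p))
#endif

enum { WORDS_PER_BLOCK = 16 };  /* 8x8 pixels, one byte each */

/* Placeholder processing step: invert every pixel in one block. */
static void invert_block(uint32_t *w)
{
    for (int i = 0; i < WORDS_PER_BLOCK; ++i)
        w[i] = ~w[i];
}

/* Process `count` contiguous blocks, requesting the next block
 * before working on the current one. */
static void process_all_blocks(uint32_t *blocks, size_t count)
{
    assert(blocks != NULL || count == 0);
    for (size_t i = 0; i < count; ++i) {
        if (i + 1 < count)
            PREFETCH_READ(&blocks[(i + 1) * WORDS_PER_BLOCK]);
        invert_block(&blocks[i * WORDS_PER_BLOCK]);
    }
}
```

A single prefetch usually covers one cache line, so a 64-byte block may need more than one hint depending on the line size of the target.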
