Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • Home
  • SEARCH
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 886593
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: May 15, 20262026-05-15T13:05:40+00:00 2026-05-15T13:05:40+00:00

I’m very new to SIMD/SSE and I’m trying to do some simple image filtering

  • 0

I’m very new to SIMD/SSE and I’m trying to do some simple image filtering (blurring).
The code below filters each pixel of a 8-bit gray bitmap with a simple [1 2 1] weighting in horizontal direction. I’m creating sums of 16 pixels at a time.

What seems very bad about this code, at least to me, is that there is a lot of insert/extract in it, which is not very elegant and probably slows everything down as well. Is there a better way to wrap data from one reg into another when shifting?

buf is the image data, 16-byte aligned.
w/h are width and height, multiples of 16.

__m128i *p = (__m128i *) buf;
__m128i cur1, cur2, sum1, sum2, zeros, tmp1, tmp2, saved;
zeros = _mm_setzero_si128();
short shifted, last = 0, next;

// preload first row
cur1 = _mm_load_si128(p);
for (x = 1; x < (w * h) / 16; x++) {
    // unpack
    sum1 = sum2 = saved = cur1;
    sum1 = _mm_unpacklo_epi8(sum1, zeros);
    sum2 = _mm_unpackhi_epi8(sum2, zeros);
    cur1 = tmp1 = sum1;
    cur2 = tmp2 = sum2;
    // "middle" pixel
    sum1 = _mm_add_epi16(sum1, sum1);
    sum2 = _mm_add_epi16(sum2, sum2);
    // left pixel
    cur2 = _mm_slli_si128(cur2, 2);
    shifted = _mm_extract_epi16(cur1, 7);
    cur2 = _mm_insert_epi16(cur2, shifted, 0);
    cur1 = _mm_slli_si128(cur1, 2);
    cur1 = _mm_insert_epi16(cur1, last, 0);
    sum1 = _mm_add_epi16(sum1, cur1);
    sum2 = _mm_add_epi16(sum2, cur2);
    // right pixel
    tmp1 = _mm_srli_si128(tmp1, 2);
    shifted = _mm_extract_epi16(tmp2, 0);
    tmp1 = _mm_insert_epi16(tmp1, shifted, 7);
    tmp2 = _mm_srli_si128(tmp2, 2);
    // preload next row
    cur1 = _mm_load_si128(p + x);
    // we need the first pixel of the next row for the "right" pixel
    next = _mm_extract_epi16(cur1, 0) & 0xff;
    tmp2 = _mm_insert_epi16(tmp2, next, 7);
    // and the last pixel of last row for the next "left" pixel
    last = ((uint16_t) _mm_extract_epi16(saved, 7)) >> 8;
    sum1 = _mm_add_epi16(sum1, tmp1);
    sum2 = _mm_add_epi16(sum2, tmp2);
    // divide
    sum1 = _mm_srli_epi16(sum1, 2);
    sum2 = _mm_srli_epi16(sum2, 2);
    sum1 = _mm_packus_epi16(sum1, sum2);
    mm_store_si128(p + x - 1, sum1);
}
  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-05-15T13:05:41+00:00Added an answer on May 15, 2026 at 1:05 pm

    I suggest keeping the neighbouring pixels on the SSE register. That is, keep the result of the _mm_slli_si128 / _mm_srli_si128 in an SSE variable, and eliminate all of the insert and extract. My reasoning is that in older CPUs, the insert/extract instructions require communication between the SSE units and the general-purpose units, which is much slower than keeping the calculation within SSE, even if it spills over to the L1 cache.

    When that is done, there should be only four 16-bit shifts ( _mm_slli_si128, _mm_srli_si128, not counting the divison shift ). My suggestion is to do a benchmark with your code, because by that time your code may have already hit the memory bandwidth limit .. which means you can’t optimize anymore.

    If the image is large (bigger than L2 size) and the output won’t be read back soon, try use MOVNTDQ ( _mm_stream_si128 ) for writing back. According to several websites it is in SSE2, although you might want to double-check.

    SIMD tutorial:

    • http://www.tommesani.com/Docs.html
    • http://en.wikipedia.org/wiki/X86_instruction_listings#SSE2_instructions

    Some SIMD guru websites:

    • http://www.agner.org/optimize/
    • http://www.virtualdub.org/
    • http://www.pixelglow.com/
    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

I have just tried to save a simple *.rtf file with some websites and
link Im having trouble converting the html entites into html characters, (&# 8217;) i
Seemingly simple, but I cannot find anything relevant on the web. What is the
Does anyone know how can I replace this 2 symbol below from the string
I'm trying to decode HTML entries from here NYTimes.com and I cannot figure out
I ran into a problem. Wrote the following code snippet: teksti = teksti.Trim() teksti
I have a jquery bug and I've been looking for hours now, I can't
this is what i have right now Drawing an RSS feed into the php,
That's pretty much it. I'm using Nokogiri to scrape a web page what has
I want to count how many characters a certain string has in PHP, but

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.