The Archive Base


Editorial Team
Asked: May 23, 2026 (2026-05-23T12:54:09+00:00)

I have some vectors of unsigned chars that represent the pixels of a frame.
I got this function working without the MMX improvement, but I am frustrated that the MMX version doesn't work. So:

I need to add two unsigned chars and divide the sum by two (shift right by 1). The sum has to be computed in 16 bits rather than 8, because an unsigned char only covers 0-255. The code I have so far is below, but the values are wrong; _mm_adds_pu16 doesn't seem to add in 16 bits, just 8:

  MM0 = _mm_setzero_si64();        //all zeros
  MM1 = TO_M64(lv1+k);             //first 8 unsigned chars
  MM2 = TO_M64(lv2+k);             //second 8 unsigned chars

  MM3 =_mm_unpacklo_pi8(MM0,MM1);  //get first 4chars from MM1 and add Zeros
  MM4 =_mm_unpackhi_pi8(MM0,MM1);  //get last 4chars from MM1 and add Zeros

  MM5 =_mm_unpacklo_pi8(MM0,MM2);  //same as above for line 2
  MM6 =_mm_unpackhi_pi8(MM0,MM2);

  MM1 = _mm_adds_pu16(MM3,MM5);    //add both chars as a 16bit sum (255+255 max range)
  MM2 = _mm_adds_pu16(MM4,MM6);

  MM3 = _mm_srai_pi16(MM1,1);      //right shift (division by 2)
  MM4 = _mm_srai_pi16(MM2,1);

  MM1 = _mm_packs_pi16(MM3,MM4);   //pack the 2 MMX registers into one

  v2 = TO_UCHAR(MM1);              //put results in the destination array

New developments:
Thanks for that king_nak!!
I wrote a simple version of what I am trying to do:


int main()
{
    char A[8]={255,155,2,3,4,5,6,7};
    char B[8]={255,155,2,3,4,5,6,7};
    char C[8];
    char D[8];
    char R[8];

    __m64* pA=(__m64*) A;
    __m64* pB=(__m64*) B;
    __m64* pC=(__m64*) C;
    __m64* pD=(__m64*) D;
    __m64* pR=(__m64*) R;

    _mm_empty();

    __m64 MM0 = _mm_setzero_si64();
    __m64 MM1 = _mm_unpacklo_pi8(*pA,MM0);
    __m64 MM2 = _mm_unpackhi_pi8(*pA,MM0);
    __m64 MM3 = _mm_unpacklo_pi8(*pB,MM0);
    __m64 MM4 = _mm_unpackhi_pi8(*pB,MM0);
    __m64 MM5 = _mm_add_pi16(MM1,MM3);
    __m64 MM6 = _mm_add_pi16(MM2,MM4);

    printf("SUM:\n");
    *pC= _mm_add_pi16(MM1,MM3);
    *pD= _mm_add_pi16(MM2,MM4);
    for(int i=0; i<8; i++) printf("\t%d ", (C[i])); printf("\n");
    for(int i=0; i<8; i++) printf("\t%d ", D[i]); printf("\n");

    printf("DIV:\n");
    *pC= _mm_srai_pi16(MM5,1);
    *pD= _mm_srai_pi16(MM6,1);
    for(int i=0; i<8; i++) printf("\t%d ", (C[i])); printf("\n");
    for(int i=0; i<8; i++) printf("\t%d ", D[i]); printf("\n");

    MM1= _mm_srai_pi16(MM5,1);
    MM2= _mm_srai_pi16(MM6,1);

    printf("Final Result:\n");
    *pR= _mm_packs_pi16(MM1,MM2);
    for(int i=0; i<8; i++) printf("\t%d ", (R[i])); printf("\n");

    return(0);
}

And the results are:

SUM:
-2  1   54  1   4   0   6   0
8   0   10  0   12  0   14  0
DIV:
-1  0   -101    0   2   0   3   0
4   0   5   0   6   0   7   0
Final Result:
127     127     2   3   4   5   6   7

The small numbers are fine, but the big numbers come out as 127, which is wrong. What am I doing wrong?

1 Answer

  1. Editorial Team
    Added an answer on May 23, 2026 at 12:54 pm (2026-05-23T12:54:09+00:00)

    I think I found the problem:
    The arguments of the unpack instructions are in the wrong order. If you look at the registers as a whole, it looks like the individual chars are zero-extended to shorts, but in fact they are zero-padded: with mm0 as the first operand, the zero byte lands in the low half of each 16-bit word and the data byte in the high half. Just swap mm0 and the other register in each case and it should work.

    Also, you don’t need a saturated add; a normal PADDW is sufficient. The maximum value you can get is 0xff+0xff=0x01fe, which fits in 16 bits and never needs to be saturated.

    Edit: What’s more, PACKSSWB doesn’t quite do what you want. PACKUSWB is the correct instruction; the signed saturation of PACKSSWB clamps every value above 0x7f to 127 and gives you wrong results.

    Here’s a solution (I also replaced the shifts with logical ones and used different pseudo-registers in some places):

    mm0=pxor(mm0,mm0) =[00,00,00,00,00,00,00,00]
    mm1 =[a0,10,ff,18,7f,f0,ff,cc]
    mm2 =[c0,20,ff,00,70,26,ff,01]
    mm3=punpcklbw(mm1,mm0) =[00a0,0010,00ff,0018]
    mm4=punpckhbw(mm1,mm0) =[007f,00f0,00ff,00cc]
    mm5=punpcklbw(mm2,mm0) =[00c0,0020,00ff,0000]
    mm6=punpckhbw(mm2,mm0) =[0070,0026,00ff,0001]
    mm5=paddw(mm3,mm5) =[0160,0030,01fe,0018]
    mm6=paddw(mm4,mm6) =[00ef,0116,01fe,00cd]
    mm3=psrlwi(mm5,1) =[00b0,0018,00ff,000c]
    mm4=psrlwi(mm6,1) =[0077,008b,00ff,0066]
    mm1=packuswb(mm3,mm4) =[b0,18,ff,0c,77,8b,ff,66]
    
© 2021 The Archive Base. All Rights Reserved