The Archive Base

Editorial Team
Asked: May 23, 2026 (2026-05-23T08:54:22+00:00)

I am working on the optimization of an algorithm using SSE2 instructions

I am working on optimizing an algorithm using SSE2 instructions, but I ran into this problem when testing the performance:

I) Intel E6750

  1. Running the non-SSE2 algorithm 4 times takes 14.85 seconds
  2. Running the SSE2 algorithm 1 time (processing the same data) takes 6.89 seconds

II) Phenom II X4 2.8 GHz

  1. Running the non-SSE2 algorithm 4 times takes 11.43 seconds
  2. Running the SSE2 algorithm 1 time (processing the same data) takes 12.15 seconds
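To make the comparison concrete, the ratio between the two timings on each machine can be worked out with a small helper. This is only a sketch: the function name `speedup` is hypothetical, and it assumes, as stated above, that the 4 non-SSE2 runs and the 1 SSE2 run process the same data.

```cpp
#include <cassert>

// Ratio of baseline time to candidate time.
// A value above 1.0 means the candidate (here: the SSE2 run) is faster;
// a value below 1.0 means it is slower.
double speedup(double baseline_seconds, double candidate_seconds) {
    assert(candidate_seconds > 0.0);
    return baseline_seconds / candidate_seconds;
}
```

With the timings listed above, `speedup(14.85, 6.89)` is roughly 2.16 on the E6750, while `speedup(11.43, 12.15)` is roughly 0.94 on the Phenom II, so the SSE2 version is slightly slower than the scalar one there.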

Can anyone help me understand why this is happening? I’m really confused about the results.

In both cases I’m compiling with g++ using the -O3 flag.

PS: The algorithm doesn’t use floating-point math; it uses SSE2’s integer instructions.
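The actual algorithm is not shown in the question, so the following is only an illustrative sketch of what "scalar versus SSE2 integer" code typically looks like. The function names `add_scalar` and `add_sse2` are hypothetical, and the SSE2 version assumes the element count is a multiple of 4 to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <emmintrin.h>  // SSE2 intrinsics

// Scalar reference: add two arrays of 32-bit integers element by element.
void add_scalar(const std::int32_t* a, const std::int32_t* b,
                std::int32_t* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}

// SSE2 version: each 128-bit instruction handles four 32-bit integers.
// Assumption for this sketch: n is a multiple of 4.
void add_sse2(const std::int32_t* a, const std::int32_t* b,
              std::int32_t* out, std::size_t n) {
    assert(n % 4 == 0);
    for (std::size_t i = 0; i < n; i += 4) {
        // Unaligned loads/stores, so the arrays need no special alignment.
        __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a + i));
        __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b + i));
        __m128i sum = _mm_add_epi32(va, vb);
        _mm_storeu_si128(reinterpret_cast<__m128i*>(out + i), sum);
    }
}
```

On x86-64, g++ enables SSE2 by default; on 32-bit x86 targets the `-msse2` flag is normally needed as well.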


1 Answer

  1. Editorial Team
    Added an answer on May 23, 2026 at 8:54 am (2026-05-23T08:54:23+00:00)

    Intel has made big improvements to its SSE implementation over the last 5 years or so, which AMD has not really kept up with. Originally, both vendors' SSE execution units were only 64 bits wide, so each 128-bit operation was broken down into 2 micro-ops. Ever since Core and Core 2 were introduced, though, Intel CPUs have had a full 128-bit SSE implementation, which means that 128-bit operations effectively got a 2x throughput boost (1 micro-op versus 2). More recent Intel CPUs also have multiple SSE execution units, which means you can get more than 1 instruction per clock of throughput for 128-bit SIMD instructions.
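The micro-op argument above can be put into rough numbers with a deliberately idealized model. This is a sketch under strong assumptions: it considers only execution-unit throughput, ignores latency, decoding, and memory effects, and the function and parameter names are hypothetical.

```cpp
#include <cassert>

// Idealized cycle estimate for a stream of independent 128-bit SIMD
// instructions, limited only by execution-unit throughput.
//   uops_per_instruction: 2 when a 128-bit op is split across a 64-bit
//                         unit, 1 on a full 128-bit unit.
//   execution_units:      number of SSE units that can each accept one
//                         micro-op per cycle.
double estimated_cycles(double instructions, int uops_per_instruction,
                        int execution_units) {
    assert(uops_per_instruction > 0);
    assert(execution_units > 0);
    return instructions * uops_per_instruction / execution_units;
}
```

Under this model, 1000 instructions cost 2000 cycles on one 64-bit unit (2 micro-ops each), 1000 cycles on one 128-bit unit, and 500 cycles on two 128-bit units, which corresponds to the 2x boost and the more-than-1-instruction-per-clock throughput described in the answer.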

© 2021 The Archive Base. All Rights Reserved