The Archive Base

Editorial Team
Asked: May 23, 2026 (2026-05-23T08:54:22+00:00)

I am working on optimizing an algorithm using SSE2 instructions, but I ran into this problem while testing the performance:

I) Intel E6750

  1. Running the non-SSE2 algorithm 4 times takes 14.85 seconds
  2. Running the SSE2 algorithm once (on the same data) takes 6.89 seconds

II) Phenom II X4 2.8 GHz

  1. Running the non-SSE2 algorithm 4 times takes 11.43 seconds
  2. Running the SSE2 algorithm once (on the same data) takes 12.15 seconds

Can anyone help me understand why this is happening? I’m really confused by the results.

In both cases I’m compiling with g++ using the -O3 flag.

PS: The algorithm doesn’t use floating-point math; it uses SSE2’s integer instructions.
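
A comparison like this is easier to reproduce when both variants are timed with the same monotonic clock over the same buffer. The following is only a minimal sketch: `time_seconds` and `checksum` are hypothetical names, and `checksum` is a stand-in workload, not the algorithm from the question.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <vector>

// Hypothetical helper: runs `work` `repeats` times and returns the
// total elapsed wall-clock time in seconds.
template <typename F>
double time_seconds(F&& work, int repeats) {
    assert(repeats > 0);
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < repeats; ++i) {
        work();
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Hypothetical stand-in workload: sums a buffer of 32-bit integers.
// Replace it with the scalar or SSE2 routine being measured.
std::uint64_t checksum(const std::vector<std::uint32_t>& data) {
    std::uint64_t sum = 0;
    for (std::uint32_t v : data) {
        sum += v;
    }
    return sum;
}
```

Calling `time_seconds([&] { sink = checksum(data); }, 4)` with a `volatile` `sink` keeps the compiler from discarding the work at -O3, which would otherwise skew the numbers.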

1 Answer

  1. Editorial Team
    Added an answer on May 23, 2026 at 8:54 am

    Intel has made big improvements to its SSE implementation over the last 5 years or so, which AMD has not really kept up with. Originally, both vendors' SSE hardware was really just 64-bit execution units, and each 128-bit operation was broken down into 2 micro-ops. Ever since Core and Core 2 were introduced, though, Intel CPUs have had a full 128-bit SSE implementation, which means that 128-bit operations effectively got a 2x throughput boost (1 micro-op versus 2). More recent Intel CPUs also have multiple SSE execution units, which means you can get more than 1 instruction per clock of throughput for 128-bit SIMD instructions.
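
To make the micro-op point concrete, here is a small sketch of a 128-bit integer operation next to its scalar equivalent. The names `add_scalar` and `add_sse2` are hypothetical, and element-wise 32-bit addition is only an illustrative workload, not the asker's algorithm. Each `_mm_add_epi32` is one 128-bit instruction covering four lanes; following the explanation above, a CPU with 64-bit execution units would split it into 2 micro-ops, while a full 128-bit unit would need only 1.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <emmintrin.h>  // SSE2 intrinsics
#include <vector>

// Scalar reference: element-wise wrapping 32-bit addition.
std::vector<std::uint32_t> add_scalar(const std::vector<std::uint32_t>& a,
                                      const std::vector<std::uint32_t>& b) {
    assert(a.size() == b.size());
    std::vector<std::uint32_t> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        out[i] = a[i] + b[i];
    }
    return out;
}

// SSE2 version: processes four 32-bit lanes per 128-bit instruction,
// then finishes any leftover elements with scalar code.
std::vector<std::uint32_t> add_sse2(const std::vector<std::uint32_t>& a,
                                    const std::vector<std::uint32_t>& b) {
    assert(a.size() == b.size());
    std::vector<std::uint32_t> out(a.size());
    std::size_t i = 0;
    for (; i + 4 <= a.size(); i += 4) {
        // Unaligned loads, since std::vector does not guarantee 16-byte alignment.
        const __m128i va =
            _mm_loadu_si128(reinterpret_cast<const __m128i*>(a.data() + i));
        const __m128i vb =
            _mm_loadu_si128(reinterpret_cast<const __m128i*>(b.data() + i));
        _mm_storeu_si128(reinterpret_cast<__m128i*>(out.data() + i),
                         _mm_add_epi32(va, vb));
    }
    for (; i < a.size(); ++i) {
        out[i] = a[i] + b[i];  // scalar tail
    }
    return out;
}
```

Timing both functions over the same large buffer on each machine is one way to check whether the 128-bit path actually pays off on a given CPU.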

© 2021 The Archive Base. All Rights Reserved