The Archive Base

Editorial Team
Asked: May 10, 2026

I have 2 arrays of 16 elements (chars) that I need to compare and

I have 2 arrays of 16 elements (chars) that I need to ‘compare’ and see how many elements are equal between the two.

This routine will be called millions of times (a typical run makes about 60 or 70 million calls), so I need it to be as fast as possible. I’m working in C++ (C++Builder 2007, for the record).

Right now, I have a simple:

matches += array1[0] == array2[0]; 

repeated 16 times, once per index (profiling shows this to be about 30% faster than doing it with a for loop).

Is there any other way that could work faster?

Some data about the environment and the data itself:

  • I’m using C++Builder, which doesn’t have any speed optimizations to speak of. I will eventually try another compiler, but right now I’m stuck with this one.
  • The data will be different most of the time. 100% equal data is very rare (maybe less than 1% of cases).
1 Answer

  1. Added an answer on May 10, 2026 at 3:35 pm

    UPDATE: This answer has been modified to make my comments match the source code provided below.

    There is an optimization available if you have the capability to use SSE2 and popcnt instructions.

    16 bytes happens to fit nicely in an SSE register. Using C++ with assembly or intrinsics, load the two 16-byte arrays into xmm registers and compare them byte by byte. The compare produces a mask in which each byte is all ones where the inputs were equal and all zeros where they differed. A movmsk instruction then packs one bit per byte into a general-purpose x86 register, giving a 16-bit field whose 1 bits mark the matching positions. Counting those 1 bits tells you how many elements matched, and a hardware popcnt instruction is a fast way to count them.
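    To make the mask idea concrete, here is a minimal portable sketch that models the compare, movmsk, and bit-count steps in plain C++ without any SSE instructions. The names equalityMask16 and countSetBits are hypothetical helpers introduced only for illustration, and the software bit count stands in for the hardware popcnt.

```cpp
#include <cstddef>

// Hypothetical helper: models the compare + movmsk steps in scalar code.
// Bit i of the result is set when a[i] == b[i].
inline unsigned equalityMask16(const char (&a)[16], const char (&b)[16])
{
    unsigned mask = 0;
    for (std::size_t i = 0; i < 16; ++i)
        mask |= static_cast<unsigned>(a[i] == b[i]) << i;
    return mask;
}

// Hypothetical helper: software stand-in for popcnt.
// Each iteration clears the lowest set bit, so the loop runs once per 1 bit.
inline unsigned countSetBits(unsigned value)
{
    unsigned count = 0;
    while (value != 0) {
        value &= value - 1;
        ++count;
    }
    return count;
}
```

    Calling countSetBits( equalityMask16( arr1, arr2 ) ) should give the same count as the unrolled comparison; the SSE2 version simply produces the mask in a handful of instructions instead of 16 separate compares.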

    This requires knowledge of assembly/intrinsics and SSE in particular. You should be able to find web resources for both.

    If you run this code on a machine that does not support both SSE2 and popcnt, you must fall back to iterating through the arrays and counting the matches with your unrolled-loop approach.
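    One way to arrange that fallback is to choose the implementation once at startup and call it through a function pointer. This is only a sketch under assumptions: CountMatchesFn, countMatchesScalar, and chooseCountMatches are hypothetical names, the fastPath argument stands for whatever SSE2/popcnt routine you end up writing, and cpuSupportsFastPath is the result of a feature check you perform yourself.

```cpp
#include <cstddef>

// Signature shared by every implementation of the 16-byte comparison.
typedef unsigned (*CountMatchesFn)(const char (&)[16], const char (&)[16]);

// Portable fallback: counts equal positions with no special instructions.
inline unsigned countMatchesScalar(const char (&a)[16], const char (&b)[16])
{
    unsigned matches = 0;
    for (std::size_t i = 0; i < 16; ++i)
        matches += (a[i] == b[i]);
    return matches;
}

// Picks the routine to use for the whole run.
// fastPath: hypothetical SSE2/popcnt implementation (may be null).
// cpuSupportsFastPath: outcome of a CPU feature check done by the caller.
inline CountMatchesFn chooseCountMatches(bool cpuSupportsFastPath,
                                         CountMatchesFn fastPath)
{
    if (cpuSupportsFastPath && fastPath != 0)
        return fastPath;
    return &countMatchesScalar;
}
```

    Selecting the pointer once keeps the feature check out of the hot loop, although the indirect call has its own cost, so it is worth timing against a build that calls one routine directly.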

    Good luck

    Edit: Since you indicated you did not know assembly, here’s some sample code to illustrate my answer:

    #include "stdafx.h"
    #include <iostream>
    #include <intrin.h>

    // Returns a 16-bit mask: bit i is set when arr1[i] == arr2[i].
    inline unsigned cmpArray16( char (&arr1)[16], char (&arr2)[16] )
    {
        // Unaligned loads of the two 16-byte arrays into SSE registers.
        __m128i first = _mm_loadu_si128( reinterpret_cast<__m128i*>( &arr1 ) );
        __m128i second = _mm_loadu_si128( reinterpret_cast<__m128i*>( &arr2 ) );

        // Byte-wise compare, then pack one bit per byte into an integer.
        return _mm_movemask_epi8( _mm_cmpeq_epi8( first, second ) );
    }

    int _tmain( int argc, _TCHAR* argv[] )
    {
        unsigned count = 0;
        char    arr1[16] = { 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0 };
        char    arr2[16] = { 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0 };

        // Count the set bits in the mask with the hardware popcnt instruction.
        count = __popcnt( cmpArray16( arr1, arr2 ) );

        std::cout << "The number of equivalent bytes = " << count << std::endl;

        return 0;
    }

    Some notes: This function uses SSE2 instructions and a popcnt instruction introduced in the Phenom processor (that’s the machine that I use). I believe the most recent Intel processors with SSE4 also have popcnt. This function does not check for instruction support with CPUID; its behavior is undefined on a processor that lacks SSE2 or popcnt (you will probably get an invalid opcode exception). Detection code is a topic for a separate thread.
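    For reference, a rough sketch of such a check follows. It assumes an x86 target and reads CPUID leaf 1, where EDX bit 26 reports SSE2 and ECX bit 23 reports POPCNT. The CpuFeatures struct and detectCpuFeatures function are hypothetical names; the MSVC branch uses __cpuid from intrin.h and the GCC/Clang branch uses __get_cpuid from cpuid.h. I have not tried this with C++Builder, so treat the compiler branches as assumptions to adapt.

```cpp
#if defined(_MSC_VER)
#  include <intrin.h>   // __cpuid
#elif defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#  include <cpuid.h>    // __get_cpuid
#endif

// Hypothetical result type: which of the two required features are present.
struct CpuFeatures {
    bool sse2;
    bool popcnt;
};

// Queries CPUID leaf 1. On unsupported compilers or non-x86 targets
// both flags stay false, which selects the scalar fallback.
inline CpuFeatures detectCpuFeatures()
{
    CpuFeatures f = { false, false };
#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
    int regs[4] = { 0, 0, 0, 0 };           // EAX, EBX, ECX, EDX
    __cpuid(regs, 1);
    f.sse2   = (regs[3] & (1 << 26)) != 0;  // EDX bit 26
    f.popcnt = (regs[2] & (1 << 23)) != 0;  // ECX bit 23
#elif defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        f.sse2   = (edx & (1u << 26)) != 0; // EDX bit 26
        f.popcnt = (ecx & (1u << 23)) != 0; // ECX bit 23
    }
#endif
    return f;
}
```

    Run the check once at startup and cache the result; CPUID is a slow, serializing instruction and should not sit inside a loop that executes tens of millions of times.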

    I have not timed this code; I expect it to be faster because it compares 16 bytes at a time and has no branches. You should modify this to fit your environment, and time it yourself to see if it works for you. I wrote and tested this on VS2008 SP1.

    SSE prefers data that is aligned on a natural 16-byte boundary; if you can guarantee that then you should get additional speed improvements, and you can change the _mm_loadu_si128 instructions to _mm_load_si128, which requires alignment.
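    A hedged sketch of the aligned variant is shown here. It assumes a C++11 compiler for alignas (older compilers need their own alignment attribute), and Block16 and equalityMaskAligned are hypothetical names. The scalar branch exists only so the sketch still compiles where SSE2 is not enabled; it is not part of the technique.

```cpp
#include <cstddef>

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#  include <emmintrin.h>   // SSE2 intrinsics
#  define SKETCH_HAVE_SSE2 1
#else
#  define SKETCH_HAVE_SSE2 0
#endif

// Hypothetical wrapper that forces 16-byte alignment on the data.
struct alignas(16) Block16 {
    char bytes[16];
};

// Returns a 16-bit mask: bit i is set when a.bytes[i] == b.bytes[i].
inline unsigned equalityMaskAligned(const Block16& a, const Block16& b)
{
#if SKETCH_HAVE_SSE2
    // _mm_load_si128 requires 16-byte aligned addresses, which Block16 guarantees.
    const __m128i first  = _mm_load_si128(reinterpret_cast<const __m128i*>(a.bytes));
    const __m128i second = _mm_load_si128(reinterpret_cast<const __m128i*>(b.bytes));
    return static_cast<unsigned>(_mm_movemask_epi8(_mm_cmpeq_epi8(first, second)));
#else
    // Scalar stand-in so the sketch builds without SSE2.
    unsigned mask = 0;
    for (std::size_t i = 0; i < 16; ++i)
        mask |= static_cast<unsigned>(a.bytes[i] == b.bytes[i]) << i;
    return mask;
#endif
}
```

    Passing an address that is not 16-byte aligned to _mm_load_si128 faults at run time, so only switch from the unaligned load when every buffer is declared or allocated with the alignment guaranteed.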

© 2021 The Archive Base. All Rights Reserved