The Archive Base


Editorial Team
Asked: May 28, 2026 (2026-05-28T01:11:18+00:00)


Bit counting can be done in several ways, e.g. with a set-bit iterator, an unset-bit iterator, precomputed lookup tables, or parallel counting. From what I have found by searching the web, the unset-bit iterator is fast when there are few unset bits, and the set-bit iterator is fast when there are few set bits. But when should you use parallel counting, MIT HAKMEM (seen below) in particular? It seems quite fast, although probably slower than lookup tables. Is it always faster than the set/unset-bit iterators? Are there concerns other than speed and memory when choosing among them?

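 int BitCount(unsigned int u) {
     unsigned int uCount;

     /* Replace each 3-bit field (octal digit) of u with the number of bits set in it. */
     uCount = u - ((u >> 1) & 033333333333) - ((u >> 2) & 011111111111);
     /* Add adjacent fields into 6-bit sums; % 63 then adds those sums together. */
     /* Assumes a 32-bit unsigned int. */
     return ((uCount + (uCount >> 3)) & 030707070707) % 63;
 }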

1 Answer

  1. Editorial Team
    Added an answer on May 28, 2026 at 1:11 am (2026-05-28T01:11:19+00:00)

    Why choose one bit-counting method over another? Well, it really depends on your machine and the problem you’re trying to solve. Note that all the instruction counts I give below are for a basic RISC processor and might not translate well to a more complicated beast like x86.

    The HAKMEM algorithm you quoted will execute in 13 instructions but is unlikely to be very fast because of the modulus operator: taking a remainder tends to be an expensive operation. Eyeballing it, it does look like it has pretty good instruction-level parallelism, which should help if your processor is capable of exploiting that.
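To make the role of the modulus concrete, here is a minimal sketch that splits the quoted routine into its three steps. The function names are made up for illustration, and the code assumes 32-bit words (hence `uint32_t`); it is one reading of the quoted code, not a faster replacement for it.

```c
#include <stdint.h>

/* Step 1: replace every 3-bit field (octal digit) with its own bit count.
   For a 3-bit value v, v - (v >> 1) - (v >> 2) is the number of set bits;
   the masks stop shifted bits from leaking in from the neighbouring field. */
uint32_t octal_field_counts(uint32_t u) {
    return u - ((u >> 1) & 033333333333u) - ((u >> 2) & 011111111111u);
}

/* Step 2: add each pair of adjacent 3-bit counts and keep the sums in
   6-bit-wide fields (the topmost field only covers bits 30 and 31). */
uint32_t six_bit_field_counts(uint32_t u) {
    uint32_t c = octal_field_counts(u);
    return (c + (c >> 3)) & 030707070707u;
}

/* Step 3: the 6-bit fields are base-64 digits. Because 64 leaves remainder 1
   when divided by 63, the value modulo 63 equals the sum of those digits,
   and that sum is at most 32, so it is the bit count itself. */
int bit_count_hakmem(uint32_t u) {
    return (int)(six_bit_field_counts(u) % 63u);
}
```

The final step is the only one that needs a remainder; everything before it is shifts, masks, additions and subtractions.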

    The algorithm Bo Persson presented is quite fast (2 + 5*pop(x) instructions), but only if the word is sparsely populated. It can also be modified to work on densely populated words. It also contains branches and doesn’t have any significant instruction-level parallelism.
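Bo Persson's code is not reproduced on this page, so the following is only a sketch under the assumption that it is the usual "clear the lowest set bit" loop, together with the dense-word modification mentioned above. The names `pop_sparse` and `pop_dense` are hypothetical, and 32-bit words are assumed.

```c
#include <stdint.h>

/* Runs one iteration per set bit, so it is quick when few bits are set. */
int pop_sparse(uint32_t x) {
    int n = 0;
    while (x != 0) {
        x &= x - 1;   /* clears the lowest set bit */
        n++;
    }
    return n;
}

/* Mirror image: runs one iteration per clear bit, so it is quick when
   few bits are clear. Starts from 32 and counts down. */
int pop_dense(uint32_t x) {
    int n = 32;
    while (x != 0xFFFFFFFFu) {
        x |= x + 1;   /* sets the lowest clear bit */
        n--;
    }
    return n;
}
```

Both loops branch on data, so their running time varies with the input, which is the trade-off the paragraph above describes.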

    EDIT: The table lookup method can also be very fast but does make memory accesses. If the entire table is in the L1 cache then it’s probably one of the fastest algorithms. If the table isn’t in cache then it’s almost certainly one of the slowest.
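As a rough illustration of the table lookup method, here is a minimal sketch using a 256-entry table indexed by byte. The table size, the names `bits_in_byte`, `init_table` and `pop_table`, and the 32-bit word size are all assumptions made for the example; larger tables trade more cache footprint for fewer lookups.

```c
#include <stdint.h>

/* bits_in_byte[i] holds the number of set bits in the byte value i. */
static uint8_t bits_in_byte[256];

/* Fills the table once; must be called before pop_table. */
void init_table(void) {
    bits_in_byte[0] = 0;
    for (int i = 1; i < 256; i++) {
        /* lowest bit of i plus the count already stored for i / 2 */
        bits_in_byte[i] = (uint8_t)((i & 1) + bits_in_byte[i >> 1]);
    }
}

/* Four memory accesses, one per byte of the word. */
int pop_table(uint32_t x) {
    return bits_in_byte[x & 0xFFu]
         + bits_in_byte[(x >> 8) & 0xFFu]
         + bits_in_byte[(x >> 16) & 0xFFu]
         + bits_in_byte[x >> 24];
}
```

The four loads are where the cache behaviour described above comes in: they are cheap when the 256 bytes stay resident in L1 and costly when they do not.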

    The algorithm below is a variation of the HAKMEM algorithm and is presented in the book Hacker’s Delight (I highly recommend this book if you like this sort of thing). It executes in 19 instructions and is branch-free. It doesn’t use a division but does have a multiplication. It’s also very economical in the way it uses registers, re-using the same mask as much as possible. Still no significant instruction-level parallelism here that I can see.

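    int pop(unsigned x) {
      unsigned n;
    
      /* Subtract x/2, x/4 and x/8 within each 4-bit field;
         each field ends up holding its own bit count. */
      n = (x >> 1) & 0x77777777;
      x = x - n;
      n = (n >> 1) & 0x77777777;
      x = x - n;
      n = (n >> 1) & 0x77777777;
      x = x - n;
      x = (x + (x >> 4)) & 0x0F0F0F0F;  /* sum nibble pairs into bytes */
      x = x * 0x01010101;               /* accumulate the four byte sums in the top byte */
      return x >> 24;                   /* assumes a 32-bit unsigned */
    }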
    

    Hacker’s Delight also presents a couple of even more specialised algorithms, for 7-, 8- and 9-bit-wide fields or using floating-point operations. Note that much of the analysis I presented above was taken from that book.

    The fact is that there’s a truckload of methods, and the only way to be sure which works best in your particular situation is to measure and compare. I do realise that this is a pretty canned answer, but the alternative is to know your processor and compiler inside out.
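For the "measure and compare" step, a small timing helper along these lines may be enough to get started. It is a sketch only: `time_popcount`, `pop_loop` and `pop_parallel` are names invented here, the inputs come from a simple linear congruential generator with an arbitrary seed, and `clock()` measures CPU time at coarse resolution, so the round count needs to be large for the numbers to mean anything.

```c
#include <stdint.h>
#include <time.h>

typedef int (*popcount_fn)(uint32_t);

/* Candidate 1: one iteration per set bit. */
int pop_loop(uint32_t x) {
    int n = 0;
    while (x != 0) {
        x &= x - 1;
        n++;
    }
    return n;
}

/* Candidate 2: the branch-free routine from the answer, on 32-bit words. */
int pop_parallel(uint32_t x) {
    uint32_t n = (x >> 1) & 0x77777777u;
    x = x - n;
    n = (n >> 1) & 0x77777777u;
    x = x - n;
    n = (n >> 1) & 0x77777777u;
    x = x - n;
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;
    x = x * 0x01010101u;
    return (int)(x >> 24);
}

/* Calls f on `rounds` repeatable pseudo-random inputs and returns the
   elapsed CPU time in seconds. The sum of all results is stored in
   *checksum, both to keep the calls from being optimised away and so
   that two candidates can be checked against each other. */
double time_popcount(popcount_fn f, uint32_t rounds, uint64_t *checksum) {
    uint32_t x = 0x12345678u;               /* arbitrary fixed seed */
    uint64_t sum = 0;
    clock_t start = clock();
    for (uint32_t i = 0; i < rounds; i++) {
        x = x * 1664525u + 1013904223u;     /* next pseudo-random input */
        sum += (uint64_t)f(x);
    }
    clock_t end = clock();
    *checksum = sum;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling `time_popcount(pop_loop, rounds, &a)` and `time_popcount(pop_parallel, rounds, &b)` with the same `rounds` should give equal checksums; the returned times are then comparable for that input distribution, compiler and optimisation level only. Random inputs have about half their bits set, so sparse or dense workloads would need their own input generator.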


© 2021 The Archive Base. All Rights Reserved