The Archive Base


Editorial Team
Asked: June 8, 2026 (2026-06-08T17:39:04+00:00)

Can anyone help me by providing an outline of how the hash function output is mapped to bloom filter indices? Here is an overview of bloom filters.


1 Answer

  1. Editorial Team
    Added an answer on June 8, 2026 at 5:39 pm (2026-06-08T17:39:06+00:00)

    an outline on how the hash function output is mapped to bloom filter indices

    Each of the k hash functions in use maps onto a bit in the bloom filter, just as a hash maps onto a bucket in a hash table. Very commonly, you might have a hash function generating 32-bit integers, then use the modulus (%) operator to get a bit index i with 0 <= i < n, where n is the number of bits in your bloom filter.

    To make this very concrete, let’s say a hash function generates numbers from 0 to 2^32-1, and there are 1000 bits in your bloom filter:

    int bit_index = hash_function(input_value) % 1000;
    

    It’s important to note here that 2^32-1 is massively greater than 1000. Say the hash function instead generated fairly evenly distributed numbers, but only between 0 and 1023 inclusive. After the modulus operation, each bit_index in the 0..23 range would then be twice as likely as each one in 24..999 (because, for example, hash values 2 and 1002 both give a post-modulus value of 2, but only a hash value of 25 gives 25).

    For that reason, if you have a hash function generating 32 bits, you might want to size the bloom filter to a number of bits that’s a power of two, then slice out sections of the hash value and use them as if they were independent hash functions, all explained in the Wikipedia article you link. That requires a good-quality hash function though, as any “clustering” flaws in the hash function will be passed through unmitigated to the output; having a prime number of bits is one way to mitigate such poor hashing. Still, with good hash functions, powers of two also make it easy to extract bit indices using bitwise AND operations and, if needed, bit shifting, which can be faster than integer modulus, though the cost of the hash functions will probably dwarf that consideration in the overall performance profile.
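    The two points above can be checked with a short sketch. The function names index_histogram and slice_index are made up for this illustration and are not part of any library; the first one counts how often each bit index comes out when every hash value in a small range occurs once, and the second takes one group of bits out of a 32-bit hash for a filter whose size is a power of two.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: count how often each bit index is produced when every
// hash value in [0, hash_range) occurs exactly once and is reduced with
// % num_bits.
std::vector<std::size_t> index_histogram(std::uint32_t hash_range,
                                         std::uint32_t num_bits) {
    std::vector<std::size_t> counts(num_bits, 0);
    for (std::uint32_t h = 0; h < hash_range; ++h) {
        ++counts[h % num_bits];
    }
    return counts;
}

// Hypothetical helper: for a filter with 2^log2_bits bits, take the
// slice_no-th group of log2_bits bits from a 32-bit hash and use it as a bit
// index. The caller must keep log2_bits < 32 and
// (slice_no + 1) * log2_bits <= 32 so that no shift is out of range.
std::uint32_t slice_index(std::uint32_t hash, unsigned log2_bits,
                          unsigned slice_no) {
    const std::uint32_t mask = (std::uint32_t{1} << log2_bits) - 1;
    return (hash >> (slice_no * log2_bits)) & mask;
}
```

    With index_histogram(1024, 1000), indices 0..23 each get a count of 2 and indices 24..999 each get a count of 1, which is the bias described above. With a 1024-bit filter (log2_bits = 10), a 32-bit hash yields three usable slices.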

    Edit – addressing comments…

    Assuming your MD5 function’s returning an unsigned char* “p” to MD5_DIGEST_LENGTH bytes of data, I suggested you try:

    BOOST_STATIC_ASSERT(MD5_DIGEST_LENGTH >= sizeof(int));
    int bit_index = *reinterpret_cast<unsigned int*>(p) % num_of_bloom_filter_bits;
    

    That was actually a particularly bad idea (sorry); I’ll explain the two reasons why in a moment. First, to answer your question about what it does: BOOST_STATIC_ASSERT() is designed to give you a compile error if the expression it’s passed evaluates to false. Here, it’s basically a way of documenting the requirement that MD5_DIGEST_LENGTH, which is the size in characters of the textual representation of the MD5 hash, be at least as large as the number of bytes your system uses for an int. (That size is probably 4 bytes, but might be 8.) That requirement is intended to ensure that the reinterpret_cast in the next line stays within the buffer.

    What the cast does is read a value from the bytes at the start of the textual representation of the MD5 hash as if those bytes contained an int. So, say your int size is 4 and the MD5 hash is “0cc175b9c0f1b6a831c399e269772661” as in your comment: the first 4 bytes contain “0cc1”. The ASCII codes for that text are 48, 99, 99, 49 decimal. When they’re read into an int, the value depends on the endianness of the CPU, but basically you’ll get one of those numbers times 256^3, plus another times 256^2, plus a third times 256, plus the final number.
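    To make that arithmetic concrete, here is a small sketch that combines four bytes explicitly in both byte orders, so the result does not depend on the machine it runs on. The names load_little_endian and load_big_endian are invented for this illustration.

```cpp
#include <cstdint>

// Combine four bytes the way a little-endian CPU does when it loads them as
// one 32-bit integer: the first byte in memory is the least significant.
std::uint32_t load_little_endian(const unsigned char* p) {
    return  std::uint32_t{p[0]}
         | (std::uint32_t{p[1]} << 8)
         | (std::uint32_t{p[2]} << 16)
         | (std::uint32_t{p[3]} << 24);
}

// Same bytes as a big-endian CPU sees them: the first byte in memory is the
// most significant.
std::uint32_t load_big_endian(const unsigned char* p) {
    return (std::uint32_t{p[0]} << 24)
         | (std::uint32_t{p[1]} << 16)
         | (std::uint32_t{p[2]} << 8)
         |  std::uint32_t{p[3]};
}
```

    For the text “0cc1” (bytes 48, 99, 99, 49), the little-endian reading is 0x31636330 and the big-endian reading is 0x30636331.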

    The reasons I said this was a particularly bad idea are:

    • each character in the MD5 string is either a digit (ASCII codes 48-57) or a letter from “a” through “f” (97-102). Those 16 values are only a 16th of the variation that a byte can have, so while the int value you generate occupies 32 bits, you really only get 2^16 distinct values.
    • on some computers, ints must be aligned at a memory address that’s a multiple of 2, 4, 8, etc. If the text happens to start at an incompatible address, the reinterpret_cast could crash your program. Note: Intel and AMD CPUs have no such alignment requirement, though it may be faster for them to operate on properly aligned data.
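    Both points can be illustrated in a few lines. This is only a sketch with made-up names (read_u32, count_distinct_prefix_values): it reads the bytes with std::memcpy, which works at any address and so avoids the alignment issue, and it counts how many different integers the 4-character lowercase hex prefixes can produce.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <set>

// Read the first four bytes of a buffer as a 32-bit integer.
// std::memcpy has no alignment requirement on its source pointer.
std::uint32_t read_u32(const char* p) {
    std::uint32_t value;
    std::memcpy(&value, p, sizeof value);
    return value;
}

// Feed every possible 4-character lowercase hex prefix through read_u32 and
// count how many distinct integers come out.
std::size_t count_distinct_prefix_values() {
    const char digits[] = "0123456789abcdef";
    std::set<std::uint32_t> seen;
    char text[4];
    for (int a = 0; a < 16; ++a) {
        for (int b = 0; b < 16; ++b) {
            for (int c = 0; c < 16; ++c) {
                for (int d = 0; d < 16; ++d) {
                    text[0] = digits[a];
                    text[1] = digits[b];
                    text[2] = digits[c];
                    text[3] = digits[d];
                    seen.insert(read_u32(text));
                }
            }
        }
    }
    return seen.size();
}
```

    The count is 16^4 = 65536 = 2^16, out of the 2^32 values a 32-bit integer could hold.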

    So, another suggestion:

    #include <cstdio>   // sprintf
    #include <cstdlib>  // strtoul

    // create a buffer of the right size to hold a valid unsigned long in hex representation...
    char data[sizeof(unsigned long) * 2 + 1];

    // copy as much of the md5 text as will fit into the buffer, NUL terminating it
    // (the "*" precision argument must be an int, hence the cast)...
    sprintf(data, "%.*s", static_cast<int>(sizeof data - 1), md5);

    // convert to an unsigned long...
    unsigned long m = strtoul(data, /*endptr*/ NULL, /*base*/ 16);
    

    Here, if the md5 representation is longer than the data buffer, just the initial part of it is copied, and if it is shorter, all of it is copied safely, so the BOOST_STATIC_ASSERT isn’t required.
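    Putting the pieces together, a self-contained variant of the same idea might look like the sketch below. It uses a fixed 8-digit prefix and a 32-bit result instead of unsigned long so the value is the same on every platform; that choice, and the names hex_prefix_to_u32 and bloom_bit_index, are assumptions of this example rather than anything from the original suggestion.

```cpp
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Parse the first 8 hex characters of an MD5 hex string as a 32-bit number.
// Strings shorter than 8 characters are parsed in full.
std::uint32_t hex_prefix_to_u32(const char* md5_hex) {
    char buffer[9] = {0};               // 8 hex digits plus the NUL terminator
    std::strncpy(buffer, md5_hex, 8);   // buffer[8] stays NUL
    return static_cast<std::uint32_t>(std::strtoul(buffer, nullptr, 16));
}

// Map the parsed prefix onto a bit index in a filter of num_bits bits.
// num_bits must be non-zero.
std::uint32_t bloom_bit_index(const char* md5_hex, std::uint32_t num_bits) {
    return hex_prefix_to_u32(md5_hex) % num_bits;
}
```

    For the hash from the comment, hex_prefix_to_u32("0cc175b9c0f1b6a831c399e269772661") gives 0x0cc175b9, and bloom_bit_index then reduces that to an index below num_bits.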

    It’s much easier to use a non-cryptographic hash function, as they’ll generally just return you a number rather than a readable text buffer representation of the number, so you can avoid all this nonsense.
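    As a rough illustration of that last point, here is a minimal sketch using 64-bit FNV-1a, one example of a non-cryptographic hash that returns a plain number. The TinyBloomFilter class, its size of 2^16 bits, and the choice of k = 4 slices of 16 bits each are all assumptions made for this example, following the power-of-two slicing approach mentioned earlier.

```cpp
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <string>

// 64-bit FNV-1a: a simple non-cryptographic hash that returns a number.
std::uint64_t fnv1a_64(const std::string& key) {
    std::uint64_t hash = 0xcbf29ce484222325ULL;   // FNV-1a 64-bit offset basis
    for (unsigned char byte : key) {
        hash ^= byte;
        hash *= 0x100000001b3ULL;                 // FNV 64-bit prime
    }
    return hash;
}

// Hypothetical minimal Bloom filter: 2^16 bits and k = 4 bit indices per key,
// each index taken from one 16-bit slice of the 64-bit hash.
class TinyBloomFilter {
public:
    void insert(const std::string& key) {
        const std::uint64_t hash = fnv1a_64(key);
        for (unsigned i = 0; i < kSlices; ++i) {
            bits_.set(slice(hash, i));
        }
    }

    // false means the key was definitely never inserted;
    // true means it was probably inserted (false positives are possible).
    bool possibly_contains(const std::string& key) const {
        const std::uint64_t hash = fnv1a_64(key);
        for (unsigned i = 0; i < kSlices; ++i) {
            if (!bits_.test(slice(hash, i))) {
                return false;
            }
        }
        return true;
    }

private:
    static constexpr unsigned kSlices = 4;
    static constexpr unsigned kBitsPerSlice = 16;

    // Extract the i-th 16-bit slice with a shift and a bitwise AND.
    static std::size_t slice(std::uint64_t hash, unsigned i) {
        return static_cast<std::size_t>((hash >> (i * kBitsPerSlice)) & 0xFFFFu);
    }

    std::bitset<(1u << 16)> bits_;
};
```

    Because the filter size is a power of two, each index is extracted with a shift and a mask and no modulus is needed. As noted above, this relies on the hash being of good quality across all of its bits.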



© 2021 The Archive Base. All Rights Reserved