The Archive Base

Editorial Team
Asked: May 20, 2026

I need a high-performance string hashing function in Python that produces integers with at least 34 bits of output (64 bits would make sense, but 32 is too few). There are several other questions like this one on Stack Overflow, but every accepted or upvoted answer I could find fell into one of a few categories, none of which apply here (for the reasons given).

  • Use the built-in hash() function. This function, at least on the machine I’m developing for (Python 2.7 on a 64-bit CPU), produces an integer that fits within 32 bits, which is not large enough for my purposes.
  • Use hashlib. hashlib provides cryptographic hash routines, which are far slower than they need to be for non-cryptographic purposes. I find this self-evident, but if you require benchmarks and citations to convince you, I can provide them.
  • Use the string.__hash__() function as a prototype to write your own function. I suspect this will be the correct way to go, except that this particular function’s efficiency lies in its use of the c_mul function, which wraps around at 32 bits. Again, too small for my use! Very frustrating, since it’s so close to perfect!
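For reference, the hashlib route dismissed in the second bullet might look roughly like the sketch below. It is wide enough and portable, so it is mainly useful as the speed baseline to beat. The name md5_hash64 and its seed parameter are made up for illustration (they are not part of hashlib), and the key is assumed to be a bytes object.

```python
import hashlib
import struct


def md5_hash64(data, seed=0):
    """Return a 64-bit integer taken from the first 8 bytes of an MD5 digest.

    `md5_hash64` and `seed` are illustrative names, not a hashlib API.
    `data` is assumed to be a bytes object.
    """
    # Prefix the seed as 8 little-endian bytes so different seeds give
    # unrelated digests (an assumption of this sketch, not a standard scheme).
    digest = hashlib.md5(struct.pack("<Q", seed) + data).digest()
    # Interpret the first 8 digest bytes as an unsigned little-endian integer.
    return struct.unpack("<Q", digest[:8])[0]
```

Because the byte order is fixed by the "<Q" format, the same input should give the same integer on any machine.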

An ideal solution would have the following properties, in a relative, loose order of importance.

  1. Have an output range of at least 34 bits, likely 64 bits, while preserving consistent avalanche properties over all bits. (Concatenating 32-bit hashes tends to violate the avalanche properties, at least with my dumb examples.)
  2. Portable. Given the same input string on two different machines, I should get the same result both times. These values will be stored in a file for later re-use.
  3. High-performance. The faster the better, as this function will get called roughly 20 billion times during the execution of the program I’m running (it is the performance-critical code at the moment). It doesn’t need to be written in C; it really just needs to outperform md5 (somewhere in the realm of the built-in hash() for strings).
  4. Accept a ‘perturbation’ integer (a seed, in the usual terminology) as input to modify the output. I put an example below (the list formatting rules wouldn’t let me place it nearer). I suppose this isn’t 100% necessary, since it can be simulated by perturbing the output of the function manually, but having it as input gives me a nice warm feeling.
  5. Written entirely in Python. If it absolutely, positively needs to be written in C then I guess that can be done, but I’d take a 20% slower function written in Python over a faster one in C, just because of the project coordination headache of using two different languages. Yes, this is a cop-out, but this is a wish list.
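To make the wish list concrete, here is a minimal pure-Python sketch of one well-known non-cryptographic hash, 64-bit FNV-1a, with a seed bolted on. This is a swapped-in technique for illustration, not something from the question: the seed handling (XOR into the offset basis) is my own assumption rather than part of FNV, and FNV-1a is not known for strong avalanche behaviour, so it may well fail item 1. A per-byte Python loop is also unlikely to approach the speed of the built-in hash().

```python
FNV64_OFFSET_BASIS = 0xcbf29ce484222325  # standard 64-bit FNV offset basis
FNV64_PRIME = 0x100000001b3              # standard 64-bit FNV prime
MASK64 = 0xFFFFFFFFFFFFFFFF


def fnv1a_64(data, seed=0):
    """64-bit FNV-1a over a bytes object.

    `seed` is an illustrative extension: it is XOR-ed into the offset basis,
    which is an assumption of this sketch. seed=0 gives standard FNV-1a.
    """
    h = (FNV64_OFFSET_BASIS ^ seed) & MASK64
    for byte in bytearray(data):  # bytearray yields integers byte by byte
        h ^= byte
        # Python integers do not overflow, so wrap to 64 bits explicitly.
        h = (h * FNV64_PRIME) & MASK64
    return h


print(hex(fnv1a_64(b"a")))  # 0xaf63dc4c8601ec8c
```

The explicit mask is what the 32-bit c_mul helper mentioned above does at a narrower width; widening the mask is the only change needed to get 64 bits out.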

‘Perturbed’ hash example, where a small integer value n drastically changes the hash value:

def perturb_hash(key, n):
    # Hash the (key, n) pair so each n gives an unrelated value for the same key
    return hash((key, n))

Finally, if you’re curious as to what the heck I’m doing that needs such a specific hash function: I’m doing a complete rewrite of the pybloom module to improve its performance considerably. I succeeded at that (it now runs about 4x faster and uses about 50% of the space), but I noticed that once a filter got large enough, its false-positive rate would suddenly spike. I realized this was because the hash function wasn’t addressing enough bits. A 32-bit hash can address only about 4 billion bits (mind you, the filter addresses bits and not bytes), and some of the filters I’m using for genomic data are double that size or more (hence the 34-bit minimum).
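The arithmetic behind that minimum can be sketched as follows. The helper names min_hash_bits and bloom_positions are hypothetical, not taken from pybloom, and the built-in hash() is used only to mirror the perturb_hash example above: on Python 3 its string hashes are randomized per process by default, so it would not meet the portability requirement.

```python
def min_hash_bits(num_bits):
    """Smallest hash width (in bits) able to index every position 0..num_bits-1."""
    return max(1, (num_bits - 1).bit_length())


def perturb_hash(key, n):
    # Same idea as the question's example: hash the (key, n) pair.
    return hash((key, n))


def bloom_positions(key, num_hashes, num_bits):
    """Derive `num_hashes` bit positions in [0, num_bits) for one key."""
    # Python's % with a positive modulus is never negative, so a negative
    # hash() value still maps into the valid range.
    return [perturb_hash(key, n) % num_bits for n in range(num_hashes)]


print(min_hash_bits(2**32))      # 32: a 4-billion-bit filter just fits
print(min_hash_bits(2 * 2**32))  # 33: double that size
print(min_hash_bits(4 * 2**32))  # 34
```

If the hash is narrower than min_hash_bits reports, the upper part of the filter is never set or tested, which is consistent with the false-positive spike described above.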

Thanks!

1 Answer

  1. Editorial Team
    Added an answer on May 20, 2026 at 10:06 pm

    Take a look at the 128-bit variant of MurmurHash3. The algorithm’s page includes some performance numbers. It should be possible to port this to Python, either in pure Python or as a C extension. (Update: the author recommends using the 128-bit variant and throwing away the bits you don’t need.)
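"Throwing away the bits you don't need" amounts to masking the wide value down to the width you want. A minimal sketch, where low_bits is a hypothetical helper and h128 is a made-up stand-in for a 128-bit hash value rather than real MurmurHash3 output:

```python
def low_bits(value, bits):
    """Keep only the lowest `bits` bits of a non-negative integer."""
    return value & ((1 << bits) - 1)


# Made-up 128-bit value standing in for a real hash result.
h128 = 0x0123456789abcdeffedcba9876543210

h64 = low_bits(h128, 64)  # 64-bit hash
h34 = low_bits(h128, 34)  # just enough for a filter of up to 2**34 bits

print(hex(h64))  # 0xfedcba9876543210
```

Masking keeps the result independent of platform word size, which matters if the values are written to a file and read back elsewhere.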

    If MurmurHash2 64-bit works for you, there is a Python implementation (C extension) in the pyfasthash package, which includes a few other non-cryptographic hash variants, though some of these only offer 32-bit output.

    Update: I wrote a quick Python wrapper for the Murmur3 hash function. The GitHub project is here, and you can find it on the Python Package Index as well. It needs only a C++ compiler to build; Boost is not required.

    Usage example and timing comparison:

    import murmur3
    import timeit
    
    # without seed
    print murmur3.murmur3_x86_64('samplebias')
    # with seed value
    print murmur3.murmur3_x86_64('samplebias', 123)
    
    # timing comparison with str __hash__
    t = timeit.Timer("murmur3.murmur3_x86_64('hello')", "import murmur3")
    print 'murmur3:', t.timeit()
    
    t = timeit.Timer("str.__hash__('hello')")
    print 'str.__hash__:', t.timeit()
    

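    Output (timeit runs each statement 1,000,000 times by default, so the timings are seconds per million calls):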

    15662901497824584782
    7997834649920664675
    murmur3: 0.264422178268
    str.__hash__: 0.219163894653
    