
The Archive Base


Asked by Editorial Team on June 1, 2026


Short version: What’s the best hashing algorithm for a multiset implemented as a dictionary of unordered items?

I’m trying to hash an immutable multiset (also called a bag in other languages: like a mathematical set, except that it can hold more than one of each element) implemented as a dictionary. I’ve created a subclass of the standard library class collections.Counter, similar to the advice here: Python hashable dicts, which recommends a hash function like so:

import collections

class FrozenCounter(collections.Counter):
    # ...
    def __hash__(self):
        return hash(tuple(sorted(self.items())))

Creating the full tuple of items takes up a lot of memory (relative to, say, using a generator), and hashing will occur in an extremely memory-intensive part of my application. More importantly, my dictionary keys (the multiset elements) probably won’t be orderable.
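A quick way to see the orderability problem is to give the sorted-tuple hash a multiset whose elements have mixed types. This is a minimal sketch assuming Python 3, where comparing an int with a str raises TypeError; the class name SortedTupleCounter is hypothetical and only mirrors the hash shown above.

```python
import collections

class SortedTupleCounter(collections.Counter):
    # Hypothetical name: uses the sorted-tuple hash from the question
    def __hash__(self):
        return hash(tuple(sorted(self.items())))

# Orderable elements hash fine, and insertion order does not matter
print(hash(SortedTupleCounter([1, 2, 2])) == hash(SortedTupleCounter([2, 1, 2])))  # True

# Mixed element types cannot be sorted in Python 3, so hashing fails
try:
    hash(SortedTupleCounter([1, "a", 1]))
except TypeError as exc:
    print("cannot hash:", exc)
```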

I’m thinking of using this algorithm:

import functools, operator
def __hash__(self):
    # hash each (element, count) pair, then XOR the results together
    return functools.reduce(operator.xor, map(hash, self.items()), 0)

I figure that because bitwise XOR is commutative, the order of the items doesn’t matter for the hash value, unlike when hashing a tuple. Is that right? I suppose I could semi-implement the Python tuple-hashing algorithm on the unordered stream of item tuples. See https://github.com/jonashaag/cpython/blob/master/Include/tupleobject.h (search the page for the word ‘hash’), but I barely know enough C to read it.

Thoughts? Suggestions? Thanks.
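The XOR idea can be checked in isolation. Since items() yields (element, count) tuples, each tuple has to be hashed before the results can be XORed. The helper xor_hash is a hypothetical name used only for illustration; the sketch shows that the combined value does not depend on iteration order. Plain XOR does no extra mixing of the individual hashes, so this demonstrates the principle rather than a recommended hash.

```python
import functools
import operator

def xor_hash(pairs):
    # Hash each (element, count) pair, then fold the hashes together with XOR.
    # XOR is commutative and associative, so the order of pairs is irrelevant.
    return functools.reduce(operator.xor, map(hash, pairs), 0)

forward = [("x", 2), ("y", 1), ("z", 5)]
backward = list(reversed(forward))
print(xor_hash(forward) == xor_hash(backward))  # True
```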


(If you’re wondering why I’m messing around with trying to hash a multiset: the input data for my problem are sets of multisets, and within each set of multisets, each multiset must be unique. I’m working on a deadline and I’m not an experienced coder, so I wanted to avoid inventing new algorithms where possible. The most Pythonic way to make sure a bunch of things are unique seems to be to put them in a set(), but the things must be hashable.)


What I’ve gathered from the comments

Both @marcin and @senderle gave essentially the same answer: use hash(frozenset(self.items())). This makes sense because items() views are set-like. @marcin was first, but I gave the check mark to @senderle because of the good research on the big-O running times of the different solutions. @marcin also reminds me to include an __eq__ method, but the one inherited from dict will work just fine. This is how I’m implementing everything; further comments and suggestions based on this code are welcome:

import collections

class FrozenCounter(collections.Counter):
    # Edit: A previous version of this code included a __slots__ definition.
    # But, from the Python documentation: "When inheriting from a class without
    # __slots__, the __dict__ attribute of that class will always be accessible,
    # so a __slots__ definition in the subclass is meaningless."
    # http://docs.python.org/py3k/reference/datamodel.html#notes-on-using-slots
    # ...
    def __hash__(self):
        "Implements hash(self) -> int"
        if not hasattr(self, '_hash'):
            self._hash = hash(frozenset(self.items()))
        return self._hash
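As a usage sketch for the motivating problem (keeping only unique multisets), the class can be dropped straight into a set(). The class is repeated here so the snippet runs on its own; the mutation-blocking methods elided by # ... above are still omitted, so this copy is not actually immutable.

```python
import collections

class FrozenCounter(collections.Counter):
    # Same lazily cached hash as above; mutating methods are NOT blocked here
    def __hash__(self):
        if not hasattr(self, '_hash'):
            self._hash = hash(frozenset(self.items()))
        return self._hash

multisets = [FrozenCounter("aab"), FrozenCounter("aba"), FrozenCounter("abb")]
unique = set(multisets)
print(len(unique))  # 2: "aab" and "aba" are the same multiset
```

One thing worth double-checking: since Python 3.10, Counter equality treats missing elements as having a count of zero, so FrozenCounter(a=1) == FrozenCounter(a=1, b=0) is True even though their frozensets of items differ. If zero counts can appear, hashing frozenset((k, v) for k, v in self.items() if v) instead keeps the hash consistent with ==.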

1 Answer

    Editorial Team added an answer on June 1, 2026 at 1:03 pm

    Since the dictionary is immutable, you can compute the hash when the dictionary is created and return it directly. My suggestion would be to create a frozenset from items() (in Python 3; iteritems() in 2.7), hash it, and store the hash.
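A minimal sketch of computing the hash once, at construction time, assuming Python 3 and a Counter subclass. The class name EagerFrozenCounter is hypothetical, and the sketch does not block mutation, so the stored hash would go stale if the object were modified after creation.

```python
import collections

class EagerFrozenCounter(collections.Counter):
    # Hypothetical name: the hash is computed once, in __init__
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # items() yields (element, count) pairs; frozenset ignores their order
        self._hash = hash(frozenset(self.items()))

    def __hash__(self):
        # No recomputation: just return the stored value
        return self._hash

print(len({EagerFrozenCounter("ab"), EagerFrozenCounter("ba"), EagerFrozenCounter("abb")}))  # 2
```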

    To provide an explicit example (a Python 2.7 session):

    >>> frozenset(Counter([1, 1, 1, 2, 3, 3, 4]).iteritems())
    frozenset([(3, 2), (1, 3), (4, 1), (2, 1)])
    >>> hash(frozenset(Counter([1, 1, 1, 2, 3, 3, 4]).iteritems()))
    -3071743570178645657
    >>> hash(frozenset(Counter([1, 1, 1, 2, 3, 4]).iteritems()))
    -6559486438209652990
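For Python 3, where iteritems() no longer exists, the same experiment uses items(). Exact hash values differ between Python versions, so this sketch compares them rather than printing them.

```python
from collections import Counter

# Python 3: items() replaces the 2.7-only iteritems()
a = frozenset(Counter([1, 1, 1, 2, 3, 3, 4]).items())
b = frozenset(Counter([4, 3, 3, 2, 1, 1, 1]).items())  # same multiset, different order
c = frozenset(Counter([1, 1, 1, 2, 3, 4]).items())     # one fewer 3

print(a == b, hash(a) == hash(b))  # True True
print(a == c)                      # False
```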
    

    To clarify why I prefer a frozenset to a tuple of sorted items: a frozenset doesn’t have to sort the items, so the initial hash completes in O(n) time rather than O(n log n) time. This can be seen from the frozenset_hash and set_next implementations in CPython’s setobject.c.

    See also this great answer from Raymond Hettinger describing his implementation of the frozenset hash function. There he explicitly explains how the hash function avoids having to sort values to get a stable, order-insensitive value.
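To make the "no sorting needed" idea concrete, here is a small sketch of an order-insensitive hash over an unordered stream of items. It is NOT CPython’s frozenset algorithm: the bit mixer is a hypothetical stand-in that borrows one multiplier from MurmurHash3’s 64-bit finalizer, and the combination (XOR plus a masked sum plus the item count, folded into a tuple hash) is just one reasonable choice. Because every combining step is commutative, iteration order cannot affect the result, and only O(n) work is done.

```python
MASK = (1 << 64) - 1

def _mix(h):
    # Hypothetical bit mixer: spreads the bits of one element hash so that
    # similar input hashes map to very different 64-bit values
    h &= MASK
    h ^= h >> 33
    h = (h * 0xFF51AFD7ED558CCD) & MASK
    h ^= h >> 33
    return h

def unordered_hash(items):
    # XOR, modular addition and counting are all commutative, so the
    # result is the same for any iteration order; no sorting is needed
    xor_acc = 0
    sum_acc = 0
    count = 0
    for item in items:
        m = _mix(hash(item))
        xor_acc ^= m
        sum_acc = (sum_acc + m) & MASK
        count += 1
    return hash((xor_acc, sum_acc, count))

pairs = [("x", 2), ("y", 1), ("z", 5)]
print(unordered_hash(pairs) == unordered_hash(reversed(pairs)))  # True
```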
