Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 6579517
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: May 25, 20262026-05-25T15:54:30+00:00 2026-05-25T15:54:30+00:00

I have defined a subinterface of java.util.Collection that effectively is a multiset (aka bag).

  • 0

I have defined a subinterface of java.util.Collection that effectively is a multiset (aka bag). It may not contain null elements, although that’s not crucial to my question. The equals contract defined by the interface is as you would expect:

  • obj instanceof MyInterface
  • obj contains the same elements as this (by equals)
  • obj contains the same number of duplicates for each element
  • order of elements is disregarded

Now I want to write my hashCode method. My initial idea was:

int hashCode = 1;
for( Object o : this ) {
    hashCode += o.hashCode();
}

However, I noticed that com.google.common.collect.Multiset (from Guava) defines the hash code as follows:

int hashCode = 0;
for( Object o : elementSet() ) {
    hashCode += ((o == null) ? 0 : o.hashCode()) ^ count(o);
}

It strikes me as odd that an empty Multiset would have hash code 0, but more importantly I don’t understand the benefit of ^ count(o) over simply adding up the hash codes of every duplicate. Maybe it’s about not calculating the same hash code more than once, but then why not * count(o)?

My question: what would be an efficient hash code calculation? In my case the count for an element is not guaranteed to be cheap to obtain.

  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-05-25T15:54:30+00:00Added an answer on May 25, 2026 at 3:54 pm

    Update

    Let’s say, for example, in the case where we’d have an array that we want to treat as a multiset.

    So you have to process all the entries as they come, you can’t use count, and can’t assume the entries come in a known order.

    The general function I’d consider is

    int hashCode() {
        int x = INITIAL_VALUE;
        for (Object o : this) {
            x = f(x, o==null ? NULL_HASH : g(o.hashCode()));
        }
        return h(x);
    }
    

    Some observations:

    • As already stated in the other answers, the INITIAL_VALUE doesn’t matter much.
    • I wouldn’t go for NULL_HASH=0 since this would ignore null values.
    • The function g can be used in case you expect the hashes of the members to be in a small range (which can happen in case they are e.g., single characters).
    • The function h can be used to improve the result, which is not very important since this already happens e.g. in HashMap.hash(int).
    • The function f is the most important one, unfortunately, it’s quite limited as it obviously must be both associative and commutative.
    • The function f should be bijective in both arguments, otherwise you’d generate unnecessary collisions.

    In no case I’d recommend f(x, y) = x^y since it’d make two occurrences of an element to cancel out. Using addition is better. Something like

    f(x, y) = x + (2*A*x + 1) * y
    

    where A is a constant satisfies all the above conditions. It may be worth it.
    For A=0 it degenerates to addition, using an even A is not good as it shift bits of x*y out.
    Using A=1 is fine, and the expression 2*x+1 can be computed using a single instruction on the x86 architecture.
    Using a larger odd A might work better in case the hashes of the members are badly distributed.

    In case you go for a non-trivial hashCode() you should test if it works correctly. You should measure the performance of your program, maybe you’ll find simple addition sufficient. Otherwise, I’d for for NULL_HASH=1, g=h=identity, and A=1.

    My old answer

    It may be for efficiency reasons. Calling count may be expensive for some implementations, but entrySet may be used instead. Still it might be more costly, I can’t tell.

    I did a simple collision benchmark for Guava’s hashCode and Rinke’s and my own proposals:

    enum HashCodeMethod {
        GUAVA {
            @Override
            public int hashCode(Multiset<?> multiset) {
                return multiset.hashCode();
            }
        },
        RINKE {
            @Override
            public int hashCode(Multiset<?> multiset) {
                int result = 0;
                for (final Object o : multiset.elementSet()) {
                    result += (o==null ? 0 : o.hashCode()) * multiset.count(o);
                }
                return result;
            }
        },
        MAAARTIN {
            @Override
            public int hashCode(Multiset<?> multiset) {
                int result = 0;
                for (final Multiset.Entry<?> e : multiset.entrySet()) {
                    result += (e.getElement()==null ? 0 : e.getElement().hashCode()) * (2*e.getCount()+123);
                }
                return result;
            }
        }
        ;
        public abstract int hashCode(Multiset<?> multiset);
    }
    

    The collision counting code went as follows:

    private void countCollisions() throws Exception {
        final String letters1 = "abcdefgh";
        final String letters2 = "ABCDEFGH";
        final int total = letters1.length() * letters2.length();
        for (final HashCodeMethod hcm : HashCodeMethod.values()) {
            final Multiset<Integer> histogram = HashMultiset.create();
            for (final String s1 : Splitter.fixedLength(1).split(letters1)) {
                for (final String s2 : Splitter.fixedLength(1).split(letters2)) {
                    histogram.add(hcm.hashCode(ImmutableMultiset.of(s1, s2, s2)));
                }
            }
            System.out.println("Collisions " + hcm + ": " + (total-histogram.elementSet().size()));
        }
    }
    

    and printed

    Collisions GUAVA: 45
    Collisions RINKE: 42
    Collisions MAAARTIN: 0
    

    So in this simple example Guava’s hashCode performed really bad (45 collisions out of 63 possible). However, I don’t claim my example is of much relevance for real life.

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

I have defined a boolean Jess function that accepts a lambda, as follows: (deffunction
I have defined new member in my class protected COMObject.Call call_ = null; This
I have defined a function called login that should return a token. The token
I have defined an object that has several attribute.. class thing(object): def __init__(self, type,
I have defined a macro in ANT that checks if a dir exists on
I have defined one Thickness resource in Window's resources collection which is set to
I have defined this Entity class, but it does not seem to save the
I have defined different states in my lexer, which change not depending on the
I have defined a web service that will return the data from my mysql
I have defined a custom Capistrano task that's supposed to run locally (on my

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.