The Archive Base

Editorial Team
Asked: May 27, 2026

I am having a problem with hash collisions using short strings in .NET4.

I am having a problem with hash collisions using short strings in .NET4.
EDIT: I am using the built-in string hashing function in .NET.

I’m implementing a cache using objects that store the direction of a conversion like this

public class MyClass
{
    private string _from;
    private string _to;

    // More code here....

    public MyClass(string from, string to)
    {
        this._from = from;
        this._to = to;
    }

    public override int GetHashCode()
    {
        return string.Concat(this._from, this._to).GetHashCode();
    }

    public bool Equals(MyClass other)
    {
        return this.To == other.To && this.From == other.From;
    }

    public override bool Equals(object obj)
    {
        if (obj == null) return false;
        if (this.GetType() != obj.GetType()) return false;
        return Equals(obj as MyClass);
    }
}

This is direction dependent and the from and to are represented by short strings like “AAB” and “ABA”.

I am getting sparse hash collisions with these small strings. I have tried something simple like adding a salt (did not work).

The problem is that the hashes of too many of my small strings, like “AABABA”, collide with the hash of the reverse, “ABAAAB”. (Note that these are not real examples; I have no idea if AAB and ABA actually cause collisions!)

I have also gone heavy duty and implemented MD5 (which works, but is MUCH slower).

I have also implemented the suggestion from Jon Skeet here:
Should I use a concatenation of my string fields as a hash code?
This works but I don’t know how dependable it is with my various 3-character strings.

How can I improve and stabilize the hashing of small strings without adding too much overhead like MD5?

EDIT: In response to a few of the answers posted… the cache is implemented using concurrent dictionaries keyed on MyClass as stubbed out above. If I replace the GetHashCode in the code above with something simple like @JonSkeet’s code from the link I posted:

int hash = 17;
hash = hash * 23 + this._from.GetHashCode();
hash = hash * 23 + this._to.GetHashCode();        
return hash;

Everything functions as expected.
It’s also worth noting that in this particular use-case the cache is not used in a multi-threaded environment so there is no race condition.

EDIT: I should also note that this misbehavior is platform dependent. It works as intended on my fully updated Win7x64 machine but does not behave properly on a non-updated Win7x64 machine. I don’t know the extent of the missing updates, but I know it doesn’t have Win7 SP1… so I would assume there may also be a framework SP or update that it’s missing.

EDIT: As suggested, my issue was not caused by a problem with the hashing function. I had an elusive race condition, which is why it worked on some computers but not others, and also why a “slower” hashing method made things work properly. The answer I selected was the most useful in understanding why my problem was not hash collisions in the dictionary.


1 Answer

  1. Editorial Team
    Added an answer on May 27, 2026 at 4:09 pm

    Are you sure that collisions are what is causing the problems? When you say

    I finally discovered what was causing this bug

    do you mean some slowness in your code, or something else? If it is something else, I’m curious what kind of problem it is, because any hash function (except “perfect” hash functions on limited domains) will produce collisions.

    I put together a quick piece of code to check for collisions among 3-character strings, and it doesn’t report a single collision. You see what I mean? It looks like the built-in hash algorithm is not so bad.

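    // Requires: using System; using System.Collections.Generic;
    // Hashes every 3-character string over the range 'A'..'z' and reports duplicate hash codes.
    // Note: that range also contains the six punctuation characters between 'Z' and 'a'.
    HashSet<int> seen = new HashSet<int>();
    char[] buffer = new char[3];
    int count = 0;
    for (int c1 = (int)'A'; c1 <= (int)'z'; c1++)
    {
        buffer[0] = (char)c1;
        for (int c2 = (int)'A'; c2 <= (int)'z'; c2++)
        {
            buffer[1] = (char)c2;
            for (int c3 = (int)'A'; c3 <= (int)'z'; c3++)
            {
                buffer[2] = (char)c3;
                string str = new string(buffer);
                count++;
                int hash = str.GetHashCode();
                // Add returns false when this hash code was already seen.
                if (!seen.Add(hash))
                {
                    Console.WriteLine("Collision for {0}", str);
                }
            }
        }
    }

    // If the two numbers are equal, no two strings shared a hash code.
    Console.WriteLine("Generated {0} of {1} hashes", seen.Count, count);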
    

    You could pick almost any of the well-known hash functions (as David mentioned), or even choose a “perfect” hash, since it looks like your domain is limited (for example, a minimal perfect hash). But it would be great to first understand whether collisions are really the source of the problems.

    Update

    What I want to say is that the .NET built-in hash function for strings is not so bad. It doesn’t produce so many collisions that you would need to write your own algorithm in regular scenarios. And this doesn’t depend on the length of the strings: having a lot of 6-character strings doesn’t imply that your chances of seeing a collision are higher than with 1000-character strings. This is one of the basic properties of hash functions.

    And again, the other question is what kind of problems you experience because of collisions. All built-in hashtables and dictionaries support collision resolution, so I would say all you can see is… probably some slowness. Is this your problem?
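
    To illustrate the point about collision resolution, here is a minimal sketch. It assumes a hypothetical CollidingKey type (not part of the question's code) whose hash code is deliberately identical for every instance, so every key collides. The dictionary should still return the right value for each key, because it falls back to Equals; the cost is lookup speed, not correctness.

    ```csharp
    using System;
    using System.Collections.Generic;

    // Hypothetical key type: every instance reports the same hash code.
    public sealed class CollidingKey : IEquatable<CollidingKey>
    {
        private readonly string _value;

        public CollidingKey(string value) { _value = value; }

        // Worst case on purpose: all keys collide.
        public override int GetHashCode() { return 1; }

        public bool Equals(CollidingKey other)
        {
            return !ReferenceEquals(other, null) && _value == other._value;
        }

        public override bool Equals(object obj) { return Equals(obj as CollidingKey); }
    }

    public static class CollisionDemo
    {
        // Stores n distinct keys, reads each one back, and counts the correct lookups.
        public static int CountCorrectLookups(int n)
        {
            var map = new Dictionary<CollidingKey, int>();
            for (int i = 0; i < n; i++)
            {
                map[new CollidingKey("key" + i)] = i;
            }

            int correct = 0;
            for (int i = 0; i < n; i++)
            {
                int value;
                if (map.TryGetValue(new CollidingKey("key" + i), out value) && value == i)
                {
                    correct++;
                }
            }
            return correct;
        }

        public static void Main()
        {
            // Every lookup succeeds even though all hash codes are equal.
            Console.WriteLine(CollisionDemo.CountCorrectLookups(1000)); // prints 1000
        }
    }
    ```

    So a wrong or missing cache entry points at something other than hash collisions, for example Equals, key mutation, or threading.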

    As for your code

    return string.Concat(this._from, this._to).GetHashCode(); 
    

    This can cause problems, because every hash code calculation creates a new string. Maybe this is what causes your issues?

    int hash = 17; 
    hash = hash * 23 + this._from.GetHashCode(); 
    hash = hash * 23 + this._to.GetHashCode();         
    return hash; 
    

    This would be a much better approach, simply because you don’t create new objects on the heap. That is actually one of the main points of this approach: getting a good hash code for an object with a complex “key” without creating new objects. So if you don’t have a single-value key, this should work for you. BTW, this is not a new hash function; it is just a way to combine existing hash values without compromising the main properties of hash functions.
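
    As a side note, here is a small sketch comparing the two approaches. The class and method names (HashCombineDemo, ConcatHash, CombinedHash) are made up for illustration. It shows one more thing concatenation does: it drops the boundary between the fields, so two different pairs that concatenate to the same text always get the same hash. With fixed-length 3-character fields, as in the question, this ambiguity would not arise; it only matters if the field lengths can vary. The unchecked block is an addition that keeps the arithmetic from throwing if the project is compiled with overflow checking.

    ```csharp
    using System;

    public static class HashCombineDemo
    {
        // Concatenation-based hash, as in the question's GetHashCode.
        public static int ConcatHash(string from, string to)
        {
            return string.Concat(from, to).GetHashCode();
        }

        // Field-by-field combination using the constants 17 and 23 quoted above.
        public static int CombinedHash(string from, string to)
        {
            unchecked // let the multiplication wrap around instead of throwing
            {
                int hash = 17;
                hash = hash * 23 + from.GetHashCode();
                hash = hash * 23 + to.GetHashCode();
                return hash;
            }
        }

        public static void Main()
        {
            // ("AA", "B") and ("A", "AB") both concatenate to "AAB",
            // so the concatenation-based hashes are necessarily equal.
            Console.WriteLine(ConcatHash("AA", "B") == ConcatHash("A", "AB")); // True

            // The combined hash is computed from each field separately, with no new string.
            Console.WriteLine(CombinedHash("AAB", "ABA"));
        }
    }
    ```

    The combined value itself can differ between runtimes and processes, so it should not be persisted or compared across machines.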
