I have a rather general question about the lookup speed of a dictionary that uses strings as keys, and I couldn't find an answer so far.
In my current program I have a dictionary of custom objects. The keys are filenames including the full path of the file, so no key can occur twice.
My question is: does the time needed to find a specific object in the dictionary depend significantly on the length of the string used as the key? After all, I have a large amount of data stored in my object, and I use that data in a loop, accessing it every single time through myDictionary[Key]. If the lookup alone takes a long time, every loop iteration takes longer.
My solution to this problem would be: if my object contains an array, let's say a double[,,], I create a temporary array variable and assign it the array from the dictionary, so I don't have to search the dictionary on every single loop iteration.
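A minimal sketch of what I mean. The names Measurement, Data, LookupHoisting and the file path are made up for illustration and are not from my real program:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the custom object stored in the dictionary.
class Measurement
{
    public double[,,] Data = new double[4, 4, 4];
}

static class LookupHoisting
{
    // Looks the key up once, then works on the local reference.
    public static double Sum(Dictionary<string, Measurement> dict, string key)
    {
        // Arrays are reference types: this copies a reference, not the data.
        double[,,] data = dict[key].Data;

        double sum = 0;
        for (int i = 0; i < data.GetLength(0); i++)
            for (int j = 0; j < data.GetLength(1); j++)
                for (int k = 0; k < data.GetLength(2); k++)
                    sum += data[i, j, k]; // no dictionary lookup in here
        return sum;
    }

    static void Main()
    {
        var dict = new Dictionary<string, Measurement>();
        string key = @"C:\data\run1\sample.dat"; // made-up path
        dict[key] = new Measurement();
        dict[key].Data[1, 2, 3] = 2.5;

        Console.WriteLine(Sum(dict, key));
    }
}
```

Since the local variable only holds a reference to the same array, no data is copied, and changes made through it are visible through the dictionary entry as well.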
Yes, it does. Finding an element in a dictionary involves two CPU-intensive steps: computing the hash code of the key, and comparing the key for equality with the candidate keys in the matching bucket.
A dictionary stores its elements in buckets. To be able to do an O(1) lookup, the dictionary calculates the position in its internal array as hashCode modulo array.Length. This can give several elements the same index; those elements are stored under that index, which is called a bucket.
For a string, the hash code is generated from all characters in the string, which means that generating the hash code is O(n) in the length of the string. The longer the string, the longer it takes to generate the hash code.
Comparing two strings for equality is done by comparing them completely. If the strings contain, let's say, 100,000 characters and only the last character differs, the comparison can take quite a lot of time. If they differ in the first character, the comparison returns false very quickly. Determining that two strings are in fact equal (when they aren't reference equal) takes the most time, since the complete string has to be traversed.
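If you want to see the effect on your own machine, a rough measurement along the following lines should do. This is only a sketch: the class name KeyLengthTiming, the key lengths and the iteration count are arbitrary choices for illustration, and a Stopwatch loop like this is not a rigorous benchmark.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

static class KeyLengthTiming
{
    // Times repeated lookups of a single key of the given length.
    public static long TimeLookups(int keyLength, int iterations)
    {
        string key = new string('a', keyLength);
        var dict = new Dictionary<string, int> { [key] = 1 };

        // A separate but equal string instance, so the comparison cannot
        // succeed on reference equality and has to walk the whole string.
        string probe = new string('a', keyLength);

        long checksum = 0;
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            checksum += dict[probe]; // hashes and compares the key each time
        sw.Stop();

        // Sanity check that every lookup found the stored value.
        if (checksum != iterations)
            throw new InvalidOperationException("Unexpected lookup result.");

        return sw.ElapsedTicks;
    }

    static void Main()
    {
        foreach (int length in new[] { 10, 1_000, 100_000 })
            Console.WriteLine($"{length,7} chars: {TimeLookups(length, 1_000)} ticks");
    }
}
```

The absolute numbers depend on the machine and the runtime version, so compare the rows with each other instead of reading much into any single value.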
If the dictionary is on a performance-critical path of the application, keep the key strings as short as you can.