I read that initializing a dictionary with an initial capacity may lead to better performance if the size can be estimated.
Dim MyDataTable As New DataTable
'Fill MyDataTable from Database
Dim CapacityEstimate As Integer = MyDataTable.Rows.Count
Dim MyDictionary As New Dictionary(Of String, MyObjectType)(CapacityEstimate)
'Fill the Dictionary independent of data table
The CapacityEstimate variable is just an estimate (generally in the range of 2500 to 7000) of the number of key/value pairs that the dictionary will hold. If I estimate 4000 and end up with 4010 objects (I may go over or under, I can’t be certain), will the dictionary use a lot of memory to resize with that many key/value pairs already in it? Is this a good solution, or am I better off not specifying an initial capacity? Thanks.
EDIT: Related but not the same – Should a .NET generic dictionary be initialised with a capacity equal to the number of items it will contain?
Don’t worry about the small stuff. A dictionary like that does not use a lot of memory, so having it resize itself can’t take a lot of memory either. The real storage is in the key and value objects; the dictionary only contains references to them. That’s 8 bytes per entry in 32-bit mode, so only 4000 x 8 = 32,000 bytes plus some overhead, roughly 32 kilobytes.
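The arithmetic above can be sketched in a few lines. This follows the answer's simplified model of two references (key and value) per entry; `ReferenceBytes` is a hypothetical helper introduced here, and the real Dictionary also stores a hash code and a link index per entry plus a bucket array, so treat the result as a lower bound rather than an exact figure.

```vb
Imports System

Module ReferenceFootprint
    ' Hypothetical helper: bytes needed for the key and value references only.
    ' Assumes the simplified model of two references per entry.
    Public Function ReferenceBytes(entryCount As Integer, pointerSize As Integer) As Long
        Return CLng(entryCount) * 2 * pointerSize
    End Function

    Sub Main()
        ' 32-bit process: references are 4 bytes each.
        Console.WriteLine(ReferenceBytes(4000, 4))            ' 32000

        ' Whatever process this actually runs in (4 or 8 bytes per reference).
        Console.WriteLine(ReferenceBytes(4000, IntPtr.Size))
    End Sub
End Module
```

Even at the top of the estimated range (7000 entries) in a 64-bit process, this model gives only 112,000 bytes.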
Furthermore, the capacity you pass is used to calculate the number of hash buckets in the dictionary, which is always a prime number equal to or larger than the capacity you specified. The prime numbers are picked from a precomputed table in the Reference Source.
So if you pass 4000 then you’ll actually get 4049 buckets, the next larger prime available. Overshooting to 4010 therefore isn’t going to make a difference. If it does need to resize, it doubles the capacity and again rounds up to the next available prime, so a single resize would already produce 8419 buckets, well past your maximum estimate. Resizing isn’t very expensive either, a couple of microseconds, which is why Andre couldn’t see a difference.
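A minimal sketch of the sizing rule just described, not the framework's actual code. The `Primes` array is only an excerpt, with values recalled from the .NET Framework Reference Source, so treat them as an assumption; `NextTablePrime` and `SizeAfterResize` are hypothetical names introduced for illustration.

```vb
Imports System

Module BucketSizeSketch
    ' Excerpt of the prime table (assumed values, recalled from the Reference Source).
    ' It only covers requests between 2333 and 14591.
    Private ReadOnly Primes As Integer() = {
        2333, 2801, 3371, 4049, 4861, 5839, 7013, 8419, 10103, 12143, 14591
    }

    ' Hypothetical helper: smallest prime in the excerpt that is >= min.
    Public Function NextTablePrime(min As Integer) As Integer
        If min < Primes(0) Then
            Throw New ArgumentOutOfRangeException("min", "Below the excerpted table.")
        End If
        For Each p As Integer In Primes
            If p >= min Then Return p
        Next
        Throw New ArgumentOutOfRangeException("min", "Above the excerpted table.")
    End Function

    ' Hypothetical helper: bucket count after one resize
    ' (double the current size, then round up to a prime).
    Public Function SizeAfterResize(currentBuckets As Integer) As Integer
        Return NextTablePrime(currentBuckets * 2)
    End Function

    Sub Main()
        Dim initial As Integer = NextTablePrime(4000)
        Console.WriteLine(initial)                   ' 4049
        Console.WriteLine(SizeAfterResize(initial))  ' 8419
    End Sub
End Module
```

Under these assumptions, any item count from 4000 up to 4049 fits in the initial allocation, and one resize covers everything up to 8419.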
Which, beyond just reasoning about it, is the proper approach: measure. Anybody can measure.
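One way to measure it is a rough Stopwatch comparison like the sketch below. The module and helper names (`CapacityTiming`, `BuildKeys`, `FillDictionary`, `TimeFill`) are made up for this example, `Object` stands in for MyObjectType, and the item count and repetition count are arbitrary assumptions. Timings vary by machine and runtime, so no expected numbers are shown.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Diagnostics

Module CapacityTiming
    ' Builds the keys up front so string creation is not part of the timing.
    Public Function BuildKeys(count As Integer) As String()
        Dim keys(count - 1) As String
        For i As Integer = 0 To count - 1
            keys(i) = "key" & i.ToString()
        Next
        Return keys
    End Function

    ' Fills a dictionary; initialCapacity = 0 lets it grow on its own.
    Public Function FillDictionary(keys As String(), initialCapacity As Integer) As Dictionary(Of String, Object)
        Dim dict As New Dictionary(Of String, Object)(initialCapacity)
        For Each key As String In keys
            dict.Add(key, Nothing)
        Next
        Return dict
    End Function

    ' Returns the total elapsed milliseconds for the given number of fills.
    Public Function TimeFill(keys As String(), initialCapacity As Integer, repetitions As Integer) As Double
        Dim watch As Stopwatch = Stopwatch.StartNew()
        For r As Integer = 1 To repetitions
            FillDictionary(keys, initialCapacity)
        Next
        watch.Stop()
        Return watch.Elapsed.TotalMilliseconds
    End Function

    Sub Main()
        Dim keys As String() = BuildKeys(4010)

        ' Warm-up runs so JIT compilation is not measured.
        TimeFill(keys, 4000, 10)
        TimeFill(keys, 0, 10)

        Console.WriteLine("Capacity 4000: {0:F2} ms", TimeFill(keys, 4000, 1000))
        Console.WriteLine("No capacity:   {0:F2} ms", TimeFill(keys, 0, 1000))
    End Sub
End Module
```

Run it in a Release build outside the debugger, and repeat it a few times before drawing conclusions.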