I have a 150 MB file. Each line has the same format, e.g.:
I,h,q,q,3,A,5,Q,3,[,5,Q,8,c,3,N,3,E,4,F,4,g,4,I,V,9000,0000001-100,G9999999990001800000000000001,G9999999990000001100PDNELKKMMCNELRQNWJ010, , , , , , ,D,Z,
I have a Dictionary<string, List<string>>.
It is populated by opening the file, reading each line, taking elements from the line and adding them to the dictionary; then the file is closed.
    using (StreamReader s = File.OpenText(file))
    {
        string lineData;
        while ((lineData = s.ReadLine()) != null)
        {
            var elements = lineData.Split(',');
            // The first 24 fields are the values to compare; field 27 is the key.
            var compareElements = elements.Take(24);
            FileData.Add(elements[27], new List<string>(compareElements));
        }
    }
Using the method in this answer I calculated my dictionary to be 600 MB. That’s 4 times the size of the file.
Does that sound correct?
Most of these fields hold only a single character, yet you are storing them as strings. The reference to each of those strings alone takes at least twice as much space as the character itself (a reference is 4 or 8 bytes, a .NET char is 2 bytes; compared with the single byte the character likely occupies in a UTF-8 file, that is 4 to 8 times as much). Then there is the overhead of maintaining the hash table structure for the dictionary.
The List<> in itself should be really efficient storage-wise (it uses an array internally).

Room for improvement:

- Use List<char> or char[] instead of List<string> if you know that the fields will fit.
- Use struct Field { char a,b/*,...*/; } and List<Field> instead of List<string> if you need more than 1 character per field.
- You could forgo the eager field extraction [<– recommended]:
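Forgoing the eager extraction could look roughly like the sketch below: the dictionary keeps each raw line under its key and nothing else. The class name LazyFileData and its methods are made up for illustration; the key index 27 is taken from the question's code, and duplicate keys are assumed not to occur (Add throws if they do, as in the question).

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: stores the unsplit line per key instead of a List<string>.
public static class LazyFileData
{
    // Index of the key field, taken from elements[27] in the question.
    private const int KeyIndex = 27;

    // Maps the key field of each line to the whole, unsplit line.
    public static Dictionary<string, string> Load(IEnumerable<string> lines)
    {
        var dict = new Dictionary<string, string>();
        foreach (var line in lines)
        {
            // Split only to find the key; the split pieces are not kept.
            var key = line.Split(',')[KeyIndex];
            dict.Add(key, line); // throws ArgumentException on a duplicate key
        }
        return dict;
    }

    // File.ReadLines streams the file line by line instead of loading it all at once.
    public static Dictionary<string, string> LoadFile(string path)
    {
        return Load(File.ReadLines(path));
    }
}
```

With this layout the per-line cost is one string plus the dictionary entry, rather than 24 small strings and a list.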
This gives you the opportunity to access the compareElements on demand:
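On-demand access might then be sketched as follows, assuming the dictionary maps each key to its raw, unsplit line. The helper name GetCompareElements is hypothetical; the count of 24 comes from the question's Take(24).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: splits a stored line only when its fields are needed.
public static class CompareElementsAccess
{
    public static string[] GetCompareElements(
        IReadOnlyDictionary<string, string> linesByKey, string key)
    {
        // Pay the splitting cost at lookup time instead of at load time.
        return linesByKey[key].Split(',').Take(24).ToArray();
    }
}
```

Each lookup now costs a split, which is the price paid for the smaller dictionary.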
This is a classic example of a runtime/storage cost trade-off.
Edit: an obvious hybrid would be:
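One possible shape for such a hybrid, as a sketch only: extract the 24 compare fields eagerly, but store them compactly in a value type instead of a List<string>. It assumes every compare field is exactly one character, as in the sample line. Packing the characters into a single string is a choice made for this sketch; a struct with 24 separate char fields would serve the same purpose. HybridFileData, FromElements and the indexer are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical value type holding the 24 single-character compare fields.
public readonly struct AllCompareElements
{
    public const int FieldCount = 24;

    // All 24 characters packed into one string (an assumption of this sketch).
    private readonly string _packed;

    private AllCompareElements(string packed) { _packed = packed; }

    // Returns the compare field at the given position (0..23).
    public char this[int index] => _packed[index];

    public static AllCompareElements FromElements(string[] elements)
    {
        var buffer = new char[FieldCount];
        for (int i = 0; i < FieldCount; i++)
        {
            // Reject input that breaks the one-character-per-field assumption.
            if (elements[i].Length != 1)
                throw new FormatException("Field " + i + " is not a single character.");
            buffer[i] = elements[i][0];
        }
        return new AllCompareElements(new string(buffer));
    }
}

public static class HybridFileData
{
    public static Dictionary<string, AllCompareElements> Load(IEnumerable<string> lines)
    {
        var dict = new Dictionary<string, AllCompareElements>();
        foreach (var line in lines)
        {
            var elements = line.Split(',');
            // Key index 27 is taken from the question's code.
            dict.Add(elements[27], AllCompareElements.FromElements(elements));
        }
        return dict;
    }
}
```

Lookups need no further splitting, and each entry holds one short string instead of 24.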
Happily employ ReSharper to implement Equals, GetHashCode, IEquatable<AllCompareElements> and IComparable<AllCompareElements>.
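For reference, hand-written versions of those members follow a fairly mechanical pattern. The sketch below uses a hypothetical struct, CompareElementsSketch, cut down to three char fields for brevity; a generated implementation for the full type may differ in detail.

```csharp
using System;

// Hypothetical, reduced to three fields; the real type would carry all 24.
public readonly struct CompareElementsSketch
    : IEquatable<CompareElementsSketch>, IComparable<CompareElementsSketch>
{
    public readonly char Field1, Field2, Field3;

    public CompareElementsSketch(char f1, char f2, char f3)
    {
        Field1 = f1; Field2 = f2; Field3 = f3;
    }

    // Field-by-field equality, without boxing.
    public bool Equals(CompareElementsSketch other)
    {
        return Field1 == other.Field1 && Field2 == other.Field2 && Field3 == other.Field3;
    }

    public override bool Equals(object obj)
    {
        return obj is CompareElementsSketch other && Equals(other);
    }

    // Combines the fields with a simple multiply-and-add hash.
    public override int GetHashCode()
    {
        unchecked
        {
            int hash = 17;
            hash = hash * 31 + Field1;
            hash = hash * 31 + Field2;
            hash = hash * 31 + Field3;
            return hash;
        }
    }

    // Orders by Field1, then Field2, then Field3.
    public int CompareTo(CompareElementsSketch other)
    {
        int c = Field1.CompareTo(other.Field1);
        if (c != 0) return c;
        c = Field2.CompareTo(other.Field2);
        if (c != 0) return c;
        return Field3.CompareTo(other.Field3);
    }
}
```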