I have a 150mb file. Each line is made up of the same format

Asked: May 26, 2026 (2026-05-26T22:07:50+00:00)

I have a 150mb file. Each line is made up of the same format, e.g.:

I,h,q,q,3,A,5,Q,3,[,5,Q,8,c,3,N,3,E,4,F,4,g,4,I,V,9000,0000001-100,G9999999990001800000000000001,G9999999990000001100PDNELKKMMCNELRQNWJ010, , , , , , ,D,Z,

I have a Dictionary<string, List<string>>.

It is populated by opening the file, reading each line, taking elements from the line and adding them to the dictionary; then the file is closed.

```
StreamReader s = File.OpenText(file);
string lineData = null;
while ((lineData = s.ReadLine()) != null)
{
    var elements = lineData.Split(',');
    var compareElements = elements.Take(24);
    FileData.Add(elements[27], new List<string>(compareElements));
}
s.Close();
```

Using the method in this answer I calculated my dictionary to be 600mb. That’s 4 times what the file is.

Does that sound correct?

1 Answer

Editorial Team · Answer 1 · 2026-05-26T22:07:51+00:00

Most of these fields take only a single character, yet you are storing them as strings. The reference to each string alone is going to take at least twice as much space as the character itself (with a UTF-8 file, where the character is one byte, likely 4-8 times as much, since a reference is 4 or 8 bytes). Every string object also carries its own header and length on top of that. Then there is the overhead of keeping a hash table structure for the dictionary.

The List<> itself should be really efficient storage-wise (it uses an array internally).
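A rough way to check a figure like the 600mb from the question is to measure managed memory before and after building a small dictionary the same way. The sketch below is only an illustration: `MemoryEstimate` and `BytesPerEntry` are made-up names, the counter appended to the key exists only to keep the keys unique, and `GC.GetTotalMemory` returns an approximation that varies with the runtime and with 32-bit versus 64-bit processes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class MemoryEstimate
{
    // Builds `lineCount` entries the way the question does and returns the
    // approximate number of managed bytes each entry costs.
    public static long BytesPerEntry(string sampleLine, int lineCount)
    {
        long before = GC.GetTotalMemory(true); // true = force a full collection first

        var data = new Dictionary<string, List<string>>(lineCount);
        for (int i = 0; i < lineCount; i++)
        {
            var elements = sampleLine.Split(',');
            // Assumption for this sketch: a counter is appended so every key is unique.
            data.Add(elements[27] + i, new List<string>(elements.Take(24)));
        }

        long after = GC.GetTotalMemory(true);
        GC.KeepAlive(data); // keep the dictionary reachable until after the measurement
        return (after - before) / lineCount;
    }
}
```

Calling `MemoryEstimate.BytesPerEntry(sampleLine, 100000)` with the sample line from the question gives a per-line figure that can be compared with `sampleLine.Length`, the approximate size of that line on disk.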

Room for improvement:

- You could use List<char> or char[] instead of List<string> if you know that the fields will fit.
- You could use struct Field { char a,b/*,...*/; } and List<Field> instead of List<string> if you need more than 1 character per field.
- You could forgo the eager field extraction [<– recommended]:
```
 var dict = File.ReadAllLines(file)
      .ToDictionary(line => line.Split(',')[27]);
```
This gives you the opportunity to access the compareElements on demand:
```
 string[] compareElements = dict["key27"].Split(',')/*.Take(24).ToArray()*/;
```
This is a classic example of a runtime/storage cost trade-off.
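To make the trade-off concrete, here is a minimal sketch of the on-demand variant. `LazyIndex`, `Build` and `CompareElements` are names invented for this illustration. It assumes every line has at least 28 comma-separated fields and that field 27 is unique, since `ToDictionary` throws on duplicate keys.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class LazyIndex
{
    // Stores each raw line once, keyed by field 27; the split done here is
    // only used to find the key and is discarded afterwards.
    public static Dictionary<string, string> Build(IEnumerable<string> lines)
    {
        return lines.ToDictionary(line => line.Split(',')[27]);
    }

    // Splits the stored line again only when the compare elements are needed.
    public static string[] CompareElements(Dictionary<string, string> index, string key)
    {
        return index[key].Split(',').Take(24).ToArray();
    }
}
```

With a real file this could be fed from `File.ReadLines(file)`, which enumerates the lines one at a time instead of materializing a `string[]` first. Storage is then roughly one string per line, and each lookup pays for a fresh `Split`.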

Edit: an obvious hybrid would be:

```
struct AllCompareElements
{
    public char field1, field2, /* ... */ field24;
    // perhaps, for the exceptional field that is longer than 1 character
    // (C# has no `char[2]` field syntax, so two chars stand in for it here):
    public char field13a, field13b;
}
```

Happily employ ReSharper to implement Equals, GetHashCode, IEquatable<AllCompareElements> and IComparable<AllCompareElements>.
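As a rough illustration of what such a struct looks like once the equality members are written out, here is a cut-down sketch with four fields instead of 24. `CompareKey` and its field names are hypothetical, and the constructor assumes each of the first four elements is exactly one character long. A full version would repeat the same pattern for every field.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical 4-field version; the real struct would carry all 24 fields.
readonly struct CompareKey : IEquatable<CompareKey>
{
    public readonly char Field1, Field2, Field3, Field4;

    public CompareKey(string[] elements)
    {
        // Assumption: each of the first four elements holds exactly one character.
        Field1 = elements[0][0];
        Field2 = elements[1][0];
        Field3 = elements[2][0];
        Field4 = elements[3][0];
    }

    // Field-by-field value equality, no boxing when called through IEquatable<T>.
    public bool Equals(CompareKey other) =>
        Field1 == other.Field1 && Field2 == other.Field2 &&
        Field3 == other.Field3 && Field4 == other.Field4;

    public override bool Equals(object obj) => obj is CompareKey other && Equals(other);

    public override int GetHashCode()
    {
        unchecked // overflow simply wraps around, which is fine for a hash
        {
            int hash = 17;
            hash = hash * 31 + Field1;
            hash = hash * 31 + Field2;
            hash = hash * 31 + Field3;
            hash = hash * 31 + Field4;
            return hash;
        }
    }
}
```

A `Dictionary<string, CompareKey>` stores these values inline in its entries, so each `char` field costs 2 bytes rather than a reference plus a separate string object.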
