I have a CSV file. Every line has the same format, e.g.:
I,h,q,q,3,A,5,Q,3,[,5,Q,8,c,3,N,3,E,4,F,4,g,4,I,V,9000,0000001-100,G9999999990001800000000000001,G9999999990000001100PDNELKKMMCNELRQNWJ010, , , , , , ,D,Z,
I have a Dictionary<string, List<char>>.
It is populated by opening the file, reading each line, taking elements from the line and adding them to the dictionary; then the file is closed.
The dictionary is used elsewhere in the program: when input data arrives, the program finds the matching key in the dictionary and compares the 24 stored elements against the input data.
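// Requires: using System.Collections.Generic; using System.IO; using System.Linq;
using (StreamReader s = File.OpenText(file)) // disposes the reader even if an exception is thrown
{
    string lineData;
    while ((lineData = s.ReadLine()) != null)
    {
        var elements = lineData.Split(',');
        // Do stuff with elements
        // First character of each of the first 24 fields
        var compareElements = elements.Take(24).Select(x => x[0]);
        // elements[27] is the lookup key
        FileData.Add(elements[27], new List<char>(compareElements));
    }
}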
I have just been told that the CSV file will now be 800 MB and contain roughly 8 million records. I tried loading it in debug on my dual-core laptop (32-bit Windows, 4 GB of RAM) and it threw an OutOfMemoryException.
I now think that not loading the file into memory is the best bet, but I need a way to search the file quickly: check whether the input data matches a record's elements[27], then take the first 24 elements of that record and compare them to the input data.
a) Even if I stuck with this approach and used 16 GB of RAM and 64-bit Windows, would having that many items in a dictionary be OK?
b) Could you provide some code/links showing ways to search a CSV file quickly, if you don't think using a dictionary is a good plan?
UPDATE: Although I have accepted an answer, I just wondered what people’s thoughts were on using FileStream to do a lookup and then extract data.
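For reference, here is a rough sketch of what the FileStream idea could look like: one pass over the file records the byte offset at which each record starts, keyed by elements[27], and a lookup then seeks straight to that offset and reads a single line. The class and method names (CsvOffsetIndex, Build, Lookup) are made up for illustration, and the sketch assumes an ASCII file, '\n' or "\r\n" line endings, unique keys, and non-empty values in the first 24 fields. It still keeps every key plus an 8-byte offset in memory, so it shrinks the footprint but does not by itself guarantee that 8 million records fit in a 32-bit process.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;

public static class CsvOffsetIndex
{
    private const int KeyField = 27;      // position of the key, as in the question
    private const int CompareCount = 24;  // number of leading fields to compare

    // Single pass: map each key to the byte offset where its record starts.
    public static Dictionary<string, long> Build(string path)
    {
        var index = new Dictionary<string, long>();
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read, 1 << 16))
        {
            var line = new StringBuilder();
            long position = 0;   // bytes consumed so far
            long lineStart = 0;  // offset of the record being read
            int b;
            while ((b = fs.ReadByte()) != -1)
            {
                position++;
                if (b != '\n')
                {
                    line.Append((char)b); // assumption: single-byte (ASCII) data
                    continue;
                }
                AddLine(index, line, lineStart);
                line.Length = 0;
                lineStart = position;
            }
            AddLine(index, line, lineStart); // final record without a trailing newline
        }
        return index;
    }

    private static void AddLine(Dictionary<string, long> index, StringBuilder line, long start)
    {
        if (line.Length == 0) return;
        string[] elements = line.ToString().TrimEnd('\r').Split(',');
        if (elements.Length > KeyField) index[elements[KeyField]] = start;
    }

    // Seek to the stored offset and read only that record; returns null if the key is unknown.
    public static List<char> Lookup(string path, Dictionary<string, long> index, string key)
    {
        long offset;
        if (!index.TryGetValue(key, out offset)) return null;
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read))
        {
            fs.Seek(offset, SeekOrigin.Begin);
            using (var reader = new StreamReader(fs, Encoding.ASCII))
            {
                string record = reader.ReadLine();
                return record.Split(',').Take(CompareCount).Select(x => x[0]).ToList();
            }
        }
    }
}
```

Opening the file on every lookup keeps the sketch short; if lookups are frequent, holding one FileStream open and reusing it would likely be cheaper.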
If you're planning to search this many records, I would suggest bulk inserting the file into a DBMS such as SQL Server, with appropriate indexes on the fields you will search by, and then using an SQL query to check whether a record exists.
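As a minimal sketch of that suggestion, assuming SQL Server and the System.Data.SqlClient classes (built into .NET Framework; newer runtimes need a package reference): the table dbo.FileData, its two columns, the class CsvSqlLookup and the batch size are all hypothetical choices, not something from the question. The primary key on RecordKey provides the index used by the lookup, and rows are sent in batches so the whole file is never held in memory.

```csharp
using System;
using System.Data;
using System.Data.SqlClient;
using System.IO;
using System.Linq;

public static class CsvSqlLookup
{
    // Hypothetical table, created beforehand, for example:
    //   CREATE TABLE dbo.FileData (
    //       RecordKey    varchar(64) NOT NULL PRIMARY KEY,
    //       CompareChars char(24)    NOT NULL);
    private const string Table = "dbo.FileData";

    // Turns one CSV line into { key, 24 comparison characters }.
    // Assumes at least 28 fields and non-empty values in the first 24.
    public static string[] ToRow(string line)
    {
        string[] elements = line.Split(',');
        string compare = new string(elements.Take(24).Select(x => x[0]).ToArray());
        return new[] { elements[27], compare };
    }

    // Streams the CSV into the table in batches; assumes keys are unique.
    public static void Load(string csvPath, string connectionString, int batchSize = 50000)
    {
        var batch = new DataTable();
        batch.Columns.Add("RecordKey", typeof(string));
        batch.Columns.Add("CompareChars", typeof(string));

        using (var conn = new SqlConnection(connectionString))
        {
            conn.Open();
            using (var bulk = new SqlBulkCopy(conn) { DestinationTableName = Table })
            {
                bulk.ColumnMappings.Add("RecordKey", "RecordKey");
                bulk.ColumnMappings.Add("CompareChars", "CompareChars");
                foreach (string line in File.ReadLines(csvPath)) // lazy, line by line
                {
                    string[] row = ToRow(line);
                    batch.Rows.Add(row[0], row[1]);
                    if (batch.Rows.Count >= batchSize)
                    {
                        bulk.WriteToServer(batch);
                        batch.Clear();
                    }
                }
                if (batch.Rows.Count > 0) bulk.WriteToServer(batch);
            }
        }
    }

    // Returns the 24 comparison characters for a key, or null when no record matches.
    public static string Find(string connectionString, string key)
    {
        const string sql = "SELECT CompareChars FROM dbo.FileData WHERE RecordKey = @key";
        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand(sql, conn))
        {
            cmd.Parameters.Add("@key", SqlDbType.VarChar, 64).Value = key;
            conn.Open();
            return cmd.ExecuteScalar() as string;
        }
    }
}
```

The program would call Load once whenever the file changes and Find for each piece of input data, comparing the returned 24 characters against the input exactly as it does with the dictionary values today.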