I have been running into OutOfMemory exceptions while trying to load an 800MB text file into a DataTable via StreamReader. I was wondering if there is a way to load the DataTable from the stream in batches, i.e., read the first 10,000 rows of the text file from the StreamReader, create a DataTable, do something with the DataTable, then read the next 10,000 rows from the StreamReader, and so on.
My Google searches weren’t very helpful here, but it seems like there should be an easy way to do this. Ultimately I will be writing the DataTables to an MS SQL database using SqlBulkCopy, so if there is an easier approach than what I have described, I would be thankful for a quick pointer in the right direction.
Edit – Here is the code that I am running:
public static DataTable PopulateDataTableFromText(DataTable dt, string txtSource)
{
    StreamReader sr = new StreamReader(txtSource);
    DataRow dr;
    int dtCount = dt.Columns.Count;
    string input;
    int i = 0;
    while ((input = sr.ReadLine()) != null)
    {
        try
        {
            string[] stringRows = input.Split(new char[] { '\t' });
            dr = dt.NewRow();
            for (int a = 0; a < dtCount; a++)
            {
                string dataType = dt.Columns[a].DataType.ToString();
                if (stringRows[a] == "" && (dataType == "System.Int32" || dataType == "System.Int64"))
                {
                    stringRows[a] = "0";
                }
                dr[a] = Convert.ChangeType(stringRows[a], dt.Columns[a].DataType);
            }
            dt.Rows.Add(dr);
        }
        catch (Exception ex)
        {
            Console.WriteLine(ex.ToString());
        }
        i++;
    }
    return dt;
}
And here is the error that is returned:
“System.OutOfMemoryException: Exception of type ‘System.OutOfMemoryException’ was thrown.
at System.String.Split(Char[] separator, Int32 count, StringSplitOptions options)
at System.String.Split(Char[] separator)
at Harvester.Config.PopulateDataTableFromText(DataTable dt, String txtSource) in C:….”
Regarding the suggestion to load the data directly into SQL – I’m a bit of a noob when it comes to C#, but I thought that is basically what I am doing? SqlBulkCopy.WriteToServer takes the DataTable that I create from the text file and imports it into SQL Server. Is there an even easier way to do this that I am missing?
Edit: Oh, I forgot to mention – this code will not be running on the same server as the SQL Server. The data text file is on Server B and needs to be written to a table on Server A. Does that preclude using bcp?
Do you actually need to process the data in batches of rows? Or could you process it row by row? In the latter case, I think LINQ could be very helpful here, because it makes it easy to stream data across a “pipeline” of methods. That way you don’t need to load a lot of data at once, only one row at a time.
First, you need to make your StreamReader enumerable. This is easily done with an extension method. That way you can use the StreamReader as the source for a LINQ query.
Then you need a method that takes a string and converts it to a DataRow.
With those elements, you can easily project each line from the file to a DataRow, and do whatever you need with it.
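A minimal sketch of how those three pieces might fit together. The names `Lines`, `ToDataRow`, and `ProcessLines` are made up for illustration and are not taken from the original answer. The conversion rules mirror the code in the question (tab-separated fields, empty integer fields become 0); treating missing trailing fields as empty is an extra assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.IO;
using System.Linq;

public static class StreamingExtensions
{
    // Hypothetical helper: lazily yields one line at a time from any TextReader
    // (StreamReader derives from TextReader).
    public static IEnumerable<string> Lines(this TextReader reader)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            yield return line;
        }
    }

    // Hypothetical helper: converts one tab-separated line into a DataRow that
    // follows the table's schema. The row is created but NOT added to the table.
    public static DataRow ToDataRow(this string line, DataTable table)
    {
        string[] fields = line.Split('\t');
        DataRow row = table.NewRow();
        for (int i = 0; i < table.Columns.Count; i++)
        {
            Type type = table.Columns[i].DataType;
            // Assumption: a missing trailing field is treated as empty.
            string value = i < fields.Length ? fields[i] : "";
            if (value == "" && (type == typeof(int) || type == typeof(long)))
            {
                value = "0"; // same rule as in the question's code
            }
            row[i] = Convert.ChangeType(value, type);
        }
        return row;
    }
}

public static class Example
{
    // Projects each line to a DataRow and hands it to the caller, one row at a time.
    public static void ProcessLines(TextReader reader, DataTable schema, Action<DataRow> handle)
    {
        foreach (DataRow row in reader.Lines().Select(l => l.ToDataRow(schema)))
        {
            handle(row);
        }
    }
}
```

Because `Lines` uses `yield return`, the file is read one line at a time as the query is enumerated, so the whole text never has to be in memory at once. I believe `NewRow` still reserves storage inside the table for each row it creates, so it is worth checking memory use on a large file before relying on this.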
(Note that you could do something similar with a simple loop, without using LINQ, but I think LINQ makes the code more readable…)
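For the batch approach described in the question, a simple loop could look roughly like the sketch below. `BatchLoader`, `LoadInBatches`, and the `flush` callback are hypothetical names; the callback is where a call such as `SqlBulkCopy.WriteToServer(table)` would go, and it is left as a delegate here so the sketch does not need a database connection. Missing trailing fields are again assumed to be empty.

```csharp
using System;
using System.Data;
using System.IO;

public static class BatchLoader
{
    // Hypothetical helper: fills `table` from tab-separated text, calls `flush`
    // whenever `batchSize` rows have accumulated, then clears the table so that
    // at most one batch of rows is held at a time. Returns the total row count.
    public static long LoadInBatches(TextReader reader, DataTable table,
                                     int batchSize, Action<DataTable> flush)
    {
        long total = 0;
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            string[] fields = line.Split('\t');
            DataRow row = table.NewRow();
            for (int i = 0; i < table.Columns.Count; i++)
            {
                Type type = table.Columns[i].DataType;
                // Assumption: a missing trailing field is treated as empty.
                string value = i < fields.Length ? fields[i] : "";
                if (value == "" && (type == typeof(int) || type == typeof(long)))
                {
                    value = "0"; // same rule as in the question's code
                }
                row[i] = Convert.ChangeType(value, type);
            }
            table.Rows.Add(row);
            total++;

            if (table.Rows.Count >= batchSize)
            {
                flush(table);   // e.g. write this batch to the server
                table.Clear();  // drop the rows before reading the next batch
            }
        }

        // Flush the final, possibly smaller, batch.
        if (table.Rows.Count > 0)
        {
            flush(table);
            table.Clear();
        }
        return total;
    }
}
```

With a batch size of 10,000 this matches the flow in the question: read 10,000 rows, do something with the DataTable, then continue with the next 10,000.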