I’m using C# 4.0. I am performing a bunch of computations on each row of a System.Data.DataTable. I can’t give out the actual code, but it boils down to something like this:
DataTable table = GetMyTableFromSomewhere();
string[] columnNames = table.Columns.Cast<DataColumn>().Select(c => c.ColumnName).ToArray();
foreach (var row in table.Rows.Cast<DataRow>())
{
    Dictionary<string, object> values = columnNames.ToDictionary(c => c, c => row[c]);
    EvaluateExpressionUsingTheseValues(values);
}
Then EvaluateExpressionUsingTheseValues would access "SomeColumn" via values["SomeColumn"].
My thought is that creating a dictionary inside the loop is resource-intensive. Therefore something like this may be more time efficient:
DataTable table = GetMyTableFromSomewhere();
int rowIndex = -1;
var values = table.Columns.Cast<DataColumn>().Select(c => new
{
    Key = c.ColumnName,
    Value = new Func<object>(() => table.Rows[rowIndex][c.ColumnName])
}).ToDictionary(kv => kv.Key, kv => kv.Value);
for (rowIndex = 0; rowIndex < table.Rows.Count; rowIndex++)
{
    EvaluateExpressionUsingTheseValues(values);
}
EvaluateExpressionUsingTheseValues would then read the column as values["SomeColumn"]() instead of values["SomeColumn"].
I see the first as having heavy per-iteration overhead to build a dictionary, but then fast lookup, whereas the second has no per-iteration overhead in terms of building a dictionary, but then slower lookup.
Which is better?
You’ll find that the solution with Dictionary<string, Func<object>> has much more overhead.
The reason is that these Func<object> delegates live on closure objects that need to be allocated. The cost of that is probably much higher than simply indexing into your row once.
Furthermore, you’ll probably have to do the indexing later anyway. The solution with Func<object> would then benefit from a cache inside EvaluateExpressionUsingTheseValues(values) to avoid multiple evaluations. But that is what the first solution really already is.
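If you want to check this against your own data, you can time both variants side by side. The following is only a rough sketch, not a rigorous benchmark. BuildTable, the column names "Col0" and so on, the row and column counts, and the two Sum... methods are made up for illustration; they stand in for GetMyTableFromSomewhere and EvaluateExpressionUsingTheseValues, which I obviously can't see.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Diagnostics;
using System.Linq;

public static class DictionaryPerRowBenchmark
{
    // Hypothetical stand-in for GetMyTableFromSomewhere(): int columns Col0..ColN-1,
    // where the cell at (row r, column c) holds r + c.
    public static DataTable BuildTable(int rowCount, int columnCount)
    {
        var table = new DataTable();
        for (int c = 0; c < columnCount; c++)
        {
            table.Columns.Add("Col" + c, typeof(int));
        }
        for (int r = 0; r < rowCount; r++)
        {
            object[] cells = new object[columnCount];
            for (int c = 0; c < columnCount; c++)
            {
                cells[c] = r + c;
            }
            table.Rows.Add(cells);
        }
        return table;
    }

    // First variant: build a fresh Dictionary<string, object> for every row,
    // then read the requested columns from it.
    public static long SumWithPerRowDictionary(DataTable table, string[] columnsToRead)
    {
        string[] columnNames = table.Columns.Cast<DataColumn>().Select(c => c.ColumnName).ToArray();
        long sum = 0;
        foreach (DataRow row in table.Rows)
        {
            Dictionary<string, object> values = columnNames.ToDictionary(c => c, c => row[c]);
            foreach (string name in columnsToRead)
            {
                sum += (int)values[name];
            }
        }
        return sum;
    }

    // Second variant: build one Dictionary<string, Func<object>> up front;
    // each delegate reads from the row selected by the captured rowIndex.
    public static long SumWithDelegateDictionary(DataTable table, string[] columnsToRead)
    {
        int rowIndex = -1;
        Dictionary<string, Func<object>> values = table.Columns.Cast<DataColumn>().ToDictionary(
            c => c.ColumnName,
            c => new Func<object>(() => table.Rows[rowIndex][c.ColumnName]));
        long sum = 0;
        for (rowIndex = 0; rowIndex < table.Rows.Count; rowIndex++)
        {
            foreach (string name in columnsToRead)
            {
                sum += (int)values[name]();
            }
        }
        return sum;
    }

    public static void Main()
    {
        // Sizes are arbitrary; substitute numbers that resemble the real table.
        DataTable table = BuildTable(10000, 20);
        string[] columnsToRead = { "Col0", "Col7", "Col19" };

        Stopwatch watch = Stopwatch.StartNew();
        long first = SumWithPerRowDictionary(table, columnsToRead);
        watch.Stop();
        Console.WriteLine("Per-row dictionary: {0} ms (checksum {1})", watch.ElapsedMilliseconds, first);

        watch = Stopwatch.StartNew();
        long second = SumWithDelegateDictionary(table, columnsToRead);
        watch.Stop();
        Console.WriteLine("Delegate dictionary: {0} ms (checksum {1})", watch.ElapsedMilliseconds, second);
    }
}
```

The two checksums should match, which confirms that both variants read the same cells. The timings will depend on how many columns the table has and how many of them the evaluator actually reads per row, so run it in a Release build, repeat it a few times, and use sizes close to your real workload before drawing conclusions.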