I’m looking at an old web app I wrote and it is taking about an hour to read 4500 records from a DataTable so it can write them to a CSV file. I feel there has to be some way to improve this.
A few things to note:
- The DataTable contains… 376 columns. At least, I think that’s what Excel’s NL column converts to. I just looked up the column count now and had no idea there were so many. Our software vendor hasn’t realized the value of dynamic SQL statements for this process, so every software "upgrade" just keeps adding more columns rather than only selecting the ones needed.
- I cannot alter the SQL statement that generates the data.
- Depending on the data type, the data needs to be formatted in a specific format.
- Data does contain special characters, such as commas.
- The slow part is reading the data. Getting the data from the SQL server and writing it to a CSV is fast.
Here’s the code. Forgive the mess; I wrote it back when I didn’t know what I was doing and was still working in VB.
Function ReadDataTableForCSV(dt As DataTable)
    Dim sb = New StringBuilder()
    Dim dataTypes As New Dictionary(Of String, Integer)

    ' Header Row
    For i As Integer = 0 To dt.Columns.Count - 1
        Dim col As DataColumn = dt.Columns(i)
        Dim t = col.DataType
        If t Is GetType(Boolean) Then
            dataTypes.Add(i, 1)
        ElseIf t Is GetType(Decimal) Then
            dataTypes.Add(i, 2)
        Else
            dataTypes.Add(i, 3)
        End If
        sb.Append(String.Format("""{0}""", col.ColumnName))
        sb.Append(IIf(i = dt.Columns.Count - 1, vbLf, ","))
    Next

    ' Items
    For Each row As DataRow In dt.Rows
        For i As Integer = 0 To dt.Columns.Count - 1
            Select dataTypes(i)
                Case 1
                    sb.Append(String.Format("""{0}""", CInt(row(i))))
                Case 2
                    sb.Append(String.Format("""{0}""", FormatNumber(row(i), 2, , , 0)))
                Case 3
                    sb.Append(String.Format("""{0}""", row(i)))
            End Select
            sb.Append(IIf(i = dt.Columns.Count - 1, vbLf, ","))
        Next
    Next
End Function
Edit: Removed code not related to the problem
Here is how I would rewrite it:
- Allocate the StringBuilder memory up front.
- Change the data types from a dictionary to a byte array and only use values 1 and 2; value 3 will now be 0, which is the default for items in the array.
- Use the Ordinal property from the column rather than a separate index.
- Streamline the evaluations inside the loop for item and line separators.
- Use Decimal.ToString instead of FormatNumber.
- Remove the IIf calls (these are probably optimized by the compiler, but I am still leery of them from the early VB days).
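To make the list concrete, here is a minimal sketch of how those points might fit together. It is an illustration only, not the listing from this answer: the name BuildCsv and the capacity guess of 16 characters per cell are assumptions, DBNull values in Boolean and Decimal columns are not handled, and embedded double quotes are left unescaped, as in the original code.

```vb
Imports System.Data
Imports System.Text

Module CsvSketch
    ' Hypothetical helper; the name and the capacity estimate are assumptions.
    Function BuildCsv(dt As DataTable) As String
        Dim colCount As Integer = dt.Columns.Count

        ' Allocate up front: rough guess of 16 characters per cell.
        Dim sb As New StringBuilder((dt.Rows.Count + 1) * colCount * 16)

        ' 0 = anything else (array default), 1 = Boolean, 2 = Decimal.
        Dim dataTypes(colCount - 1) As Byte

        ' Header row: record each column's type code by its Ordinal.
        For Each col As DataColumn In dt.Columns
            If col.DataType Is GetType(Boolean) Then
                dataTypes(col.Ordinal) = 1
            ElseIf col.DataType Is GetType(Decimal) Then
                dataTypes(col.Ordinal) = 2
            End If
            ' Separator goes before every item except the first, so no IIf is needed.
            If col.Ordinal > 0 Then sb.Append(","c)
            sb.Append(""""c).Append(col.ColumnName).Append(""""c)
        Next
        sb.Append(vbLf)

        ' Items
        For Each row As DataRow In dt.Rows
            For i As Integer = 0 To colCount - 1
                If i > 0 Then sb.Append(","c)
                sb.Append(""""c)
                Select Case dataTypes(i)
                    Case 1
                        ' Same conversion as the original: True becomes -1 in VB.
                        sb.Append(CInt(row(i)))
                    Case 2
                        ' Two decimals, no digit grouping, current culture.
                        sb.Append(CDec(row(i)).ToString("F2"))
                    Case Else
                        sb.Append(row(i))
                End Select
                sb.Append(""""c)
            Next
            sb.Append(vbLf)
        Next

        Return sb.ToString()
    End Function
End Module
```

Appending the quote characters and the value separately avoids one String.Format call per cell, which adds up over roughly 4500 × 376 cells. IIf is an ordinary library function that takes and returns Object and evaluates both arguments, so a plain If statement is the cheaper choice inside the loop.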
Here’s the code: