What is the most efficient way to enumerate every cell in every sheet in a workbook?
The method below seems to work reasonably for a workbook with ~130,000 cells. On my machine it took ~26 seconds to open the file and ~5 seconds to enumerate the cells . However I’m no Excel expert and wanted to validate this code snippet with the wider community.
DateTime timer = DateTime.Now; Microsoft.Office.Interop.Excel.Application excelApplication = new Microsoft.Office.Interop.Excel.Application(); try { exampleFile = new FileInfo(Path.Combine(System.Environment.CurrentDirectory, 'Large.xlsx')); excelApplication.Workbooks.Open(exampleFile.FullName, false, false, missing, missing, missing, true, missing, missing, true, missing, missing, missing, missing, missing); Console.WriteLine(string.Format('Took {0} seconds to open file', (DateTime.Now - timer).Seconds.ToString())); timer = DateTime.Now; foreach(Workbook workbook in excelApplication.Workbooks) { foreach(Worksheet sheet in workbook.Sheets) { int i = 0, iRowMax, iColMax; string data = String.Empty; Object[,] rangeData = (System.Object[,]) sheet.UsedRange.Cells.get_Value(missing); if (rangeData != null) { iRowMax = rangeData.GetUpperBound(0); iColMax = rangeData.GetUpperBound(1); for (int iRow = 1; iRow < iRowMax; iRow++) { for(int iCol = 1; iCol < iColMax; iCol++) { data = rangeData[iRow, iCol] != null ? rangeData[iRow, iCol].ToString() : string.Empty; if (i % 100 == 0) { Console.WriteLine(String.Format('Processed {0} cells.', i)); } i++; } } } } workbook.Close(false, missing, missing); } Console.WriteLine(string.Format('Took {0} seconds to parse file', (DateTime.Now - timer).Seconds.ToString())); } finally { excelApplication.Workbooks.Close(); excelApplication.Quit(); }
Edit:
Worth stating that I want to use PIA and interop in order to access properties of excel workbooks that are not exposed by API’s that work directly with the Excel file.
Excel PIA Interop is really slow when you are doing things cell by cell.
You should select the range you want to extract, like you did with the
Worksheet.UsedRangeproperty and then read the value of the whole range in one step, by invokingget_Value()(or just simply by reading theValueorValue2property, I can’t remember which one) on it.This will yield an
object[,], that is, a two dimensional array, which can be easily enumerated and is quick to be read.EDIT: I just read your actual code and realized that it actually does what I proposed. Shame on me for not reading the question properly before answering. In that case, you cannot make it much faster. Excel PIA Interop is slow. If you need a quicker solution you will have to either migrate jExcelApi from Java to C# (not a terribly hard thing to do) or use some commercial component. I suggest to avoid the OLEDB interface at all costs, in order to keep your sanity.
Unrelated, but helpful tip: You should use the ?? operator. It is really handy. Instead of
you could just write
In that case, even String.Empty is not necessary since Convert.ToString(object) converts
nullto an empty string anyway.