My overall problem is that I have a large Excel file (columns A-S, 85,000 rows) that I want to convert to XML. The data in the cells is all text.
The process I'm using now is to manually save the Excel file as CSV, then parse that in my own C# program to turn it into XML. If you have better recommendations, please share them. I've searched SO, and the only fast methods I found for converting straight to XML require my data to be all numeric.
(I tried reading cell by cell; it would have taken 3 days to process.)
So, unless you can recommend a different way to approach the problem, I want to be able to programmatically remove all commas, <, >, ', and " characters from the Excel sheet.
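If stripping is still the route taken, the stripping itself is cheap once the text is in memory rather than read from Excel one cell at a time. A minimal sketch, assuming each cell value is already a string; `CellCleaner` and `Strip` are hypothetical names:

```csharp
using System.Linq;

// Hypothetical helper: removes the characters listed in the question from one cell value.
public static class CellCleaner
{
    // Comma, angle brackets, straight single quote, straight double quote.
    // Assumption: the sheet uses straight quotes; add '\u2018', '\u2019', '\u201C', '\u201D'
    // if Excel has auto-corrected them to curly quotes.
    private const string Forbidden = ",<>'\"";

    public static string Strip(string value)
    {
        // Treat empty or missing cells as empty strings.
        if (string.IsNullOrEmpty(value)) return value ?? string.Empty;

        // Keep every character that is not in the forbidden set.
        return new string(value.Where(c => Forbidden.IndexOf(c) < 0).ToArray());
    }
}
```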
I would use a combination of `Microsoft.Office.Interop.Excel` and `XmlSerializer` to get the job done. This is in light of the fact that a) you're using a console application, and b) the interop assemblies are easy to integrate into the solution (just References -> Add).
I'm assuming that you have a copy of Excel installed on the machine running the process (you mentioned that you currently open the workbook manually, hence the assumption).
The code would look something like this:
The serializable layer:
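A minimal sketch of such a layer, assuming a simple shape of one `<Sheet>` root, one `<Row>` per spreadsheet row, and one `<Cell>` per column. The class and element names (`ExcelSheet`, `ExcelRow`, `Sheet`, `Row`, `Cell`) are hypothetical and can be renamed freely:

```csharp
using System.Collections.Generic;
using System.Xml.Serialization;

// Hypothetical root type: serializes as <Sheet> containing one <Row> per spreadsheet row.
[XmlRoot("Sheet")]
public class ExcelSheet
{
    // [XmlElement] writes each row directly under <Sheet> instead of
    // wrapping them all in an extra <Rows> element.
    [XmlElement("Row")]
    public List<ExcelRow> Rows { get; set; } = new List<ExcelRow>();
}

// Hypothetical row type: one <Cell> element per column (A-S in the question).
public class ExcelRow
{
    [XmlElement("Cell")]
    public List<string> Cells { get; set; } = new List<string>();
}
```

A list of cells keeps the class short for 19 columns; if each column has a distinct meaning, named `string` properties with their own `[XmlElement]` names would give more descriptive XML.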
Then open the worksheet and load it into a 2D array:
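A sketch of this step, assuming Excel and the interop assembly are available. The interop calls are shown as comments so the conversion logic stands on its own; `RangeReader.ToRows` is a hypothetical helper. The key point is reading `UsedRange.Value2` in a single call, which returns the whole used range as an `object[,]`, instead of touching cells one at a time:

```csharp
using System.Collections.Generic;

// Interop part (requires a reference to Microsoft.Office.Interop.Excel and Excel installed):
//   using Excel = Microsoft.Office.Interop.Excel;
//   var app = new Excel.Application();
//   Excel.Workbook wb = app.Workbooks.Open(path);
//   var ws = (Excel.Worksheet)wb.Worksheets[1];
//   var values = (object[,])ws.UsedRange.Value2;   // one COM call for the whole range
//   wb.Close(false);
//   app.Quit();
public static class RangeReader
{
    // Converts the 2D array from Value2 into a list of rows, each a list of cell strings.
    public static List<List<string>> ToRows(object[,] values)
    {
        var rows = new List<List<string>>();

        // Value2 arrays are 1-based, so use the array's own bounds rather than assuming 0.
        for (int r = values.GetLowerBound(0); r <= values.GetUpperBound(0); r++)
        {
            var cells = new List<string>();
            for (int c = values.GetLowerBound(1); c <= values.GetUpperBound(1); c++)
            {
                // Empty cells come back as null; store them as empty strings.
                cells.Add(values[r, c]?.ToString() ?? string.Empty);
            }
            rows.Add(cells);
        }
        return rows;
    }
}
```

Each inner list can become one row's cell list before serializing. For 85,000 rows by 19 columns, this replaces roughly 1.6 million per-cell COM calls with one, which is where most of the time in a cell-by-cell loop tends to go.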
The generated XML output will look something like this:
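As a rough, self-contained check of what that output contains, here is a sketch that serializes a few made-up cell values with `XmlSerializer` and reads them back. A plain `List<string>` stands in for the sheet/row classes, so the element names differ (`ArrayOfString`, `string`); `EscapingDemo` is a hypothetical name. In the output, `<` and `&` are written as `&lt;` and `&amp;`, so the commas, brackets and quotes from the question can stay in the data:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

public static class EscapingDemo
{
    // Serializes a list of cell strings and returns the XML text.
    public static string ToXml(List<string> cells)
    {
        var serializer = new XmlSerializer(typeof(List<string>));
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, cells);
            return writer.ToString();
        }
    }

    // Reads the XML back, to confirm the original text survives unchanged.
    public static List<string> FromXml(string xml)
    {
        var serializer = new XmlSerializer(typeof(List<string>));
        using (var reader = new StringReader(xml))
        {
            return (List<string>)serializer.Deserialize(reader);
        }
    }
}
```

For example, `Console.WriteLine(EscapingDemo.ToXml(new List<string> { "a < b" }));` prints a small document in which the cell text appears as `a &lt; b`.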