I have to manipulate some Excel documents with C#. It’s a batch process with no user interaction. It will parse the data into a database, then output nice reports. The data is very dirty and cannot be read using ADO; it is nowhere near a clean table format.
"Best" here means the most stable (updates are less likely to break it) and the clearest (most succinct) code. Speed doesn’t matter. If it runs in less than 8 hours, I’m fine.
I have the logic to find the data worked out. All I need to make it run is basic cell navigation and get-value type functions: give me the value of cell X as a string; if it matches value Y with a Levenshtein distance < 3, then give me the value of cell Z.
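To make the matching step concrete, here is a minimal sketch of the check described above. The names `FuzzyMatch`, `Levenshtein`, and `IsCloseMatch` are hypothetical, made up for illustration; the only thing taken from the requirement is the "distance < 3" threshold. It uses the standard two-row dynamic-programming edit distance and does no trimming or case folding.

```csharp
using System;

// Hypothetical helper for illustration; not part of any Excel library.
static class FuzzyMatch
{
    // Standard Levenshtein edit distance, keeping only two rows of the DP table.
    public static int Levenshtein(string a, string b)
    {
        var prev = new int[b.Length + 1];
        var curr = new int[b.Length + 1];

        // Distance from the empty prefix of a to each prefix of b.
        for (int j = 0; j <= b.Length; j++) prev[j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            curr[0] = i;
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                int insertOrDelete = Math.Min(curr[j - 1] + 1, prev[j] + 1);
                curr[j] = Math.Min(insertOrDelete, prev[j - 1] + cost);
            }
            // Swap rows so prev always holds the last completed row.
            var tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.Length];
    }

    // True when the cell text is within the threshold (strictly less than maxDistance).
    public static bool IsCloseMatch(string cellText, string expected, int maxDistance = 3)
    {
        return Levenshtein(cellText, expected) < maxDistance;
    }
}
```

With this, something like `FuzzyMatch.IsCloseMatch(cellText, "Total")` would gate whether the Z cell gets read; whether to normalize case or whitespace first depends on how dirty the data is.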
My question is: what is the best way to dig into the Excel files?
VSTO?
The Excel Object Library?
A third option I’m not aware of?
VSTO is kind of a pain because of permissions and the fact that your DLL becomes tied to the document you’re using. Assuming you’re not actually changing the files, and ADO is definitely not an option, I would say that automation through the Excel COM interfaces is your best bet. It lets you program the way you normally would for any other application, and gives you just as many options for data extraction as VSTO.
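As a rough sketch of what the COM automation route looks like, the snippet below opens a workbook read-only and reads one cell as a string. It assumes Excel is installed on the machine and the project references the `Microsoft.Office.Interop.Excel` interop assembly; the class name `CellReader`, the command-line path argument, and the choice of the first sheet and cell A1 are placeholders for illustration.

```csharp
using System;
using System.Runtime.InteropServices;
using Excel = Microsoft.Office.Interop.Excel;

// Hypothetical example class; requires Excel and the interop assembly reference.
static class CellReader
{
    static void Main(string[] args)
    {
        string path = args[0]; // full path to the workbook (placeholder)

        // Start a hidden Excel instance with prompts suppressed for batch use.
        var app = new Excel.Application { Visible = false, DisplayAlerts = false };
        Excel.Workbook workbook = null;
        try
        {
            workbook = app.Workbooks.Open(path, ReadOnly: true);
            var sheet = (Excel.Worksheet)workbook.Worksheets[1]; // collections are 1-based
            var cell = (Excel.Range)sheet.Cells[1, 1];           // row 1, column 1 (A1)

            // Value2 returns the raw value (string, double, bool, or null for empty cells).
            object raw = cell.Value2;
            string text = Convert.ToString(raw) ?? "";
            Console.WriteLine(text);
        }
        finally
        {
            // Close without saving and shut Excel down so no orphaned process remains.
            if (workbook != null) workbook.Close(SaveChanges: false);
            app.Quit();
            Marshal.ReleaseComObject(app);
        }
    }
}
```

For an unattended batch job the `finally` block matters most: if Excel is not quit and released, each run can leave an EXCEL.EXE process behind.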