I have been asked to implement a server-side service that extracts data from Microsoft Excel 2010 spreadsheets (xlsx). The extraction only works if the data is in the expected places in the spreadsheet. Is there a better alternative to spreadsheets for data collection? I worry that users will produce spreadsheets that look perfectly understandable to a human but still fail parsing/extraction.
For example, a user needs to type out many items, and each item has several detail lines following it. My program needs to identify the boundary between items and then collect the detail lines that follow each one. If an extraction fails, the user will need clues to help them fix the problem and re-submit the xlsx file.
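To make the problem concrete, here is a minimal sketch of the grouping step. It assumes a layout that is not fixed anywhere yet: an item name in column A starts a new item, and each detail line leaves column A empty and puts its text in column B. The names `Item` and `group_items` are made up for illustration, and the rows are plain tuples rather than cells read from a real xlsx file.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    name: str
    row: int                      # spreadsheet row where the item starts
    details: list = field(default_factory=list)


def _text(value):
    """Normalise a cell value (None, number, or string) to stripped text."""
    return "" if value is None else str(value).strip()


def group_items(rows):
    """Group (col_a, col_b) rows into items plus a list of error messages.

    Assumed layout (hypothetical): text in column A starts a new item;
    an empty column A with text in column B is a detail line.
    """
    items, errors = [], []
    for row_number, (col_a, col_b) in enumerate(rows, start=1):
        a, b = _text(col_a), _text(col_b)
        if not a and not b:
            continue                               # skip blank rows
        if a:
            items.append(Item(name=a, row=row_number))
            if b:
                items[-1].details.append(b)        # detail typed on the item row
        elif items:
            items[-1].details.append(b)
        else:
            errors.append(f"row {row_number}: detail line appears before any item")
    for item in items:
        if not item.details:
            errors.append(f"row {item.row}: item '{item.name}' has no detail lines")
    return items, errors
```

Each message carries the row number, which is the kind of clue a user would need to fix the sheet and re-submit it.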
Is there a better way? Is there something as portable as an Excel spreadsheet that holds structured data which can be extracted easily?
Or could the Excel spreadsheet itself convert the data into a structured form, such as a JSON representation, and store it as part of the Open XML package?
You can improve data collection in Excel by using Named Ranges and adding validation code that runs as data is entered into the spreadsheet. The validation code could also add metadata tags to the workbook. Your extraction program can then use the Named Ranges (and metadata) to locate the data.
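As a rough sketch of the extraction side, the server can list the workbook's defined names before it tries to read any cells, and reject the upload with a clear message if an expected name is missing. This version uses only the Python standard library and relies on the fact that an xlsx file is a zip package whose workbook part lists defined names in `definedName` elements. The function names and the range names `ItemTable` and `Customer` are hypothetical, and the `xl/workbook.xml` path is an assumption about where the workbook part sits.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the SpreadsheetML main schema used in workbook.xml.
MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"


def read_defined_names(xlsx_file):
    """Return {name: reference} for the defined names in an xlsx package.

    Assumption: the workbook part is at xl/workbook.xml, which is where
    Excel writes it. A strict reader would resolve it via _rels/.rels.
    Sheet-scoped names that share a name overwrite each other here.
    """
    with zipfile.ZipFile(xlsx_file) as package:
        root = ET.fromstring(package.read("xl/workbook.xml"))
    names = {}
    for node in root.iter(f"{{{MAIN_NS}}}definedName"):
        names[node.get("name")] = (node.text or "").strip()
    return names


def missing_ranges(names, required):
    """List the required range names that the workbook does not define."""
    return [name for name in required if name not in names]
```

A caller might run `missing_ranges(read_defined_names(path), ["ItemTable", "Customer"])` and return the resulting list to the user as the reason the file was rejected.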