How can I create a function “bool IsDateTime” that will reliably determine whether an Excel number format string like “[$-409]h:mm:ss AM/PM;@” indicates that the numeric value is a DateTime that should be passed to DateTime.FromOADate?
I’ve figured out what the [$-409] is (see “Excel Number Format: What is "[$-409]"?”): it’s just a locale code.
I’ve also read a little about how the number format string is separated into up to four format sections by semicolons: http://office.microsoft.com/en-us/excel-help/create-or-delete-a-custom-number-format-HP005199500.aspx?CTT=5&origin=HP005198679 and http://www.ozgrid.com/Excel/excel-custom-number-formats.htm
For example, would it be reliable to simply search for occurrences of the date/time format characters such as h, m, s, y and d? How would Excel itself interpret the string?
In case the question is not clear: when you read an Excel file and look at a date/time value, you’re actually looking at a plain double-precision value, because that’s how Excel stores it. To figure out whether it’s an ordinary double or one that should be passed to DateTime.FromOADate, you must interpret the custom number format string. So I am asking how to interpret such a string, which may or may not refer to a date/time value, in order to decide whether to convert the double to a DateTime via DateTime.FromOADate. Furthermore, once the value has been converted, I need to translate the Excel number format string into an equivalent .NET DateTime format string, so that I can display the date/time value as Excel would via DateTime.ToString( converted_format_string ).
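To make the goal concrete, here is a minimal sketch of the last two steps, assuming the two hard parts (deciding that the cell is a date and translating the format) have already been done. The class and method names (`OADateSketch`, `FormatCell`) are hypothetical; only `DateTime.FromOADate` and `DateTime.ToString` are real framework calls. A null format stands for "not a date/time cell".

```csharp
using System;
using System.Globalization;

public static class OADateSketch
{
    // Hypothetical helper: dotNetFormat is the already-translated .NET format
    // string, or null when the Excel format was not a date/time format.
    public static string FormatCell(double raw, string dotNetFormat)
    {
        if (dotNetFormat == null)
        {
            // Ordinary number: leave it as a plain double.
            return raw.ToString(CultureInfo.InvariantCulture);
        }

        // Date/time: the double is an OLE Automation date.
        DateTime value = DateTime.FromOADate(raw);
        return value.ToString(dotNetFormat, CultureInfo.InvariantCulture);
    }
}
```

For example, `OADateSketch.FormatCell(44197.5, "yyyy-MM-dd HH:mm:ss")` returns `2021-01-01 12:00:00`, while `OADateSketch.FormatCell(1.5, null)` returns `1.5`.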
I implemented a class to parse the Excel number format string. It looks at the first section (of four possible sections in the format string), and uses a Regex to capture date/time specific custom format characters such as “y”, “m”, “d”, “h”, “s”, “AM/PM”, and returns null if none are found. This first step simply decides whether the format string is meant for a date/time value, and leaves us with an object-oriented ordered list of logical date/time format specifiers for further processing.
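The detection step on its own can be sketched roughly as follows. This is not the class described above, just a simplified illustration of the idea, and the name `ExcelFormatSketch` is made up. It assumes that quoted literals, escaped characters, padding characters and bracketed blocks such as [$-409] or [Red] should be ignored before looking for date/time characters, and that bracketed elapsed-time codes such as [h] count as date/time.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ExcelFormatSketch
{
    public static bool IsDateTime(string numberFormat)
    {
        if (string.IsNullOrEmpty(numberFormat)) return false;

        // Drop quoted literals such as "days" so their letters are not counted.
        string cleaned = Regex.Replace(numberFormat, "\"[^\"]*\"", "");
        // Drop escaped (\x), spacing (_x) and fill (*x) characters.
        cleaned = Regex.Replace(cleaned, @"[\\_*].", "");

        // Only the first of the (up to four) sections is examined.
        string section = cleaned.Split(';')[0];

        // Elapsed-time codes such as [h], [mm] or [ss] are date/time formats.
        if (Regex.IsMatch(section, @"\[(h+|m+|s+)\]", RegexOptions.IgnoreCase))
            return true;

        // Remaining bracketed blocks are locale codes, colours or conditions.
        section = Regex.Replace(section, @"\[[^\]]*\]", "");

        // Any surviving date/time character marks the format as date/time.
        return Regex.IsMatch(section, @"[ymdhs]|AM/PM|A/P", RegexOptions.IgnoreCase);
    }
}
```

With these assumptions, “[$-409]h:mm:ss AM/PM;@” and “yyyy-mm-dd” are reported as date/time formats, while “#,##0.00”, “[Red]0.00”, “0.00E+00” and “General” are not.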
Assuming it was decided that the format string is meant for a date/time value, the captured and classified values are sorted into the order they were found in the original format string.
Next, it applies Excel-specific formatting quirks, such as deciding whether “m” means month or minute: it is interpreted as “minute” only if it follows an “h” or precedes an “s” (literal text is allowed between them, so it need not be immediately adjacent). This matters because .NET uses a lowercase “m” for minutes and a capital “M” for months. Excel also forces 24-hour time for the “h” character if “AM/PM” is not also specified, so if “AM/PM” is not found, it converts “h” to a capital “H” (24-hour time in .NET); otherwise it keeps the lowercase “h” (12-hour time in .NET). It also converts “AM/PM” to the .NET equivalent “tt”, and blanks out conditional expressions, which cannot be included in a plain .NET DateTime format string.
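The conversion rules can be sketched as follows. Again, this is a simplified illustration under stated assumptions rather than the actual class: `ExcelDateFormatSketch` and `ToDotNetFormat` are hypothetical names, fractional seconds and elapsed-time codes are not handled, every bracketed block is simply dropped, and “yyy” or longer is assumed to mean a four-digit year. The .NET specifiers used are “M” for month, “m” for minute, “H” for 24-hour, “h” for 12-hour and “tt” for the AM/PM designator.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

public static class ExcelDateFormatSketch
{
    // One alternative per token kind; AM/PM must be tried before single letters.
    private static readonly Regex Token = new Regex(
        @"(?<skip>\[[^\]]*\])|(?<quoted>""[^""]*"")|(?<escaped>\\.)" +
        @"|(?<ampm>AM/PM|A/P)|(?<part>y+|m+|d+|h+|s+)|(?<other>.)",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    public static string ToDotNetFormat(string excelFormat)
    {
        // Keep only the tokens of the first section (up to the first unquoted ';').
        var tokens = new List<Match>();
        foreach (Match t in Token.Matches(excelFormat))
        {
            if (t.Groups["other"].Success && t.Value == ";") break;
            tokens.Add(t);
        }

        bool twelveHour = tokens.Exists(t => t.Groups["ampm"].Success);
        List<Match> parts = tokens.FindAll(t => t.Groups["part"].Success);

        var result = new StringBuilder();
        foreach (Match t in tokens)
        {
            if (t.Groups["skip"].Success) continue;           // locale, colour, condition
            if (t.Groups["quoted"].Success)                   // "text" becomes 'text'
                result.Append('\'').Append(t.Value.Trim('"')).Append('\'');
            else if (t.Groups["ampm"].Success)
                result.Append(t.Value.Length > 3 ? "tt" : "t");
            else if (t.Groups["part"].Success)
                result.Append(ConvertPart(t, parts, twelveHour));
            else
                result.Append(t.Value);                       // escapes and separators
        }

        // A lone character would be read as a standard format; '%' forces custom.
        string format = result.ToString();
        return format.Length == 1 ? "%" + format : format;
    }

    private static string ConvertPart(Match t, List<Match> parts, bool twelveHour)
    {
        int n = t.Length;
        switch (Kind(t))
        {
            case 'y': return n <= 2 ? "yy" : "yyyy";
            case 'd': return new string('d', Math.Min(n, 4));
            case 's': return new string('s', Math.Min(n, 2));
            case 'h': return new string(twelveHour ? 'h' : 'H', Math.Min(n, 2));
            default:
                // 'm' is a minute only after an hour part or before a seconds part.
                int i = parts.IndexOf(t);
                bool afterHour = i > 0 && Kind(parts[i - 1]) == 'h';
                bool beforeSecond = i + 1 < parts.Count && Kind(parts[i + 1]) == 's';
                return afterHour || beforeSecond
                    ? new string('m', Math.Min(n, 2))
                    : new string('M', Math.Min(n, 4));
        }
    }

    private static char Kind(Match t) => char.ToLowerInvariant(t.Value[0]);
}
```

Under these assumptions, “[$-409]h:mm:ss AM/PM;@” becomes “h:mm:ss tt” and “yyyy-mm-dd hh:mm:ss” becomes “yyyy-MM-dd HH:mm:ss”. Because literal text between parts is tokenized separately, the month/minute check only ever compares neighbouring date/time parts, which matches the “not exactly immediately” rule described above.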
The above class can be used in the following context to read string values into a DataTable from the columns in an Excel file that have non-null headers. Specifically, it attempts to acquire a valid DateTime instance, and if one is found, it attempts to construct a valid .NET DateTime format string from the Excel number format string. If both of the previous steps are successful, it stores the formatted date/time string in the data table; otherwise it converts whatever value is present to a string (taking care to strip out rich text formatting first, if present):