Thoughts:
I imagine that in .NET the underlying date arithmetic is done with Ticks. If that is the case, I would think it wouldn’t matter how far apart two dates are when determining the difference between them. You would simply subtract the Ticks and then do a series of divisions to convert the result from Ticks to days. I don’t see how dates that are closer together would make that faster, or how dates that are farther apart would slow it down. Am I missing something?
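To make that mental model concrete, here is a minimal Python sketch. Python is an assumption, since the post contains no code of its own. It models .NET’s DateTime.Ticks as 100-nanosecond intervals counted from 0001-01-01 00:00:00, which is also Python’s datetime.min; to_ticks, whole_days_between and TICKS_PER_DAY are hypothetical names. Under this model the difference is one subtraction and one division, whatever the distance between the dates.

```python
from datetime import datetime, timedelta

# One .NET-style tick is 100 ns; a day is 86,400 s * 10,000,000 ticks/s.
TICKS_PER_DAY = 24 * 60 * 60 * 10_000_000  # 864,000,000,000


def to_ticks(dt: datetime) -> int:
    """Model of DateTime.Ticks: whole microseconds since datetime.min, times 10."""
    return (dt - datetime.min) // timedelta(microseconds=1) * 10


def whole_days_between(start: datetime, end: datetime) -> int:
    """Whole days from start to end (assumes start <= end).

    Floor division; for start <= end this matches truncating to whole days.
    The cost is one subtraction and one division, independent of the gap.
    """
    return (to_ticks(end) - to_ticks(start)) // TICKS_PER_DAY
```

For example, whole_days_between(datetime(2023, 1, 1), datetime(2024, 1, 1)) is 365 and the same call one year later gives 366, because 2024 is a leap year.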
On the SQL side, I have no idea. I imagine it is similar, but I have no proof of it.
Example/Context:
Let’s say I have a function that, given a start date, an end date, and a time period (in days for this example), tells me how many times that period fits in the given date range.
somefunction(<first of last year>, <first of this year>, <30 days>)
//returns 12
One (bad) way to implement this function is to start at the start date and keep adding the time period (e.g. 30 days), checking each time whether you have passed the end date. However, this gets slower the wider the date range is, because the number of additions grows with the number of periods in the range.
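A minimal sketch of that slow approach, in Python with hypothetical names: it counts how many whole periods fit between start and end by stepping forward one period at a time, so the loop body runs once per period and the cost grows linearly with the width of the range.

```python
from datetime import date, timedelta


def count_periods_iterative(start: date, end: date, period: timedelta) -> int:
    """Whole periods that fit in [start, end], found by repeated addition."""
    if period <= timedelta(0):
        raise ValueError("period must be positive")  # otherwise the loop never ends
    count = 0
    current = start + period
    # One iteration per period: a 100-year range with 30-day periods loops ~1,200 times.
    while current <= end:
        count += 1
        current += period
    return count
```

For example, count_periods_iterative(date(2023, 1, 1), date(2024, 1, 1), timedelta(days=30)) returns 12, matching the example above.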
Another way is to work out how many days are in the date range and divide by the number of days in the time period. In .NET, you can subtract the start date from the end date and get a TimeSpan back. In SQL, you can use the DateDiff function to do much the same thing.
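The division approach can be sketched the same way, again in Python with hypothetical names (in SQL the day count would come from DateDiff instead). Subtracting two dates yields a timedelta, and dividing one timedelta by another with // gives the whole number of periods with no loop at all. For end >= start it returns the same counts as the stepping version; the early return for end < start mirrors that version returning 0.

```python
from datetime import date, timedelta


def count_periods_by_division(start: date, end: date, period: timedelta) -> int:
    """Whole periods that fit in [start, end], computed with one subtraction."""
    if period <= timedelta(0):
        raise ValueError("period must be positive")
    if end < start:
        return 0
    # timedelta // timedelta is floor division of the two durations, giving an int.
    return (end - start) // period
```

For example, count_periods_by_division(date(2023, 1, 1), date(2024, 1, 1), timedelta(days=30)) returns 12, since 365 // 30 is 12.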
My question is whether these other methods suffer from the same problem as the first. Specifically: is it faster to calculate the difference between two dates that are close together, or does it make no difference at all?
Edit: Why did I ask this?
Was the performance of finding the difference between two dates really a problem I ran into?
Yes (with an asterisk). In one of our apps, a calculation was taking 0.3 seconds, and it usually had to be run 30 times or so (roughly 9 seconds in total). The users were less than thrilled, so I looked for places where we could speed things up. I traced the problem to a function whose purpose was to find the difference between two dates. Rather than just subtracting them, it iterated over every date between the start and the end and kept a running total. Really. While switching the function over to plain subtraction (and to DateDiff in SQL, since there was similar code in the database), I noticed that some processes ran every night to generate a value closer to today for the calculation to use. I asked this question to find out whether there was any point in keeping those processes running and using the value they generate, or whether to just use the original start date. I now feel very comfortable putting those processes to rest. Thank you all for your answers.
Answering the general question
For non-uniform time periods such as months, there may be a certain amount of guesswork involved. In Noda Time, some calculations start from “a reasonable guess”, obtained by (say) dividing a duration in ticks by “the average number of ticks per month”; the rest of the code then tries that guess and checks whether it was correct. If it wasn’t, we adjust the guess and try again.
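The guess-and-adjust idea can be sketched in Python as below. This illustrates the technique described above and is not Noda Time’s code. It counts whole months using hypothetical helpers add_months and whole_months_between, assumes that adding a month clamps the day to the end of a shorter month (e.g. 31 January plus one month is 28 or 29 February), and uses the Gregorian average month length of 365.2425 / 12 days in place of ticks.

```python
import calendar
from datetime import date

# Gregorian average month length: 365.2425 days per year / 12 = 30.436875 days.
AVERAGE_DAYS_PER_MONTH = 365.2425 / 12


def add_months(d: date, months: int) -> date:
    """Add whole months, clamping the day to the last day of a shorter month."""
    total = d.year * 12 + (d.month - 1) + months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def whole_months_between(start: date, end: date) -> int:
    """Largest n with add_months(start, n) <= end; assumes start <= end."""
    # Step 1: a cheap guess from the average month length.
    guess = int((end - start).days / AVERAGE_DAYS_PER_MONTH)
    # Step 2: try the guess and adjust; month boundaries can push it off by a step or two.
    while add_months(start, guess) > end:
        guess -= 1
    while add_months(start, guess + 1) <= end:
        guess += 1
    return guess
```

For 2023-01-01 to 2024-01-01, the initial guess is 11 (365 / 30.436875 is just under 12), and one adjustment step brings it to the correct 12: the guess lands just the wrong side of a month boundary.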
Now, it’s possible that those guesses become gradually less accurate over longer time spans, because “the average number of ticks per month” may not be exact. However, I suspect the span would have to be very large to make a significant difference. It’s more likely that the guess will be off by one or two because of month boundaries (e.g. being just the wrong side of a long month), and that can happen anywhere.
Also note that some calendar systems are more amenable to optimization than others, and some of those optimizations may well depend on the dates in question. For example, if you have a split Julian/Gregorian calendar with a cutover point, I can easily imagine it taking longer to work out periods between two dates that straddle the cutover than periods that lie entirely on one side or the other.
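To illustrate why a split calendar depends on which dates are involved, here is a Python sketch that converts dates to Julian Day Numbers with the standard integer conversion formulas, choosing the Julian or Gregorian formula according to which side of the cutover each date falls on. The cutover (Julian 4 October 1582 followed by Gregorian 15 October 1582) and all names are assumptions for illustration; real cutovers varied by country, and this is not how any particular library does it. Plain day differences stay cheap, but the code already has to branch per date, and month or year periods that straddle the cutover would need further care.

```python
from typing import Tuple

YMD = Tuple[int, int, int]

# Assumed cutover: Julian 1582-10-04 is immediately followed by Gregorian 1582-10-15.
LAST_JULIAN: YMD = (1582, 10, 4)
FIRST_GREGORIAN: YMD = (1582, 10, 15)


def _shifted(year: int, month: int) -> Tuple[int, int]:
    # Start the year in March so that February (the leap month) comes last.
    a = (14 - month) // 12
    return year + 4800 - a, month + 12 * a - 3


def gregorian_to_jdn(year: int, month: int, day: int) -> int:
    y, m = _shifted(year, month)
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - y // 100 + y // 400 - 32045


def julian_to_jdn(year: int, month: int, day: int) -> int:
    y, m = _shifted(year, month)
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083


def to_day_number(year: int, month: int, day: int) -> int:
    """Julian Day Number for a date in the hypothetical hybrid calendar."""
    ymd = (year, month, day)
    if ymd >= FIRST_GREGORIAN:
        return gregorian_to_jdn(*ymd)
    if ymd <= LAST_JULIAN:
        return julian_to_jdn(*ymd)
    raise ValueError(f"{ymd} falls in the cutover gap and does not exist")


def hybrid_days_between(start: YMD, end: YMD) -> int:
    return to_day_number(*end) - to_day_number(*start)
```

For example, hybrid_days_between((1582, 10, 4), (1582, 10, 15)) is 1, whereas Python’s proleptic Gregorian date type reports 11 days between those same labels.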
Basically, calendaring systems are complicated, so it’s best not to assume that “it should simply be a matter of XYZ…”, as that’s almost bound to be wrong 🙂
Answering the specific question
Yes, your second approach sounds like it should indeed be much, much faster than the first for long time periods. Any difference in calculation speed between long and short time periods is unlikely to matter much even if it exists; I doubt you’ll be able to see it, although it’s still worth testing, of course.
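A minimal timing harness in that spirit, sketched in Python with hypothetical names: it times the subtraction-and-division approach for a short range and for a very long one. The numbers it prints say nothing about .NET or SQL directly; the equivalent experiment would need to be run in those environments (e.g. with System.Diagnostics.Stopwatch in .NET).

```python
import timeit
from datetime import date, timedelta


def time_subtraction(start: date, end: date, repeats: int = 100_000) -> float:
    """Seconds taken to compute (end - start) // 30 days, `repeats` times."""
    period = timedelta(days=30)
    return timeit.timeit(lambda: (end - start) // period, number=repeats)


if __name__ == "__main__":
    near = time_subtraction(date(2024, 1, 1), date(2024, 3, 1))
    far = time_subtraction(date(1, 1, 1), date(9999, 12, 31))
    print(f"near: {near:.4f}s  far: {far:.4f}s")
```

Run it a few times; if the two figures differ only by noise, the distance between the dates is not what drives the cost.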