Background:
Some time ago, I built a system for recording and categorizing application crashes for one of our internal programs. At the time, I used a combination of frequency and aggregated lost time (the time between the program launch and the crash) for prioritizing types of crashes. It worked reasonably well.
Now, The Powers That Be want solid numbers on the cost of each type of crash being worked on. Or at least, numbers that look solid. I suppose I could use the aggregate lost time, multiplied by some plausible figure, but it seems dodgy.
Question:
Are there any established methods of calculating the real-world cost of application crashes? Or failing that, published studies speculating on such costs?
Consensus:
Accuracy is impossible, but an estimate based on uptime should suffice if it is applied consistently and its limitations clearly documented. Thanks, Matt, Orion, for taking time to answer this.
I’ve not seen any studies, but a reasonable heuristic would be something like:
( Time since last application save when crash occurred + Time to restart application ) * Average hourly rate of application operator.
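As a rough illustration, the heuristic above could be written as a small Python function. This is only a sketch: the `Crash` record, its field names, and the hourly rate of 40 are assumptions made up for the example, not anything your crash recorder necessarily stores.

```python
from dataclasses import dataclass


@dataclass
class Crash:
    # Hypothetical fields; substitute whatever your crash reports record.
    minutes_since_last_save: float
    minutes_to_restart: float


def crash_cost(crash: Crash, hourly_rate: float) -> float:
    """(time since last save + time to restart) * operator's hourly rate."""
    lost_hours = (crash.minutes_since_last_save + crash.minutes_to_restart) / 60.0
    return lost_hours * hourly_rate


# 25 minutes of unsaved work plus a 5 minute restart, at an assumed rate of 40/hour.
example = Crash(minutes_since_last_save=25, minutes_to_restart=5)
print(crash_cost(example, hourly_rate=40.0))  # 20.0
```

If you don't record the time of the last save, the time since launch is a pessimistic stand-in for it.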
The estimate gets more complex if the crashes affect external customers, or if they delay other work (e.g. by creating a bottleneck, so that another person ends up sitting around waiting because someone else’s application crashed).
That said, your ‘powers that be’ may well be happy with a very rough estimate so long as it’s applied consistently and they can see how it is changing over time.
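To show what "applied consistently" and "changing over time" might look like in practice, here is a minimal Python sketch that totals the estimated cost per crash type per month. It uses the lost time from the question (launch to crash) as the input. The record layout, the crash type names, and the rate of 40 per hour are all hypothetical.

```python
from collections import defaultdict
from datetime import date


def monthly_cost_by_type(records, hourly_rate):
    """Sum estimated cost per (crash type, "YYYY-MM") pair.

    Each record is (crash_type, crash_date, lost_minutes); this layout
    is an assumption for the example.
    """
    totals = defaultdict(float)
    for crash_type, crash_date, lost_minutes in records:
        month = crash_date.strftime("%Y-%m")
        totals[(crash_type, month)] += lost_minutes / 60.0 * hourly_rate
    return dict(totals)


# Made-up sample data.
records = [
    ("null_deref", date(2024, 1, 5), 30),
    ("null_deref", date(2024, 1, 20), 90),
    ("null_deref", date(2024, 2, 3), 60),
    ("timeout", date(2024, 1, 9), 15),
]

for key, cost in sorted(monthly_cost_by_type(records, hourly_rate=40.0).items()):
    print(key, cost)
```

Keeping the rate and the definition of lost time fixed from month to month matters more than getting either exactly right, since the trend is what people will be reading.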