Background
I’d like to estimate the big-oh performance of some methods in a library through benchmarks. I don’t need precision — it suffices to show that something is O(1), O(logn), O(n), O(nlogn), O(n^2) or worse than that. Since big-oh means upper-bound, estimating O(logn) for something that is O(log logn) is not a problem.
Right now, for each candidate big-oh, I’m thinking of finding the constant multiplier k that best fits the data while still bounding all results from above, and then choosing the big-oh with the best fit.
Questions
- Are there better ways of doing it than what I’m thinking of? If so, what are they?
- Otherwise, can anyone point me to the algorithms to estimate k for best fitting, and comparing how well each curve fits the data?
Notes & Constraints
Given the comments so far, I need to make a few things clear:
- This needs to be automated. I can’t “look” at data and make a judgment call.
- I’m going to benchmark the methods with multiple n sizes. For each size n, I’m going to use a proven benchmark framework that provides reliable statistical results.
- I actually know beforehand the big-oh of most of the methods that will be tested. My main intention is to provide performance regression testing for them.
- The code will be written in Scala, and any free Java library can be used.
Example
Here’s one example of the kind of stuff I want to measure. I have a method with this signature:
def apply(n: Int): A
Given an n, it will return the nth element of a sequence. This method can be O(1), O(logn) or O(n) depending on the existing implementation, and a small change can make it use a suboptimal implementation by mistake. Or, more likely, a small change could make some other method that depends on it use a suboptimal version of it.
In order to get started, you have to make a couple of assumptions.
- n is large compared to any constant terms.
In particular, (3) is difficult to achieve in concert with (1). So you may get something with an exponential worst case, but never run into that worst case, and thus think your algorithm is much better than it is on average.
With that said, all you need is any standard curve fitting library. Apache Commons Math has a fully adequate one. You then either create a function with all the common terms that you want to test (e.g. constant, log n, n, n log n, n*n, n*n*n, e^n), or you take the log of your data and fit the exponent, and then if you get an exponent not close to an integer, see if throwing in a log n gives a better fit.
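A simpler variant of the candidate-term idea, closer to what the question proposes, is to fit one multiplier k per candidate and keep the candidate with the smallest residual. The following is only a minimal sketch under stated assumptions: the names `BigOhClassifier`, `logRms` and `classify` are hypothetical, the fit is done in log space (so it minimizes relative rather than absolute error), and all sizes are assumed to be greater than 1 so that log n is positive.

```scala
object BigOhClassifier {
  // Candidate growth functions, ordered from slowest- to fastest-growing.
  // Assumption: every size n passed in is > 1, so log(n) > 0.
  val candidates: Seq[(String, Double => Double)] = Seq(
    "O(1)"       -> ((_: Double) => 1.0),
    "O(log n)"   -> ((n: Double) => math.log(n)),
    "O(n)"       -> ((n: Double) => n),
    "O(n log n)" -> ((n: Double) => n * math.log(n)),
    "O(n^2)"     -> ((n: Double) => n * n)
  )

  /** RMS residual, in log space, of the best fit t ~ k * f(n).
    * The best log(k) is simply the mean of log(t) - log(f(n)). */
  def logRms(sizes: Seq[Double], times: Seq[Double], f: Double => Double): Double = {
    require(sizes.length == times.length && sizes.nonEmpty)
    val d = sizes.zip(times).map { case (n, t) => math.log(t) - math.log(f(n)) }
    val logK = d.sum / d.length
    math.sqrt(d.map(x => (x - logK) * (x - logK)).sum / d.length)
  }

  /** Name of the candidate whose single-multiplier fit has the smallest residual. */
  def classify(sizes: Seq[Double], times: Seq[Double]): String =
    candidates.minBy { case (_, f) => logRms(sizes, times, f) }._1
}
```

With noisy timings, neighbouring candidates such as O(n) and O(n log n) can have very similar residuals, so for regression testing it may be safer to compare the residual of the expected class against the others with some margin rather than trusting the raw minimum.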
(In more detail, if you fit C*x^a for C and a, or more easily log C + a log x, you can get the exponent a; in the all-common-terms-at-once scheme, you’ll get weights for each term, so if you have n*n + C*n*log(n) where C is large, you’ll pick up that term also.)
You’ll want to vary the size by enough so that you can tell the different cases apart (might be hard with log terms, if you care about those), and safely more different sizes than you have parameters (probably 3x excess would start being okay, as long as you do at least a dozen or so runs total).
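The log-log fit described above (fitting log C + a log x instead of C*x^a directly) reduces to an ordinary straight-line least-squares fit. As a rough sketch, assuming plain least squares rather than any particular library, and with `PowerLawFit` and `fit` as hypothetical names:

```scala
object PowerLawFit {
  /** Least-squares line through (log x, log t).
    * Returns (a, C) such that t ~ C * x^a. Sizes and times must be positive. */
  def fit(sizes: Seq[Double], times: Seq[Double]): (Double, Double) = {
    require(sizes.length == times.length && sizes.length >= 2)
    val lx = sizes.map(math.log)
    val lt = times.map(math.log)
    val mx = lx.sum / lx.length
    val mt = lt.sum / lt.length
    // Slope of the regression line is the exponent a.
    val sxy = lx.zip(lt).map { case (x, t) => (x - mx) * (t - mt) }.sum
    val sxx = lx.map(x => (x - mx) * (x - mx)).sum
    val a = sxy / sxx
    // Intercept is log C, so exponentiate to recover C.
    val c = math.exp(mt - a * mx)
    (a, c)
  }
}
```

An exponent near 0, 1 or 2 then suggests O(1), O(n) or O(n^2); an exponent slightly above an integer hints that a log term may be involved.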
Edit: Here is Scala code that does all this for you. Rather than explain each little piece, I’ll leave it to you to investigate; it implements the scheme above using the C*x^a fit, and returns ((a,C),(lower bound for a, upper bound for a)). The bounds are quite conservative, as you can see from running the thing a few times. The units of C are seconds (a is unitless), but don’t trust that too much as there is some looping overhead (and also some noise).
Note that the multibench method is expected to take about sqrt(2)*n*m*time to run, assuming that static initialization data is used and is relatively cheap compared to whatever you’re running. Here are some examples with parameters chosen to take ~15s to run:
Anyway, for the stated use case, where you are checking to make sure the order doesn’t change, this is probably adequate, since you can play with the values a bit when setting up the test to make sure they give something sensible. One could also create heuristics that search for stability, but that’s probably overkill.
(Incidentally, there is no explicit warmup step here; the robust fitting of the Theil-Sen estimator should make it unnecessary for sensibly large benchmarks. This also is why I don’t use any other benching framework; any statistics that it does just loses power from this test.)
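For readers unfamiliar with it, the Theil-Sen estimator takes the median of the slopes between all pairs of points, which is why a few outliers (such as cold, un-warmed-up runs) barely affect it. The block below is an independent minimal sketch of that idea, not the code referred to above; `TheilSen`, `median` and `fit` are hypothetical names, and it is meant to be applied to (log size, log time) pairs.

```scala
object TheilSen {
  private def median(xs: Seq[Double]): Double = {
    val s = xs.sorted
    val n = s.length
    if (n % 2 == 1) s(n / 2) else (s(n / 2 - 1) + s(n / 2)) / 2.0
  }

  /** Returns (slope, intercept) of a robust line through the points.
    * Slope is the median of all pairwise slopes; intercept is the
    * median of y - slope * x. Pairs with equal x are skipped. */
  def fit(xs: Seq[Double], ys: Seq[Double]): (Double, Double) = {
    require(xs.length == ys.length && xs.length >= 2)
    val pts = xs.zip(ys).toIndexedSeq
    val slopes = for {
      i <- pts.indices
      j <- (i + 1) until pts.length
      if pts(i)._1 != pts(j)._1
    } yield (pts(j)._2 - pts(i)._2) / (pts(j)._1 - pts(i)._1)
    require(slopes.nonEmpty, "need at least two distinct x values")
    val slope = median(slopes)
    val intercept = median(pts.map { case (x, y) => y - slope * x })
    (slope, intercept)
  }
}
```

Note that this computes all pairs, so it is quadratic in the number of data points; for a few dozen timings that cost is negligible.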
Edit again: if you replace the alpha method with the following:
then you can get an estimate of the exponent when there’s a log term also. Error estimates exist to help pick whether or not the log term is the correct way to go, but it’s up to you to make the call (i.e. I’m assuming you’ll be supervising this initially and reading the numbers that come off):
(Edit: fixed the RMS computation so it’s actually the mean, plus demonstrated that you only need to do timings once and can then try both fits.)
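The idea of timing once and then trying both fits can be illustrated as follows. This is a hedged sketch rather than the replacement method mentioned above: it assumes plain least squares in log space, divides the timings by log(n) to test the log-term hypothesis, and uses the hypothetical names `LogTermCheck`, `fitExponent` and `compare`.

```scala
object LogTermCheck {
  /** Least-squares line through (log x, log y).
    * Returns (exponent a, RMS residual of the fit in log space). */
  def fitExponent(xs: Seq[Double], ys: Seq[Double]): (Double, Double) = {
    require(xs.length == ys.length && xs.length >= 2)
    val lx = xs.map(math.log)
    val ly = ys.map(math.log)
    val mx = lx.sum / lx.length
    val my = ly.sum / ly.length
    val a = lx.zip(ly).map { case (x, y) => (x - mx) * (y - my) }.sum /
      lx.map(x => (x - mx) * (x - mx)).sum
    val b = my - a * mx
    val squared = lx.zip(ly).map { case (x, y) =>
      val r = y - (a * x + b)
      r * r
    }
    (a, math.sqrt(squared.sum / squared.length))
  }

  /** Fits t ~ C*n^a and t ~ C*n^a*log(n) from the same timings.
    * Returns ((a, rms) without log term, (a, rms) with log term).
    * Assumption: all sizes are > 1 so that log(n) > 0. */
  def compare(sizes: Seq[Double], times: Seq[Double]): ((Double, Double), (Double, Double)) = {
    val plain = fitExponent(sizes, times)
    val withLog = fitExponent(sizes, sizes.zip(times).map { case (n, t) => t / math.log(n) })
    (plain, withLog)
  }
}
```

Whichever fit has the smaller RMS residual and an exponent closer to an integer is the more plausible model, though with noisy data the two can be hard to separate.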