Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 7050449
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: May 28, 20262026-05-28T03:08:42+00:00 2026-05-28T03:08:42+00:00

Background I’d like to estimate the big-oh performance of some methods in a library

  • 0

Background

I’d like to estimate the big-oh performance of some methods in a library through benchmarks. I don’t need precision — it suffices to show that something is O(1), O(logn), O(n), O(nlogn), O(n^2) or worse than that. Since big-oh means upper-bound, estimating O(logn) for something that is O(log logn) is not a problem.

Right now, I’m thinking of finding the constant multiplier k that best fits data for each big-oh (but will top all results), and then choosing the big-oh with the best fit.

Questions

  1. Are there better ways of doing it than what I’m thiking of? If so, what are they?
  2. Otherwise, can anyone point me to the algorithms to estimate k for best fitting, and comparing how well each curve fits the data?

Notes & Constraints

Given the comments so far, I need to make a few things clear:

  • This needs to be automated. I can’t “look” at data and make a judgment call.
  • I’m going to benchmark the methods with multiple n sizes. For each size n, I’m going to use a proven benchmark framework that provides reliable statistical results.
  • I actually know beforehand the big-oh of most of the methods that will be tested. My main intention is to provide performance regression testing for them.
  • The code will be written in Scala, and any free Java library can be used.

Example

Here’s one example of the kind of stuff I want to measure. I have a method with this signature:

def apply(n: Int): A

Given an n, it will return the nth element of a sequence. This method can have O(1), O(logn) or O(n) given the existing implementations, and small changes can get it to use a suboptimal implementation by mistake. Or, more easily, could get some other method that depends on it to use a suboptimal version of it.

  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-05-28T03:08:42+00:00Added an answer on May 28, 2026 at 3:08 am

    In order to get started, you have to make a couple of assumptions.

    1. n is large compared to any constant terms.
    2. You can effectively randomize your input data
    3. You can sample with sufficient density to get a good handle on the distribution of runtimes

    In particular, (3) is difficult to achieve in concert with (1). So you may get something with an exponential worst case, but never run into that worst case, and thus think your algorithm is much better than it is on average.

    With that said, all you need is any standard curve fitting library. Apache Commons Math has a fully adequate one. You then either create a function with all the common terms that you want to test (e.g. constant, log n, n, n log n, nn, nn*n, e^n), or you take the log of your data and fit the exponent, and then if you get an exponent not close to an integer, see if throwing in a log n gives a better fit.

    (In more detail, if you fit C*x^a for C and a, or more easily log C + a log x, you can get the exponent a; in the all-common-terms-at-once scheme, you’ll get weights for each term, so if you have n*n + C*n*log(n) where C is large, you’ll pick up that term also.)

    You’ll want to vary the size by enough so that you can tell the different cases apart (might be hard with log terms, if you care about those), and safely more different sizes than you have parameters (probably 3x excess would start being okay, as long as you do at least a dozen or so runs total).


    Edit: Here is Scala code that does all this for you. Rather than explain each little piece, I’ll leave it to you to investigate; it implements the scheme above using the C*x^a fit, and returns ((a,C),(lower bound for a, upper bound for a)). The bounds are quite conservative, as you can see from running the thing a few times. The units of C are seconds (a is unitless), but don’t trust that too much as there is some looping overhead (and also some noise).

    class TimeLord[A: ClassManifest,B: ClassManifest](setup: Int => A, static: Boolean = true)(run: A => B) {
      @annotation.tailrec final def exceed(time: Double, size: Int, step: Int => Int = _*2, first: Int = 1): (Int,Double) = {
        var i = 0
        val elapsed = 1e-9 * {
          if (static) {
            val a = setup(size)
            var b: B = null.asInstanceOf[B]
            val t0 = System.nanoTime
            var i = 0
            while (i < first) {
              b = run(a)
              i += 1
            }
            System.nanoTime - t0
          }
          else {
            val starts = if (static) { val a = setup(size); Array.fill(first)(a) } else Array.fill(first)(setup(size))
            val answers = new Array[B](first)
            val t0 = System.nanoTime
            var i = 0
            while (i < first) {
              answers(i) = run(starts(i))
              i += 1
            }
            System.nanoTime - t0
          }
        }
        if (time > elapsed) {
          val second = step(first)
          if (second <= first) throw new IllegalArgumentException("Iteration size increase failed: %d to %d".format(first,second))
          else exceed(time, size, step, second)
        }
        else (first, elapsed)
      }
    
      def multibench(smallest: Int, largest: Int, time: Double, n: Int, m: Int = 1) = {
        if (m < 1 || n < 1 || largest < smallest || (n>1 && largest==smallest)) throw new IllegalArgumentException("Poor choice of sizes")
        val frac = (largest.toDouble)/smallest
        (0 until n).map(x => (smallest*math.pow(frac,x/((n-1).toDouble))).toInt).map{ i => 
          val (k,dt) = exceed(time,i)
          if (m==1) i -> Array(dt/k) else {
            i -> ( (dt/k) +: (1 until m).map(_ => exceed(time,i,first=k)).map{ case (j,dt2) => dt2/j }.toArray )
          }
        }.foldLeft(Vector[(Int,Array[Double])]()){ (acc,x) =>
          if (acc.length==0 || acc.last._1 != x._1) acc :+ x
          else acc.dropRight(1) :+ (x._1, acc.last._2 ++ x._2)
        }
      }
    
      def alpha(data: Seq[(Int,Array[Double])]) = {
        // Use Theil-Sen estimator for calculation of straight-line fit for exponent
        // Assume timing relationship is t(n) = A*n^alpha
        val dat = data.map{ case (i,ad) => math.log(i) -> ad.map(x => math.log(i) -> math.log(x)) }
        val slopes = (for {
          i <- dat.indices
          j <- ((i+1) until dat.length)
          (pi,px) <- dat(i)._2
          (qi,qx) <- dat(j)._2
        } yield (qx - px)/(qi - pi)).sorted
        val mbest = slopes(slopes.length/2)
        val mp05 = slopes(slopes.length/20)
        val mp95 = slopes(slopes.length-(1+slopes.length/20))
        val intercepts = dat.flatMap{ case (i,a) => a.map{ case (li,lx) => lx - li*mbest } }.sorted
        val bbest = intercepts(intercepts.length/2)
        ((mbest,math.exp(bbest)),(mp05,mp95))
      }
    }
    

    Note that the multibench method is expected to take about sqrt(2)nm*time to run, assuming that static initialization data is used and is relatively cheap compared to whatever you’re running. Here are some examples with parameters chosen to take ~15s to run:

    val tl1 = new TimeLord(x => List.range(0,x))(_.sum)  // Should be linear
    // Try list sizes 100 to 10000, with each run taking at least 0.1s;
    // use 10 different sizes and 10 repeats of each size
    scala> tl1.alpha( tl1.multibench(100,10000,0.1,10,10) )
    res0: ((Double, Double), (Double, Double)) = ((1.0075537890632216,7.061397125245351E-9),(0.8763463348353099,1.102663784225697))
    
    val longList = List.range(0,100000)
    val tl2 = new TimeLord(x=>x)(longList.apply)    // Again, should be linear
    scala> tl2.alpha( tl2.multibench(100,10000,0.1,10,10) )
    res1: ((Double, Double), (Double, Double)) = ((1.4534378213477026,1.1325696181862922E-10),(0.969955396265306,1.8294175293676322))
    
    // 1.45?!  That's not linear.  Maybe the short ones are cached?
    scala> tl2.alpha( tl2.multibench(9000,90000,0.1,100,1) )
    res2: ((Double, Double), (Double, Double)) = ((0.9973235607566956,1.9214696731124573E-9),(0.9486294398193154,1.0365312207345019))
    
    // Let's try some sorting
    val tl3 = new TimeLord(x=>Vector.fill(x)(util.Random.nextInt))(_.sorted)
    scala> tl3.alpha( tl3.multibench(100,10000,0.1,10,10) )
    res3: ((Double, Double), (Double, Double)) = ((1.1713142886974603,3.882658025586512E-8),(1.0521099621639414,1.3392622111121666))
    // Note the log(n) term comes out as a fractional power
    // (which will decrease as the sizes increase)
    
    // Maybe sort some arrays?
    // This may take longer to run because we have to recreate the (mutable) array each time
    val tl4 = new TimeLord(x=>Array.fill(x)(util.Random.nextInt), false)(java.util.Arrays.sort)
    scala> tl4.alpha( tl4.multibench(100,10000,0.1,10,10) )
    res4: ((Double, Double), (Double, Double)) = ((1.1216172965292541,2.2206198821180513E-8),(1.0929414090177318,1.1543697719880128))
    
    // Let's time something slow
    def kube(n: Int) = (for (i <- 1 to n; j <- 1 to n; k <- 1 to n) yield 1).sum
    val tl5 = new TimeLord(x=>x)(kube)
    scala> tl5.alpha( tl5.multibench(10,100,0.1,10,10) )
    res5: ((Double, Double), (Double, Double)) = ((2.8456382116915484,1.0433534274508799E-7),(2.6416659356198617,2.999094292838751))
    // Okay, we're a little short of 3; there's constant overhead on the small sizes
    

    Anyway, for the stated use case–where you are checking to make sure the order doesn’t change–this is probably adequate, since you can play with the values a bit when setting up the test to make sure they give something sensible. One could also create heuristics that search for stability, but that’s probably overkill.

    (Incidentally, there is no explicit warmup step here; the robust fitting of the Theil-Sen estimator should make it unnecessary for sensibly large benchmarks. This also is why I don’t use any other benching framework; any statistics that it does just loses power from this test.)


    Edit again: if you replace the alpha method with the following:

      // We'll need this math
      @inline private[this] def sq(x: Double) = x*x
      final private[this] val inv_log_of_2 = 1/math.log(2)
      @inline private[this] def log2(x: Double) = math.log(x)*inv_log_of_2
      import math.{log,exp,pow}
    
      // All the info you need to calculate a y value, e.g. y = x*m+b
      case class Yp(x: Double, m: Double, b: Double) {}
    
      // Estimators for data order
      //   fx = transformation to apply to x-data before linear fitting
      //   fy = transformation to apply to y-data before linear fitting
      //   model = given x, slope, and intercept, calculate predicted y
      case class Estimator(fx: Double => Double, invfx: Double=> Double, fy: (Double,Double) => Double, model: Yp => Double) {}
      // C*n^alpha
      val alpha = Estimator(log, exp, (x,y) => log(y), p => p.b*pow(p.x,p.m))
      // C*log(n)*n^alpha
      val logalpha = Estimator(log, exp, (x,y) =>log(y/log2(x)), p => p.b*log2(p.x)*pow(p.x,p.m))
    
      // Use Theil-Sen estimator for calculation of straight-line fit
      case class Fit(slope: Double, const: Double, bounds: (Double,Double), fracrms: Double) {}
      def theilsen(data: Seq[(Int,Array[Double])], est: Estimator = alpha) = {
        // Use Theil-Sen estimator for calculation of straight-line fit for exponent
        // Assume timing relationship is t(n) = A*n^alpha
        val dat = data.map{ case (i,ad) => ad.map(x => est.fx(i) -> est.fy(i,x)) }
        val slopes = (for {
          i <- dat.indices
          j <- ((i+1) until dat.length)
          (pi,px) <- dat(i)
          (qi,qx) <- dat(j)
        } yield (qx - px)/(qi - pi)).sorted
        val mbest = slopes(slopes.length/2)
        val mp05 = slopes(slopes.length/20)
        val mp95 = slopes(slopes.length-(1+slopes.length/20))
        val intercepts = dat.flatMap{ _.map{ case (li,lx) => lx - li*mbest } }.sorted
        val bbest = est.invfx(intercepts(intercepts.length/2))
        val fracrms = math.sqrt(data.map{ case (x,ys) => ys.map(y => sq(1 - y/est.model(Yp(x,mbest,bbest)))).sum }.sum / data.map(_._2.length).sum)
        Fit(mbest, bbest, (mp05,mp95), fracrms)
      }
    

    then you can get an estimate of the exponent when there’s a log term also–error estimates exist to pick whether the log term or not is the correct way to go, but it’s up to you to make the call (i.e. I’m assuming you’ll be supervising this initially and reading the numbers that come off):

    val tl3 = new TimeLord(x=>Vector.fill(x)(util.Random.nextInt))(_.sorted)
    val timings = tl3.multibench(100,10000,0.1,10,10)
    
    // Regular n^alpha fit
    scala> tl3.theilsen( timings )
    res20: tl3.Fit = Fit(1.1811648421030059,3.353753446942075E-8,(1.1100382697696545,1.3204652930525234),0.05927994882343982)
    
    // log(n)*n^alpha fit--note first value is closer to an integer
    //   and last value (error) is smaller
    scala> tl3.theilsen( timings, tl3.logalpha )
    res21: tl3.Fit = Fit(1.0369167329732445,9.211366397621766E-9,(0.9722967182484441,1.129869067913768),0.04026308919615681)
    

    (Edit: fixed the RMS computation so it’s actually the mean, plus demonstrated that you only need to do timings once and can then try both fits.)

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

Background: What I need to accomplish is to remove any records in a collection
Background : I'm trying to convert some JavaScript code which uses the the Crossfilter
Background: I would like to dismiss a modalView that I have presented earlier and
Background: Using unix, codeigniter from localhost. I'd like to run a controllers via a
Background: I have some existing apps in the App Store and I have just
Background: Some time ago, I built a system for recording and categorizing application crashes
Background: I need to reserve an amount of memory below 0xA0000 prior to my
Background: I'm working through the Programming Collective Intelligence book by Toby Segaran; specifically the
Background: We're building an application that allows our customers to supply data in a
Background: I'm using the (fantastic) Vim plugin python-mode , which includes the pep8 linter.

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.