I’m asking this in regards to Top K algorithm. I’d think that O(n +

Question

0

Asked: May 23, 20262026-05-23T18:02:22+00:00 2026-05-23T18:02:22+00:00

I’m asking this in regards to Top K algorithm. I’d think that O(n +

0

I’m asking this in regards to Top K algorithm. I’d think that O(n + k log n) should be faster, because well.. for instance if you try plugging in k = 300 and n = 100000000 for example, we can see that O(n + k log n) is smaller.

However when I do a benchmark with C++, it’s showing me that O (n log k) is more than 2x faster. Here’s the complete benchmarking program:

#include <iostream>
#include <vector>
#include <algorithm>
#include <iterator>
#include <ctime>
#include <cstdlib>
using namespace std;

int RandomNumber () { return rand(); }
vector<int> find_topk(int arr[], int k, int n)
{
   make_heap(arr, arr + n, greater<int>());

   vector<int> result(k);

   for (int i = 0; i < k; ++i)
   {
      result[i] = arr[0];
      pop_heap(arr, arr + n - i, greater<int>());
   }

   return result;
}

vector<int> find_topk2(int arr[], int k, int n)
{
   make_heap(arr, arr + k, less<int>());

   for (int i = k; i < n; ++i)
   {
      if (arr[i] < arr[0])
      {
     pop_heap(arr, arr + k, less<int>());
     arr[k - 1] = arr[i];
     push_heap(arr, arr + k, less<int>());
      }
   }

   vector<int> result(arr, arr + k);

   return result;
}


int main()
{
   const int n = 220000000;
   const int k = 300;

   srand (time(0));
   int* arr = new int[n];

   generate(arr, arr + n, RandomNumber);

   // replace with topk or topk2
   vector<int> result = find_topk2(arr, k, n);

   copy(result.begin(), result.end(), ostream_iterator<int>(cout, "\n"));


   return 0;
}

find_topk ‘s approach is to build a complete heap of size n, in O(n) and then remove the top element of heap k times O(log n).
find_topk2 ‘s approach is to build a heap of size k (O(k)) such that max element is at the top, and then from k to n, compare to see if any element is smaller than top element, and if so pop the top element, and push the new element which would mean n times O(log k).
Both approach are written quite similarly so I don’t believe any implementation detail (like creating temporaries etc.) can cause a difference besides the algo and the dataset (which is random).

I could actually profile the results of the benchmark and could see that find_topk actually called the comparison operator many more times than find_topk2. But I’m interested more in the reasoning of the theoretical complexity.. so two questions.

Disregarding the implementation or benchmark, was I wrong in expecting that O(n + k log n) should be better than O(n log k)? If I’m wrong, please explain why and how to reason such that I can see O(n log k) is actually better.
If I’m not wrong to expect no 1. Then why is my benchmark showing otherwise?

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-05-23T18:02:22+00:00

Big O in several variables is complex, since you need assumptions on how your variables scale with one another, so you can take unambiguously the limit to infinity.

If eg. k ~ n^(1/2), then O(n log k) becomes O(n log n) and O(n + k log n) becomes O(n + n^(1/2) log n) = O(n), which is better.

If k ~ log n, then O(n log k) = O(n log log n) and O(n + k log n) = O(n), which is better. Note that log log 2^1024 = 10, so the constants hidden in the O(n) may be greater than log log n for any realistic n.

If k = constant, then O(n log k) = O(n) and O(n + k log n) = O(n), which is the same.

But the constants play a big role: for instance, building a heap may involve reading the array 3 times, whereas building a priority queue of length k as you go only requires one pass through the array, and a small constant times log k for the lookup.

Which is “better” is therefore unclear, although my quick analysis tended to show that O(n + k log n) performs better under mild assumptions on k.

For instance, if k is a very small constant (say k = 3), then I’m ready to bet that the make_heap approach performs worse than the priority queue one on real world data.

Use asymptotic analysis wisely, and above all, profile your code before drawing conclusions.

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

I’m asking this in regards to Top K algorithm. I’d think that O(n +

Leave an answerCancel reply

1 Answer

Leave an answer
Cancel reply