I’ve always been a bit confused about this, possibly due to my lack of understanding of compilers. But let’s use Python as an example. If we had some large list of numbers called numlist and wanted to get rid of any duplicates, we could call set() on the list, for example set(numlist). In return we would have a set of our numbers. This operation, to the best of my knowledge, will be done in O(n) time. Though if I were to create my own algorithm to handle this operation, the absolute best I could ever hope for is O(n^2).
What I don’t get is, what allows an internal operation like set() to be so much faster than an algorithm written outside the language? The checking still needs to be done, doesn’t it?
You can do this in O(n) in any language, basically as:
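A minimal sketch of the idea in Python, assuming the values are integers spread over a reasonably small range. The names `isInList` and `newList` are the ones referred to in this answer; `dedupe`, `oldList`, `lo` and `hi` are hypothetical names picked for illustration.

```python
def dedupe(oldList):
    """Return the distinct integers in oldList, in ascending order."""
    if not oldList:
        return []

    # Step 1: find the smallest and largest values, O(n).
    lo = min(oldList)
    hi = max(oldList)

    # Step 2: one flag per possible value, O(hi - lo + 1).
    isInList = [False] * (hi - lo + 1)

    # Step 3: mark every value that occurs in the input, O(n).
    for value in oldList:
        isInList[value - lo] = True

    # Step 4: collect the marked values, O(hi - lo + 1).
    newList = []
    for offset, present in enumerate(isInList):
        if present:
            newList.append(lo + offset)
    return newList


print(dedupe([3, 1, 3, 2, 1]))  # [1, 2, 3]
```

Steps 2 and 4 depend on the range of the values rather than on n, so the whole thing is O(n) only as long as that range is bounded or grows no faster than n.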
I’m assuming here that `append` is an O(1) operation, which it should be unless the implementer was brain-dead. So with k steps each O(n), you still have an O(n) operation.

Whether the steps are explicitly done in your code or whether they’re done under the covers of a language is irrelevant. Otherwise you could claim that the C `qsort` was one operation and you now have the holy grail of an O(1) sort routine 🙂

As many people have discovered, you can often trade off space complexity for time complexity. For example, the above only works because we’re allowed to introduce the `isInList` and `newList` variables. If this were not allowed, the next best solution may be sorting the list (probably no better than O(n log n)) followed by an O(n) (I think) operation to remove the duplicates.

As an extreme example, you can use that same extra-space method to sort an arbitrary number of 32-bit integers (say with each value having 255 or fewer duplicates) in O(n) time, provided you can allocate about four billion bytes for storing the counts.
Simply initialise all the counts to zero and run through each position in your list, incrementing the count based on the number at that position. That’s O(n).
Then start at the beginning of the list and run through the count array, placing that many of the correct value in the list. Walking the count array is O(1), with the 1 being about four billion of course but still constant time 🙂 Writing the values back adds another O(n), so the sort as a whole stays O(n).
That’s also O(1) space complexity but a very big “1”. Typically trade-offs aren’t quite that severe.
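The two passes described above can be sketched in Python roughly as follows. To keep it runnable on an ordinary machine, the hypothetical `max_value` parameter stands in for the largest 32-bit value, and a plain list of integers stands in for the four billion one-byte counters; `counting_sort` is likewise just an illustrative name.

```python
def counting_sort(values, max_value):
    """Sort non-negative integers no larger than max_value, in place."""
    # One counter per possible value. The size depends only on max_value,
    # not on how many items are being sorted.
    counts = [0] * (max_value + 1)

    # Pass 1: tally how often each value occurs, O(n).
    for v in values:
        counts[v] += 1

    # Pass 2: walk the counters in ascending order and write each value
    # back as many times as it was seen.
    pos = 0
    for v, count in enumerate(counts):
        for _ in range(count):
            values[pos] = v
            pos += 1
    return values


print(counting_sort([5, 3, 5, 0, 2], 7))  # [0, 2, 3, 5, 5]
```

With `max_value` fixed, the counter array is a constant cost however long the input gets, which is the sense in which the space is O(1) with a very big “1”.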