Please, try EDU on Codeforces! New educational section with videos, subtitles, texts, and problems. ×

### r3novatio's blog

By r3novatio, history, 4 weeks ago, , I read somewhere that the C++ STL parital_sort function (used to find k smallest numbers) has a running time complexity of O(n.logk).

It achieves this by going through the entire data, maintaining a k-sized max-heap, throwing away the top whenever size exceeds k, and in the end printing the k remaining elements from the heap.

Can't an n-sized min-heap be used instead and then the first k smallest numbers simply extracted from it. I understand it won't remain in-place and would require additional memory.

But time complexity wise, it would take O(n+k.logn) which is better than O(n.logk), right? (assuming k to be any number smaller than n)

Then why is the O(n.logk) version preferred? Why is it mentioned everywhere and used by the std template? #stl, Comments (20)
 » That would require using a heap which has $O(1)$ insertion, and those usually have a really bad constant.
•  » » How is the O(n.logk) achieved then? If not using heaps/PQs.
•  » » » For the $O(n$ $log$ $k)$ algorithm you can have a heap with both $O(log$ $size)$ insertion and deletion of first element, for which an ordinary binary heap works which is pretty fast in practice. For the $O(n + k$ $log$ $k)$ algorithm you would need to have a heap with $O(1)$ insertion and $O(log$ $size)$ deletion of first element, which usually have a much higher constant factor.
•  » » » » You do not need to insert all elements one by one. That would take O(n.logn) just to make the heap.If I remember my Data Structures class correctly, one can simply use heapify to build a heap of n numbers in O(n) time. Doesn't necessarily need O(1) insertion for that.
•  » » » » » Yeah, that's actually true. In that case I don't know the answer.
•  » » a heap could be made in a complexity of O(n) by using the function makeheap(). It is a practical algorithm .It works in the way of making heap from the bottom to the top. In this problem，we don't need to insert the element N times.
•  » » » Yes, that was already mentioned.
 » It can be implemented using nth_element and sort in-place and $O(n + klogk)$ time. So every bigger time seems strange to me.
•  » » nth-element guarantees only average complexity iirc
•  » » » Doesn't it also use median of medians, which guarantees complexity?
•  » » » We can use median of medians as pivot-selecting method and it will ensures Linear complexity, while afaik nth_element uses quickselect which can lead us to bad partitions and only ensures linear complexity on an average.
•  » » » » Median of medians/ QuickSelect takes O(n) time to find i-th smallest number. So to find all 1 <= i <= k it'll take O(kn) time right?
•  » » » » » 4 weeks ago, # ^ | ← Rev. 3 →   Quick select has worst case O(N^2). So instead of using Quick select, we can use median of medians algorithm to partition the array at position k(like nth_element does) as it ensures linear complexity and then sort the k elements which will be O(n + klogk) as okwedook suggested above.
•  » » » » » » Oh so we just need to partition once. I was thinking maybe for every i<=k we need to run the algorithm again. Yep makes sense! Thanks
•  » » » » » » In nth_element pure quickselect isn't used; it tries quickselect for the first $O(log$ $n)$ layers and then it ends by using a heap on the remaining part of the array, so on average the complexity is $O(n)$ while in the worst case it's $O(n$ $log$ $n)$.
•  » » » » » » » 4 weeks ago, # ^ | ← Rev. 2 →   Any sources for this? The first answer to this question says that it is nowhere mentioned that which algorithms it uses if quick select fails due to bad partitions.
•  » » » » » » » »
•  » » » » » » » » » Ok, thanks for pointing out i will update the statement above.
 » An interesting question.There is a presentation from CppCon 2018 by Fred Tingaud which addresses the matter.Video, presumably: https://www.youtube.com/watch?v=-0tO3Eni2uo.Here's a short summary from slide 87: The typical usage of std::partial_sort is to sort a small subset of elements in a big container. The STL implementers chose a faster $O(N \cdot \log(k))$ algorithm that performs well for this typical use-case at the expense of other scenarios.
•  » » Wow! I'll take a look. Thank you :)