Блог пользователя hello_daddy

Автор hello_daddy, 2 года назад, По-английски

We are given a stream of integers. We have to find median of the last K elements of the stream.

In other words

Constraints:
$$$0$$$ $$$\leq$$$ nums $$$\leq$$$ $$$10^6$$$, nums is the integer given in the stream.
K $$$\neq$$$ $$$0$$$ , atleast K elements have been given in the stream.
Refer to this comment for better understanding.

  • Проголосовать: нравится
  • +27
  • Проголосовать: не нравится

»
2 года назад, # |
  Проголосовать: нравится 0 Проголосовать: не нравится

I will describe a solution that works in $$$O(\log^2)$$$ for add query and $$$O(\log^3)$$$ for median query. I believe that this solution can be optimized to $$$O(\log^2)$$$ for both queries, but I doubt that this problem is solvable in significantly faster (like, in pure logarithmic time), even though the naive bound of $$$\Omega(\log)$$$ is the best lower bound I provide.

So, the solution. Build a segment tree over your array with BST's storing all elements in a segment in each vertex. The addition operation simply adds a number into $$$O(\log)$$$ BST's, thus, it works in $$$O(\log^2)$$$ time. The median query is now essentially a problem of finding a median in a union of $$$O(\log)$$$ BST's. The simple way to do it is to perform a binary search over the answer. Binary search gives one logarithm, lower_bound on $$$O(\log)$$$ BST's gives two more. In total, this operation works in $$$O(\log^3)$$$ time.

I think that the binary search part can be done in $$$O(\log^2)$$$ time, however, I am not sure how.

  • »
    »
    2 года назад, # ^ |
      Проголосовать: нравится 0 Проголосовать: не нравится

    UPD. The classical solution of the problem of finding nth_element on a segment works there as well. And if you use the solution with persistent segment trees, it would work in $$$O(\log)$$$ time on each operation.

    • »
      »
      »
      2 года назад, # ^ |
      Rev. 2   Проголосовать: нравится 0 Проголосовать: не нравится

      Does this work online?

      EDIT: It seems I misread your solution initially, please disregard my message.

»
2 года назад, # |
Rev. 2   Проголосовать: нравится +5 Проголосовать: не нравится

You can solve this problem in O(log(n)) time per operation without having to implement your own data structures. Use an order statistic tree T of pairs of integers and a queue Q of integers. If LeetCode doesn't use libstdc++, you will need to implement a treap to replace the order statistic tree.

Maintain a counter i which keeps track of the index of the current number we are adding. To add the number x, add the pair (x, i) to T and add x to Q. If i >= k, we must remove y = Q.front() from both Q and T, since we only care about the most recent k numbers. To do this, get an iterator to y in T using T.lower_bound(make_pair(y, 0)), erase it from T, and pop y from Q.

To find the median, just use the given formula directly. In particular, if k is odd, return T.find_by_order(k / 2), and if k is even, return 0.5 * (T.find_by_order(k / 2 - 1) + T.find_by_order(k / 2)).

Order statistic trees are discussed at https://www.geeksforgeeks.org/ordered-set-gnu-c-pbds/. If order statistic trees aren't supported by the standard library implementation used by LeetCode and you need to implement a treap instead, see https://www.geeksforgeeks.org/treap-a-randomized-binary-search-tree/.

»
2 года назад, # |
  Проголосовать: нравится +18 Проголосовать: не нравится

What's wrong with this approach — Keep two sets (smaller half and larger half), so that the median can be computed in $$$O(1)$$$ time. Also keep a map to indicate which set each number lies in. After every input from stream, use the map to remove the $$$(k+1)^{th}$$$ last number, and then insert the new number and move 0 or 1 elements around (also reflecting changes in the map) so that both sets are either equal in size or off by 1.

  • »
    »
    2 года назад, # ^ |
    Rev. 2   Проголосовать: нравится +3 Проголосовать: не нравится

    I may have misunderstood your approach but we don't get $$$K$$$ while getting the input from stream. $$$K$$$ changes for every findMedian() query. For example, if following operations are being performed:
    - addNum(1) [1]
    - addNum(4) [1,4]
    - addNum(2) [1,4,2]
    - addNum(3) [1,4,2,3]
    - findMedian(4) // returns median of last 4 elements i.e. $$$(2+3)/2$$$
    - findMedian(3) // returns median of last 3 elements i.e. $$$3$$$
    - findMedian(2) // returns median of last 2 elements i.e. $$$(2+3)/2$$$
    - findMedian(1) // returns median of last 1 element i.e. $$$3$$$