When submitting a solution in C++, please select either C++14 (GCC 6-32) or C++17 (GCC 7-32) as your compiler. ×

xuanji's blog

By xuanji, history, 22 months ago, In English

Curious if anyone knows how to solve the following problem faster than $$$O(n^2)$$$.

Given an array $$$a$$$ of length $$$n$$$ and an integer $$$n$$$, find any two numbers $$$l$$$ and $$$r$$$ ($$$l \le r)$$$ such that:

  • Let $$$a'$$$ be the subarray $$$a_l, a_{l+1}, ... a_r$$$. Then for each $$$x$$$ that appears in $$$a'$$$, $$$x$$$ appears in $$$a'$$$ at least $$$k$$$ times (i.e. $$$k$$$ or more array elements are equal to $$$x$$$).
  • The value $$$r-l$$$ is maximized.

Source of problem: I misread https://codeforces.com/contest/1676/problem/F

  • Vote: I like it
  • +58
  • Vote: I do not like it

»
22 months ago, # |
Rev. 3   Vote: I like it 0 Vote: I do not like it

[deleted] sorry i misread the blog

»
22 months ago, # |
Rev. 2   Vote: I like it -25 Vote: I do not like it

Yeah bro, I solved this problem only and then looked at output format after writing entire code:

PS: By segment, I mean continuous elements.

Observation: If frequency of any element in the array < k, we surely cannot take this. So the array will be divided into segments. You can only take elements from one segment, and this segment is a repeating problem(not a subproblem).

Solution:

Solution

Since every element is added once and removed from the cur_freq array atmost once. So time complexity: O(N*logN).

»
22 months ago, # |
  Vote: I like it +62 Vote: I do not like it

If all array elements occur at least $$$k$$$ times, then the answer is $$$n-1$$$. Otherwise there is an element that occurs less than $$$k$$$ times and we will "split" the array into subarrays that don't contain that element and recursively solve those.

Store all indices of all array elements in a map<int,vector<int>> loc; that we can use binary search on to calculate the frequency of an element $$$x$$$ in an interval $$$[l,r]$$$. Let's write a function $$$count(x,l,r)$$$ that does this.

We will write a function $$$solve(l,r)$$$ that calculates the maximum answer on the subarray $$$a[l,r]$$$. We will use a 2-pointer trick with divide and conquer to achieve a good time complexity. Move the $$$l$$$ pointer to the right and the $$$r$$$ pointer to the left until we find an element that occurs less that $$$k$$$ times, and recurse if necessary. The code will look something like this

int solve(int l, int r) {
  if(l > r) return -1;
  int L = l, R = r;
  while(l <= r) {
    if(count(a[l], L, R) < k) {
      return max(solve(L, l-1), solve(l+1, R));
    }
    l++;
    if(count(a[r], L, R) < k) {
      return max(solve(L, r-1), solve(r+1, R));
    }
    r--;
  }
  return R-L; // all elements occur >= k times
}

if we recurse on index $$$i$$$, then the recurrence relation is

$$$T(l,r) = T(l,i-1) + T(i+1, r) + O(min(i-l,r-i)*log(n))$$$

The solution to this recurrence is $$$O(n*log^2(n))$$$, which is the overall time complexity.

»
22 months ago, # |
  Vote: I like it +41 Vote: I do not like it

Yes, this problem can be solved in $$$O(n \log n)$$$.

First, let's assume we have a function that tells us whether a given subarray $$$l \ldots r$$$ is suitable and if not, tells us the position of an offending element, i.e. one that appears in this subarray more than 0 but less than $$$k$$$ times. Later we will discuss how to implement this efficiently, i.e. in time $$$O(\log n)$$$.

To solve the problem:

  • Try the entire array.
  • If this does not work:
    • Find a position of a value that appears more than 0 but less than $$$k$$$ times.
    • Clearly this position can not be in the solution at all.
    • Cross it off, you're left with one or two subarrays.
    • Recurse on these subarrays, return the longest solution found.

The complexity is $$$O(n \log n)$$$: every element is crossed off at most once as the subarrays you recurse on are always distinct.

Now let's discuss how to check if a subarray is suitable. First, we calculate values $$$\mathrm{need}[i]$$$: suppose $$$a_i$$$ is the $$$j$$$-th occurrence of $$$a_i$$$ in the array. Then $$$a_{\mathrm{need[i]}}$$$ should be the $$$j-k+1$$$-th occurrence of $$$a_i$$$ in the array. If $$$j < k$$$, then set $$$\mathrm{need}[i] = -\infty$$$. In other words, $$$\mathrm{need}[i]$$$ is the rightmost possible left endpoint of the subarray if $$$a_i$$$ is included and it is the last of its value in the subarray. How to calculate $$$\mathrm{need}[i]$$$ is left as an exercise to the reader but it is straightforward.

Now we build a persistent segment tree that stores pairs, supports range minimums and point updates. Initialize it with everything set to $$$\langle \infty, \infty \rangle$$$.

The $$$i$$$-th revision of the segment tree is built as follows based on the $$$(i - 1)$$$-th revision:

  • Set the $$$i$$$-th element of the segment tree to $$$\langle \mathrm{need}[i], i \rangle$$$.
  • Let $$$j$$$ be the last occurrence of $$$a_i$$$ before $$$i$$$ (if there is none, skip this step). Set the $$$j$$$-th element of the segment tree to $$$\langle \infty, \infty \rangle$$$.

To check if subarray $$$l \ldots r$$$ is suitable, take range minimum on the segment tree on the range $$$l \ldots r$$$ and revision $$$r$$$. If the first element is $$$\ge l$$$, the subarray is suitable. Otherwise, the second element tells you a faulty position.

»
22 months ago, # |
  Vote: I like it +63 Vote: I do not like it

Why is it that every time I see a blog like this, it has existed for 9 hours, unanswered, last comment in 3 hours. Then when I start writing my response suddenly there are already 2 comments explaining the solution.

  • »
    »
    22 months ago, # ^ |
      Vote: I like it +4 Vote: I do not like it

    Haha, I thought the same! At least the solutions are a little different

»
22 months ago, # |
Rev. 5   Vote: I like it 0 Vote: I do not like it

Yes you can solve this in $$$O (n log n)$$$.

Use Sweepline + Min with Range add segment tree. Iterate over the left border $$$l$$$, define $$$weights[i] = 1$$$ if position $$$i$$$ is the first occurrence of $$$a_i$$$ on $$$[l,n]$$$, $$$-1$$$ if position $$$i$$$ is the $$$k$$$-th occurrence of $$$a_i$$$ on $$$[l,n]$$$, and 0 in all other cases. The segment tree will store a prefix sum of this weight. So the rightmost value r such that $$$[l,r]$$$ being valid is precisely the the rightmost value 0 on this segment tree. This can be found with segment tree descent (note that prefix sum of weight is non-negative, so we can find 0s by a descent.). It can be seen that every time you move $$$l$$$ to $$$l+1$$$, you just need to do upto 4 range add/subtracts.