Offline Range MEX queries in O(log n)

#	User	Rating
1	tourist	3880
2	jiangly	3669
3	ecnerwala	3654
4	Benq	3627
5	orzdevinwang	3612
6	Geothermal	3569
6	cnnfls_csy	3569
8	jqdai0815	3532
9	Radewoosh	3522
10	gyh20	3447

#	User	Contrib.
1	awoo	161
2	maomao90	160
3	adamant	157
4	maroonrk	154
5	-is-this-fft-	148
5	atcoder_official	148
5	SecondThread	148
8	Petr	147
9	TheScrasse	145
10	nor	144

Consider the following problem :

Your are given an array $$$a$$$ of size $$$n$$$ and $$$q$$$ queries of form $$$(l_1, r_1), (l_2, r_2), ..., (l_q, r_q)$$$. For each query you have to find the MEX of the array $$$a_l, a_{l+1}, ..., a_r$$$. The queries are offline.

$$$0 \le a_1, a_2, ..., a_n \le n$$$

$$$1 \le l_i \le r_i \le n$$$, for all $$$1 \le i \le q$$$

The approach to the problem is discussed in the comment here.

First let's consider a special case of the queries where $$$r = n$$$. In this case we want to find the smallest integer that does not occur in the subarray $$$a_l, ..., a_r$$$. That is, the smallest integer that does not occur at index $$$l$$$ or after it.

For all $$$0 \le x \le n$$$, let $$$b_x$$$ be the greatest index $$$i$$$ such that $$$a_i = x$$$, and $$$0$$$ if no such $$$a_i$$$ exists.

Now the MEX say $$$m$$$ of subarray $$$a_l, ..., a_n$$$ either does not occur in $$$a$$$, in which case we have $$$b_m = 0$$$, or the last occurence of $$$m$$$ in $$$a$$$ is before $$$l$$$, thus $$$b_m < l$$$. We can say that $$$m$$$ is the minimum of all such numbers, thus $$$m$$$ = $$$min$$$ { $$$i$$$ | $$$b_i < l$$$ }.

Now since the queries are offline, we can sort the queries by $$$r$$$ and process them such that we can ignore the subarray $$$a_{r + 1}, ..., a_n$$$. We iterate from from $$$r = 1, 2, ..., n$$$, and for each $$$r$$$ we process all queries $$$(l_i, r_i)$$$ where $$$r_i = r$$$.

Now how do we efficiently calculate the answer for queries $$$(l, n)$$$ ?

We maintain a minimum segment tree on the array $$$b$$$ and update all values for $$$i = 1, 2, ..., r$$$. Now since there are $$$n$$$ updates but $$$n + 1$$$ leaf nodes in the segment tree, at least one of them is zero. So value of root node is $$$0$$$. Now we try to find the minimum index leaf node with value < $$$l$$$. We start at root node and travel down to this node. At each node we see if the value of it's left child is < $$$l$$$. If it is we go to left child. Otherwise we go to the right child. Either of left or right child will always have value < $$$l$$$. Finally we reach a leaf node, the index of this node(after adjusting for offset) is the mex of array $$$a_l, a_{l + 1}, ..., a_r$$$.

Consider the array $$$a = [6, 1, 0]$$$.

Here $$$b = [3, 2, 0, 0, 0, 0, 1, 0]$$$.

Consider evaluation for $$$l = 2, r = 3$$$. The subarray is $$$[a_2, a_3] = [1, 0]$$$.

We start at root node.

We see that the left child has value < $$$l$$$, so we go to the left child.

Now for left node $$$2 \ge l$$$. So we go to right node.

Now for the left child value < $$$l$$$. So we go to left child.

We have reached a leaf node. The corresponding index is 2 and thus the MEX of the subarray $$$[a_2, a_3]$$$ is 2.

Calculating MEX of each query takes $$$O(log n)$$$ time, total $$$O(q log n)$$$. Updating the tree for $$$b$$$ takes $$$O(n log n)$$$ time in total. Sorting $$$q$$$ queries takes $$$O(q log q)$$$ time.

Overall time complexity — $$$O(n log n + q(log n + log q))$$$

#include<bits/stdc++.h>
using namespace std;
#define ll long long
struct SegmentTree{
    int n;
    vector<int> tree;
    SegmentTree(int sz){
        n = 1;
        while(n < sz){
            n <<= 1;
        }
        tree.assign(2 * n, 0);
    }
    void set(int ind, int val){
        ind += n;
        tree[ind] = val;
        ind >>= 1;
        while(ind > 0){
            tree[ind] = min(tree[2 * ind], tree[2 * ind + 1]);
            ind >>= 1;
        }
    }
    int get(int x){
        // return the first index i, such that s[i] < x
        int node = 1;
        while(node < n){
            int left = (node << 1);
            int right = (node << 1) + 1;
            if(tree[left] < x){
                node = left;
            }else{
                node = right;
            }
        }
        return (node - n);
    }
};
int main(){
    int n;
    cin >> n;
    vector<int> a(n + 1);
    for(int i = 1; i <= n; ++i){
        cin >> a[i];
    }
    int q;
    cin >> q;
    vector<vector<pair<int, int>>> queries(n + 1);
    for(int i = 0; i < q; ++i){
        int l, r;
        cin >> l >> r;
        queries[r].push_back({l, i});
    }

    vector<int> res(q);

    SegmentTree s(n + 1);
    for(int i = 1; i <= n; ++i){
        // set the last occurence of a[i] to i
        s.set(a[i], i);

        for(auto [l, ind] : queries[i]){
            // find the smallest x, such that last occurence of x < l
            res[ind] = s.get(l);
        }
    }
    for(int elem : res){
        cout << elem << '\n';
    }
    cout << '\n';

    return 0;
}

Comments (7)

Write comment?

one_autum_leaf

13 months ago, # |

Auto comment: topic has been updated by one_autum_leaf (previous revision, new revision, compare).

→ Reply

igzou

5 weeks ago, # |

Practice question

https://leetcode.com/problems/smallest-missing-genetic-value-in-each-subtree/description/

hydra_cody

12 days ago, # |

← Rev. 2 →

How to solve this

I am thinking of making dfs tree array and work on this array. But how do we solve range MEX queries with update on array?

Hamza_Hassanain

12 days ago, # ^ |

Did you think of square root decomposition?

How will you combine results from two square roots? We can't work with frequency for each square root.

lemelisk

But how do we solve range MEX queries with update on array?

Can be done easily in $$$\mathcal{O}(n^{5/3}+q\sqrt{n})$$$ via 3D Mo and a bit harder in $$$\mathcal{O}(n \log{n}+q\log^2{n})$$$. For the second approach, if the value of $$$x$$$ is at positions $$$p_1, p_2 \dots p_k$$$, consider segments $$$[1; p_1 - 1]$$$, $$$[p_1+1, p_2-1]$$$ ... $$$[p_{k-1}+1;p_k - 1]$$$, $$$[p_k+1, n]$$$ with the value of $$$x$$$ associated with these segments. There will be a total of $$$\mathcal{O}(n)$$$ such segments, and each update changes only $$$\mathcal{O}(1)$$$ of them. And for a range MEX query, we need to find the segment with the smallest $$$x$$$ for all the segments in which the query segment is nested. This can be done for $$$\mathcal{O}(\log^2{n})$$$ in various ways using data structures.

abd-alrzaq

practice https://codeforces.com/gym/105242/problem/A

one_autum_leaf's blog