### kaldiuo's blog

By kaldiuo, history, 3 years ago,

There are q<=5*10^5 online queries of 2 types for a set of integers:

1. Add an integer 1<=x<=10^9 to the set
2. Remove an integer from the set
3. Find the minimum element >=x that isn't present in the set

Is there a way to efficiently process these queries?

• +22

 » 3 years ago, # |   +41 If you can answer in offline, then just compress X in all queries and use simple segment tree with sum on a range of fenwick tree. If you must answer in online then use implicit segment tree.
•  » » 7 months ago, # ^ |   -30 Can you kindly provide more information?And would your idea work with answering interval query?in other words, suppose you have array of N items and Q queries and you want to find the MEX in each [L, R].n, a[i] <= 1e5.
•  » » » 7 months ago, # ^ |   +64 Come on, think before asking. I get that the comment above was terse but at least this should be clear: If you can find the smallest element that is not in the set and is at least $L$, then of fucking course you can find the smallest element that is not in the set and is at least $L$ and at most $R$. Either the answer is the same or it doesn't exist in the second case.
•  » » » » 7 months ago, # ^ |   +28 Huh, I thought he was asking about a querying the MEX of a subarray, not about the problem from the blog but with ">=x" replaced by ">=x and <=y".
•  » » » » » 7 months ago, # ^ |   0 Oh shit, okay. In that case I'll be annoyed by asking not really relevant questions instead of not thinking.
 » 3 years ago, # | ← Rev. 2 →   +16 In your previous version, you did not mention that you can remove elements.So my approach will work when you just add elements and query. When I am adding x to set, I will:1. Unite the clusters of x, x-1 if x-1 is present in set2. Unite the clusters of x,x+1 if x+1 is present in set When doing the unite, maintain the global max of that cluster.The ans for 2nd query will be then the global max of its cluster + 1.
 » 3 years ago, # |   +48 You have 5 * 10^5 queries, so answer is less than 5 * 10^5. So you can make a set2, where you keep integers that are not in your set. Answer will be set2.min
•  » » 3 years ago, # ^ |   +13 Sorry if the title confused you, query 3 wants the mex >= x, where x can be anything.
•  » » » 3 years ago, # ^ | ← Rev. 2 →   +5 a simple lower_bound on the set2 will do that...wont it?EDIT: kinda messed up here :P....x can be large
•  » » » » 3 years ago, # ^ | ← Rev. 2 →   0 The added elements can range from 1 to 10^9, so unless for query 3 x is always 0, set2 has to start with all integers from 1 to 10^9.
•  » » » » » 3 years ago, # ^ |   0 ah right...i messed up there...just use a dynamic segment tree then..
•  » » » » » 3 years ago, # ^ |   +15 You only need to consider all numbers x and x+1 such that x is a number from the queries. So, it will work.
•  » » » 3 years ago, # ^ | ← Rev. 2 →   +5 Oh, I missed it, sorry) So you can do smth like this with dynamic segment tree Or if your problem is offline, you can make "compressison of coordinates" and use my trick with lower_bound, If problem is online you also can do it, in this case you need to keep set of segments(if you added 1,5 and 7 to your set, your set 2 will contain (2,4), (6, 6) and (8, 10^9). Now you can do lower_bound again)
 » 3 years ago, # | ← Rev. 2 →   +16 You can also maintain a dynamic segment tree on the values (x)..store frequencies of 'x' maybe even count of numbers (store 1 if frequency of 'x'>=1)...this is probably an overkill for your problem but will allow even queries like kth element >=x which isn't present in the set (do a binary search integrated in the segtree...still log(x) ). So yeah might give you little more flexibility. Might not be super good memory wise though...
 » 3 years ago, # | ← Rev. 9 →   -8 Use Treap that accumulates following data: Count of nodes in a subtree including current vertex (cnt(this) = cnt(left) + cnt(right) + 1); Maximum and minimum of subtree. Largest contiguous prefix (i.e. the one with elements of form x, x + 1, x + 2...x + N). Maintaining such a prefix isn't that challenging. If prefix of left subtree equals to size of left subtree AND maximum of left subtree  + 1 equals to key of current node then the prefix of current node is prefix(left) + 1 else it's just prefix(left). If prefix(left) + 1 = prefix(this) then you should inspect the right subtree: if min(right) - 1 = key(this) then prefix(this) = prefix(left) + 1 + prefix(right). Now to answer queries:Given x you should split you treap by key x. Now, right tree after split contains all integers greater than x. If min(splitright) - 1 > x then the answer is just x + 1, otherwise you should split splitright by count(not key), where count = prefix(splitright). The answer will be max(split - by - countleft) + 1.Time complexity per query(any) : O(log(Q)), Space complexity: O(Q), Q = 5 * 105.Nonetheless, Treap is not the fastest data structure so do not expect this solution to be blazing fast.
 » 3 years ago, # |   -14 The fastest way is Fenwick tree with length 5*10^5 (as the answer can't exceed 5*10^5).
•  » » 3 years ago, # ^ |   +20 Sorry if the title confused you, query 3 wants the mex >= x, where x can be anything.
 » 3 years ago, # |   +5 Can you share the link of the problem from any online judge?
 » 3 years ago, # |   -9 I think the best solution would be to have a BBST (Treap, Splay...) where each node contains the closest node with a count of 0. When you insert a value at x, make sure to add a dummy node with count 0 at (x+1).When you are removing, simply subtract the counts by 1.When you are querying, first check if the query element exists. If it does not exist, then the answer is simply that, otherwise use the nodes in the tree to binary search for it.
 » 3 years ago, # | ← Rev. 3 →   -17 let's say we have an ordered-set (but here we can keep frequencies in map).Now insertion and deletion into o-set is straightforward.For finding smallest y such that y>=x and y is not present in the o-set we can binary search on y in the range [x,infinity].1 : it will work in O(logn)2 : it will work in O(logn)3 : it will work in O(logn*log(range))Please point out if you find any mistake or flaw in the logic.
 » 3 years ago, # |   +44 Store your current collection as a union of intervals. For example, if your actual set is S = {1,2,3,7,10,11}, store it as [1,4), [7,8), [10,12). You can easily use a standard C++ set for this.Operation 1: Add a one-element interval [x,x+1). Check whether you need to merge it with the previous and/or next interval.Operation 2: Using set::upper_bound, find the interval that contains your x. Remove it from the set, split it into two smaller intervals and reinsert them if they are not empty.Operation 3: Using set::upper_bound, find whether x is in your set. If it isn't, return x. If it is, return the upper bound of the interval that contains x.Each operation runs in O(log n) and you don't need to implement any trees.
 » 3 years ago, # | ← Rev. 2 →   +6 My idea:Create a set of intervals S. If the interval [l, r] is present in S, that means that the elements l, l + 1, l + 2, ..., r are in S. Adding an element x is handled like this: If [whatever, x - 1] is present in the set, remove it and add [whatever, x]. Likewise, if [x + 1, whatever] is in the set, remove it and add [x, whatever]. Removals are handled in this way: Find the interval containing x. Split it in two intervals [start, x - 1], and [x + 1, end]. MEX queries are handled similarly: Find the interval containing x. Output end + 1. This can be implemented using 2 std::set> in C++. The first set indexes pairs by the first element (i.e. left end), and the second one by their second element (i.e. right end). Codeclass MEX { typedef std::pair interval; std::set s[2]; std::multiset values; void insert_pair(interval p) { auto[x, y] = p; if (x > y) std::swap(x, y); s[0].insert({x, y}); s[1].insert({y, x}); } void remove_pair(interval p) { auto[x, y] = p; if (x > y) std::swap(x, y); s[0].erase({x, y}); s[1].erase({y, x}); } std::optional is_end(int x, bool left) { int idx = left; auto it = s[idx].lower_bound({x, 0}); if (it != s[idx].end() && it->first == x) return *it; else return std::nullopt; } public: void insert(int x) { bool ok = values.find(x) == values.end(); values.insert(x); if (!ok) return; auto p1 = is_end(x - 1, true); auto p2 = is_end(x + 1, false); if (p1 && p2) { interval i = {p1->second, p2->second}; remove_pair(*p1); remove_pair(*p2); insert_pair(i); } else if (p1 && !p2) { interval i = {p1->second, x}; remove_pair(*p1); insert_pair(i); } else if (!p1 && p2) { interval i = {x, p2->second}; remove_pair(*p2); insert_pair(i); } else if (!p1 && !p2) { interval i = {x, x}; insert_pair(i); } } void remove(int x) { values.erase(values.find(x)); if (values.find(x) != values.end()) return; auto it = s[1].lower_bound({x, 0}); if (it->first != x && it->second != x) { interval p1 = {it->second, x - 1}; interval p2 = {x + 1, it->first}; remove_pair(*it); insert_pair(p1); insert_pair(p2); } else if (it->first != x && it->second == x) { interval p = {x + 1, it->first}; remove_pair(*it); insert_pair(p); } else if (it->first == x && it->second == x) { remove_pair(*it); } else if (it->first == x && it->second != x) { interval p = {it->second, x - 1}; remove_pair(*it); insert_pair(p); } } int get(int x) { if (values.find(x) == values.end()) return x; auto it = s[1].lower_bound({x, 0}); return it->first + 1; } }; 
•  » » 6 months ago, # ^ |   0 This gives WA in some cases. https://atcoder.jp/contests/hhkb2020/submissions/17318696 Can we do it without the use of std::optional as well?
•  » » » 6 months ago, # ^ |   0 Indeed, {p2->first, p1->first} should be {p1->second, p2->second}: https://atcoder.jp/contests/hhkb2020/submissions/17323306
•  » » » » 6 months ago, # ^ |   0 what about the std::optional ?? It is not available in c++-14
•  » » » » » 6 months ago, # ^ |   0 You could use std::pair instead of std::optional.
•  » » » 6 months ago, # ^ |   0 Umm..you didn't have to do all this for the atcoder problem...you could have a set containing all numbers from 0 to 200000...when you have a input n remove it from set(can do it with erase) and then just output the 1st element of set everytime.
 » 3 years ago, # |   +13 You can also use a bit trie — this is a trie made of bits of your numbers, thus it is represented as a binary tree (since there are only two types of bits — 0 and 1), and its depth is .When you add a number, you mark the vertex that representing this particular number as a full binary tree, and then you recursively go over its parents marking them as full binary trees if their left and right children are both full binary trees as well.When you remove a number, you unset the "full binary tree" mark and do so for all its parents.When you want to get mex, you should do the DFS towards x, avoiding visiting subtrees which are marked as full (because there are no free values in such subtrees), and attempt to visit a left subtree first if you are allowed to by the x restriction.I admit that there are better solutions above for this particular problem, but it's good to know about that function of a bit trie: for example, you can perform both "find mex" and "xor all the values" operations with a single data structure. This structure was even proposed to be implemented once on Open Olympiad (check the latest problem). code (multiset edition, not tested tho lol)#include using namespace std; struct Node { Node *children [2]; bool is_full = false; int cnt = 0; public: Node() { children[0] = nullptr; children[1] = nullptr; } }; inline Node *nonnull(Node *&n) { if (!n) n = new Node(); return n; } void add(int val, int level, Node *n) { if (level == -1) { n->cnt++; n->is_full = true; return; } int cur_bit = (val & (1 << level)) >> level; add(val, level-1, nonnull(n->children[cur_bit])); n->is_full = nonnull(n->children[0])->is_full && nonnull(n->children[1])->is_full; n->cnt = nonnull(n->children[0])->cnt + nonnull(n->children[1])->cnt; } void remove(int val, int level, Node *n) { if (level == -1) { n->cnt--; n->is_full = n->cnt > 0; return; } int cur_bit = (val & (1 << level)) >> level; remove(val, level-1, nonnull(n->children[cur_bit])); n->is_full = nonnull(n->children[0])->is_full && nonnull(n->children[1])->is_full; n->cnt = nonnull(n->children[0])->cnt + nonnull(n->children[1])->cnt; } int answer(int val, int level, Node *n) { if (n->is_full) { return -1; } else if (n->cnt == 0) { return val; } int cur_bit = (val & (1 << level)) >> level; int ans = answer(val, level-1, nonnull(n->children[cur_bit])); if (cur_bit == 0 && ans == -1) { ans = answer((val >> level << level) | (1 << level), level-1, nonnull(n->children[1])); } return ans; } Node root_node; int main(void) { int n; scanf("%d", &n); add(0, 31, &root_node); for (int i = 0; i < n; i++) { int t, val; scanf("%d%d", &t, &val); if (t == 1) { add(val, 31, &root_node); } else if (t == 2) { remove(val, 31, &root_node); } else if (t == 3) { printf("%d\n", answer(val, 31, &root_node)); } } return 0; } 
•  » » 6 weeks ago, # ^ |   -8 This method is really awesome.
 » 3 years ago, # |   -9 you can use ordered set from library, it's enough for your problem
 » 7 months ago, # |   +99 You can solve it with 2 sets. Keep one set of values that are in the set, and one set of "next values that aren't in the set". So if x is in the set and x+1 isn't in the set, x+1 is in the second set. Query can be answered by checking if x is in the first set, if it is the answer is the lower bound of the second set otherwise the answer is x.
 » 6 months ago, # |   -10 a single treap is enough