By adamant, 6 years ago, translation

Hi everyone! After a relatively long lull, I decided that, with my contribution growing too slowly, the time has come to please you with another blog article :)

Two months ago the user Perlik wrote an article describing a very interesting data structure implemented in the STL, rope, which lets you quickly perform various operations on substrings. Some time later I tested it on various problems and, unfortunately, tended to get negative results: rope was too slow, especially when working with individual elements.

For some time I forgot about that article. Increasingly often, however, I ran into problems that required a set with the ability to find the ordinal number of an item and to get an item by its ordinal number (i.e., order statistics in the set). Then I remembered that in the comments to that article someone had mentioned a mysterious data structure, the order statistics tree, which supports these two operations and is implemented in the STL (unfortunately, only for GNU C++). And here begins my fascinating acquaintance with policy-based data structures, which I want to tell you about :)

Let's get started. In this article I will talk about what is, IMO, the most interesting of the implemented structures: tree. We need to include the following headers:

#include <ext/pb_ds/assoc_container.hpp> // Common file
#include <ext/pb_ds/tree_policy.hpp> // Including tree_order_statistics_node_update


On closer inspection, you may find that the last two files are already included by the library header

#include <ext/pb_ds/detail/standard_policies.hpp>


The namespace we will work in is called __gnu_pbds in newer versions of GNU C++; earlier it was called pb_ds.

Now let's look at the concrete structure.

The tree-based container has the following declaration:

template<
    typename Key,                      // Key type
    typename Mapped,                   // Mapped-policy
    typename Cmp_Fn = std::less<Key>,  // Key comparison functor
    typename Tag = rb_tree_tag,        // Specifies which underlying data structure to use
    template<
        typename Const_Node_Iterator,
        typename Node_Iterator,
        typename Cmp_Fn_,
        typename Allocator_>
    class Node_Update = null_node_update,         // A policy for updating node invariants
    typename Allocator = std::allocator<char> >   // An allocator type
class tree;



Experienced participants may have already noticed that if we specify only the first two template arguments, we obtain an almost exact copy of the map container. I should say right away that this container can also act as a set: for this you just need to specify null_type as the second template argument (in older versions it is null_mapped_type).
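To make the map/set remark concrete, here is a minimal sketch that relies only on the defaults from the declaration above. The typedef names plain_map and plain_set and both helper functions are made up for illustration, and it assumes a GNU C++ version where the tag is spelled null_type.

```cpp
#include <ext/pb_ds/assoc_container.hpp>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>

using namespace __gnu_pbds;

// Only Key and Mapped are given; Cmp_Fn, Tag and Node_Update keep their defaults.
typedef tree<std::string, int> plain_map;  // used like a map from string to int
typedef tree<int, null_type>   plain_set;  // null_type as Mapped turns it into a set

// Hypothetical helper: fills a small map and looks one key up.
int age_of_bob() {
    plain_map ages;
    ages.insert(std::make_pair(std::string("alice"), 30));
    ages.insert(std::make_pair(std::string("bob"), 25));
    plain_map::point_iterator it = ages.find(std::string("bob"));
    assert(it != ages.end());
    return it->second;
}

// Hypothetical helper: inserts 3, 1, 3 and reports how many keys were kept.
std::size_t distinct_count() {
    plain_set s;
    s.insert(3);
    s.insert(1);
    s.insert(3);  // duplicate key, not stored a second time
    return s.size();
}
```

With these definitions, age_of_bob() should give 25 and distinct_count() should give 2, which is what std::map and std::set would do as well.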

By the way, Tag and Node_Update have no counterparts in map. Let us examine them in more detail.

Tag is a class denoting the tree structure that we will use. Three tag classes are provided in the STL for this: rb_tree_tag (red-black tree), splay_tree_tag (splay tree) and ov_tree_tag (ordered-vector tree). Sadly, at competitions we can use only red-black trees, because the splay tree and the OV-tree use a linear-time split operation, which prevents us from using them.
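Since split came up, here is a rough sketch of how split() and its counterpart join() can be called on the red-black variant. The helper names are hypothetical, and the sketch only illustrates the semantics (split moves the keys greater than the given key into the other tree; join expects the key ranges of the two trees not to interleave). It makes no claim about running time.

```cpp
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/tree_policy.hpp>
#include <cassert>
#include <cstddef>
#include <functional>

using namespace __gnu_pbds;

typedef tree<int, null_type, std::less<int>, rb_tree_tag,
             tree_order_statistics_node_update> ordered_set;

// Hypothetical helper: moves every key greater than `pivot` from `low`
// into `high` and returns how many keys ended up in `high`.
std::size_t split_off_greater(ordered_set& low, int pivot, ordered_set& high) {
    low.split(pivot, high);  // `high` now holds exactly the keys > pivot
    return high.size();
}

// Hypothetical helper: merges `other` into `base` and returns the new size.
// Assumes all keys of one tree are smaller than all keys of the other.
std::size_t join_back(ordered_set& base, ordered_set& other) {
    base.join(other);
    assert(other.empty());   // join leaves the argument empty
    return base.size();
}
```

For the set {1, 2, 4, 8, 16}, split_off_greater(a, 4, b) should leave {1, 2, 4} in a and {8, 16} in b, and join_back(a, b) restores all five keys in a.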

Node_Update is a class denoting the policy for updating node invariants. By default it is set to null_node_update, i.e., no additional information is stored in the vertices. In addition, the library implements the update policy tree_order_statistics_node_update, which in fact provides the operations we need. Let us consider them. Most likely, the best way to define the tree is as follows:

typedef tree<
int,
null_type,
less<int>,
rb_tree_tag,
tree_order_statistics_node_update>
ordered_set;


If we want a map rather than a set, the mapped type must be given as the second argument. Apparently, the tree supports the same operations as set (at least I haven't had any problems with them so far), but there are also two new functions: find_by_order() and order_of_key(). The first returns an iterator to the k-th smallest element (counting from zero); the second returns the number of items in the set that are strictly smaller than the given item. Example of use:

ordered_set X;
X.insert(1);
X.insert(2);
X.insert(4);
X.insert(8);
X.insert(16);

cout<<*X.find_by_order(1)<<endl; // 2
cout<<*X.find_by_order(2)<<endl; // 4
cout<<*X.find_by_order(4)<<endl; // 16
cout<<(end(X)==X.find_by_order(6))<<endl; // 1 (true): index 6 is out of range

cout<<X.order_of_key(-5)<<endl;  // 0
cout<<X.order_of_key(1)<<endl;   // 0
cout<<X.order_of_key(3)<<endl;   // 2
cout<<X.order_of_key(4)<<endl;   // 2
cout<<X.order_of_key(400)<<endl; // 5
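The two functions combine naturally. As a small sketch, the helpers below (their names are made up for this illustration) count the keys inside a closed range with two order_of_key() calls and fetch the k-th smallest key with a bounds check, since find_by_order() returns end() when k is too large.

```cpp
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/tree_policy.hpp>
#include <cassert>
#include <cstddef>
#include <functional>
#include <limits>

using namespace __gnu_pbds;

typedef tree<int, null_type, std::less<int>, rb_tree_tag,
             tree_order_statistics_node_update> ordered_set;

// Hypothetical helper: number of keys k with lo <= k <= hi.
std::size_t count_in_range(const ordered_set& s, int lo, int hi) {
    if (lo > hi) return 0;
    assert(hi < std::numeric_limits<int>::max());  // hi + 1 must not overflow
    // keys < hi + 1 minus keys < lo
    return s.order_of_key(hi + 1) - s.order_of_key(lo);
}

// Hypothetical helper: stores the k-th smallest key (0-based) in `out`.
// Returns false when k is out of range instead of dereferencing end().
bool kth_smallest(const ordered_set& s, std::size_t k, int& out) {
    if (k >= s.size()) return false;
    out = *s.find_by_order(k);
    return true;
}
```

On the set X from the example, count_in_range(X, 3, 7) should give 1 (only the key 4), and kth_smallest(X, 2, v) should store 4 in v.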


Finally, I would like to discuss the performance of order_statistics_tree in the STL. For this, I provide the following table.

Solution\Problem             1028    1090    1521    1439
order_statistics_tree, STL   0.062   0.218   0.296   0.468
Segment tree                 0.031   0.078   0.171   0.078 0.859*
Binary Indexed Tree          0.031   0.062   0.062   -

* The final task requires direct access to the nodes of the tree to implement a solution in O(m log n). Without it, the solution works in O(m log n * log n).

As you can see from all this, order_statistics_tree is only slightly behind handwritten structures, and at times even ahead of them in execution time. At the same time, the code size is reduced considerably. Hence we can conclude that order_statistics_tree is good and can be used in contests.

Besides tree, I also wanted to describe trie here. However, I was confused by some aspects of its implementation that greatly limit its usefulness in programming olympiads, so I decided not to talk about it. Anyone who wishes is encouraged to learn more about this structure on their own.

P.S. Sorry for my poor English :)

• +120

 » 6 years ago, # |   +12 Example of trie with search of prefix range. Problem: 1414 Solution: http://ideone.com/6VFNZl
•  » » 7 months ago, # ^ | ← Rev. 2 →   0 Is there a way of counting number of strings in the trie with a certain prefix without iterating through them all?
 » 6 years ago, # |   +17 You may be wondering how I used a segment tree and a binary indexed tree in my solutions, especially for problems 1521 and 1439. Most likely, I'll later write an entry about some interesting and rarely seen ways of using these structures.
•  » » 6 years ago, # ^ |   0 Here it is :)
 » 5 years ago, # |   +21 This is really useful. Thanks a lot!
 » 5 years ago, # |   0 Very useful article! I need order-statistics on a multiset. How should I define the tree as?
•  » » 5 years ago, # ^ |   +8 As far as I know, there is no tree-based multiset implemented in the STL. However, you can use a pair as the key, where the second element of the pair is the time when the item was added.
•  » » » 3 years ago, # ^ |   +8 Apparently, you can. Once I tried to write less_equal instead of less and it started to work as multiset, I even got AC using it in region olympiad)
•  » » » » 3 years ago, # ^ | ← Rev. 2 →   +11 I can't erase elements with the less_equal comparator, e.g. this code outputs "1":

#include <bits/stdc++.h>
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/tree_policy.hpp>
using namespace std;
using namespace __gnu_pbds;

typedef tree<int, null_type, less_equal<int>, rb_tree_tag, tree_order_statistics_node_update> ordered_set;

signed main() {
    ordered_set d;
    d.insert(1);
    d.erase(1); // erase by key leaves the element in place here
    for (auto i: d) {
        cout << i << endl;
    }
}

So I guess it's not a very useful thing (or I am doing something wrong).

UPD: I can delete an iterator that I got with lower_bound, but it works incorrectly. This code erases 1, not 0:

#include <bits/stdc++.h>
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/tree_policy.hpp>
using namespace std;
using namespace __gnu_pbds;

typedef tree<int, null_type, less_equal<int>, rb_tree_tag, tree_order_statistics_node_update> ordered_set;

signed main() {
    ordered_set d;
    d.insert(0);
    d.insert(1);
    d.erase(d.lower_bound(0)); // erases 1, not 0
    for (auto i: d) {
        cout << i << ' ';
    }
}
•  » » » » » 3 years ago, # ^ |   0 Wow. then it really sucks. Seems like I only used it with insert operations and strangely enough it worked)
•  » » » » » 23 months ago, # ^ | ← Rev. 2 →   -8 Well, actually it works fine and does exactly what you ask for! The issue is that you're passing less_equal as the tree comparator, so the same function is used for lower_bound(). By the definition of lower_bound (according to cplusplus.com), it finds the first element for which the comparison is not true. Thus it returns the first element greater than val, which is 1 in your example. To make sure, you can even test set<int, less_equal<int>>, which gives the same result.
•  » » » » » 6 weeks ago, # ^ | ← Rev. 3 →   0 What if I want to calculate the index of upper_bound of a particular element? Suppose we have: 1 1 2 3 4 then how to find index(upper_bound(2))?UPDATE: Maybe it is = order_of_key(number+1) ?
•  » » » » 5 months ago, # ^ |   +3 Another drawback of using less_equal instead of less is that lower_bound works as upper_bound and vice-versa. Code
•  » » » 12 days ago, # ^ | ← Rev. 2 →   0 What about the comparator i.e. less
•  » » 11 months ago, # ^ |   0 typedef tree indexed_multiset;
•  » » » 11 months ago, # ^ |   0 Can we use this in this question? or we can't use it, as I am not able to implement the multiset part
•  » » » » 9 months ago, # ^ |   +8 just use the fucking binary search on this problem
•  » » » 9 months ago, # ^ | ← Rev. 2 →   0 The 3rd template argument must be less_equal. But adamant, is it the correct way to do this ? Since as far as I know, most of the STL containers require a comparator that offers a strict weak ordering (Not sure of the exact reasons though). So, will there be some drawbacks of trying to construct a multiset this way?
•  » » » » 5 months ago, # ^ |   0 can't we just use it as multiset with less_equal and then assume that lower_bound and upper_bound are exchanged. I mean that's not right to do but will get u AC i think :p
 » 5 years ago, # | ← Rev. 2 →   0 .(same comment as above)
 » 5 years ago, # |   0 what will be the complexity of erase operation? O(logn) or O(n)
•  » » 5 years ago, # ^ |   0 O(logn)
•  » » » 18 months ago, # ^ |   0 Really??
•  » » » » 18 months ago, # ^ |   0 Yes.
 » 5 years ago, # |   0 Is there any efficient function to merge 2 trees ?
•  » » 19 months ago, # ^ |   +1 You can do it in log(n) if the greatest element of one tree is smaller than the smallest element of the other. Otherwise, I don't think you have a better option. Tell me as well if you have found something interesting.
•  » » » 15 months ago, # ^ |   0 How do you merge two non-intersected rbtrees (as in the article) in O(lg n) time? I find that the default join() function takes linear time...
•  » » » » 10 months ago, # ^ |   0 Have you found something interesting about merge ? Im trying to do .join but it throws error.
 » 4 years ago, # |   0 Surely worth looking into :) Thanks
 » 4 years ago, # |   0 how can i use it like multiset ?
•  » » 4 years ago, # ^ |   0 Main idea is to keep pairs like {elem, id}.

typedef tree<
pair<int, int>,
null_type,
less<pair<int, int>>,
rb_tree_tag,
tree_order_statistics_node_update> ordered_set;

int t = 0;
ordered_set me;
...
me.insert({x, t++});
me.erase(me.lower_bound({x, 0}));
cout << me.order_of_key({x, 0}) << "\n";
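The {elem, id} idea can be wrapped up as in the sketch below. This is only one possible arrangement: the struct ordered_multiset and all of its member names are hypothetical, and it assumes int values and non-negative ids that never reach the maximum int. Note that find_by_order() still takes a position, not a pair.

```cpp
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/tree_policy.hpp>
#include <cassert>
#include <cstddef>
#include <functional>
#include <limits>
#include <utility>

using namespace __gnu_pbds;

typedef std::pair<int, int> entry;  // (value, insertion id)
typedef tree<entry, null_type, std::less<entry>, rb_tree_tag,
             tree_order_statistics_node_update> pair_tree;

// Hypothetical wrapper: equal values are kept apart by a unique id.
struct ordered_multiset {
    pair_tree t;
    int next_id;

    ordered_multiset() : next_id(0) {}

    void insert(int x) { t.insert(entry(x, next_id++)); }

    // Erases one copy of x; returns false if x is not present.
    bool erase_one(int x) {
        pair_tree::iterator it = t.lower_bound(entry(x, 0));
        if (it == t.end() || it->first != x) return false;
        t.erase(it);
        return true;
    }

    // Number of stored values strictly smaller than x.
    std::size_t count_less(int x) const { return t.order_of_key(entry(x, 0)); }

    // Number of stored values <= x, i.e. the position an upper bound would have.
    std::size_t count_less_equal(int x) const {
        return t.order_of_key(entry(x, std::numeric_limits<int>::max()));
    }

    // k-th smallest value, 0-based; k must be less than the number of elements.
    int kth(std::size_t k) const {
        assert(k < t.size());
        return t.find_by_order(k)->first;
    }
};
```

After inserting 1, 1, 2, 3, 4, count_less(2) should be 2, count_less_equal(2) should be 3, and kth(1) should be 1.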
•  » » » 4 years ago, # ^ |   0 thanks a lot :)
•  » » » 3 years ago, # ^ |   0 How will find_by_order be used in this case?
•  » » » 2 years ago, # ^ |   0 me.find_by_order({x,0}) does not work the way me.order_of_key({x, 0}) does. Why?
•  » » » » 2 years ago, # ^ |   0 *me.find_by_order({x,0})
•  » » » » » 2 years ago, # ^ |   0 still it does not work.
•  » » » » » » 4 months ago, # ^ |   0 sure it does work, but you cannot print a pair so you have to do it like this cout << me.find_by_order(1)->first ;
•  » » » » » 2 years ago, # ^ |   0 wtf, find_by_order takes number
•  » » » » » » 8 months ago, # ^ |   0 How do I use find_by_order if I'm using ordered_set with pairs?

typedef tree< pair<int, int>, null_type, less<pair<int, int>>, rb_tree_tag, tree_order_statistics_node_update> ordered_set;
•  » » » » » 10 months ago, # ^ |   0 what is x here
•  » » 11 months ago, # ^ |   0 typedef tree indexed_multiset;
 » 4 years ago, # |   0 Hi, adamant, the code files in Useful Links don't seem to work. Could you fix them?Thanks for this great post. I am looking forward to your next and next next posts.
•  » » 4 years ago, # ^ |   0 Can you elaborate please?
•  » » » 4 years ago, # ^ | ← Rev. 2 →   0 For example, the code in "Demonstration of trie with prefix search" cannot run on my computer. I saw that there was some old syntax like the namespace pb_ds. I changed it, and then it returned a new error in another place. The truth is I am not good enough to change things any further. I hope that you can update it. (I know that I can use the trie code in one of your comments, but this post would be even better if the code in Useful Links were also updated.) Thank you.
•  » » » » 4 years ago, # ^ |   +1 I can't edit the original files — they're not mine. But here is the correct version: http://ideone.com/BpZlYO
•  » » » » » 4 years ago, # ^ |   0 Thank you.
 » 3 years ago, # |   0 https://www.e-olymp.com/ru/problems/2961 Is it possible to solve this problem using this algorithm?
 » 2 years ago, # |   +1 Can anybody share a Java equivalent class for this type of set or a code which acts according to above data structure?
•  » » 2 years ago, # ^ | ← Rev. 2 →   0 It does not exist. You may use instead: a self-written tree; a treap; number compression + a Fenwick tree.
•  » » » 2 years ago, # ^ |   0 I thought of number compression + fenwick tree, but this solution will work for only offline queries. I want to handle online queries. The best I can think of now is Treap + Segment Tree or Treap + Fenwick Tree. But here again is the problem of implementation of mixed data structure, I am unable to think how to implement that. Can you please help me?
 » 2 years ago, # |   0 Any idea on how to use this pre C++11?
 » 2 years ago, # |   0 How can I use a custom compare function in the "Key comparison functor" section for custom data types?
•  » » 2 years ago, # ^ |   0 Just like for regular set..
•  » » » 2 years ago, # ^ |   0 I can use a custom compare function for a set by using operator overloading. I want to know whether there is any other way to do this for both set and ordered set, using a lambda expression or just a bool compare function. Thank you very much for your nice post, adamant.
•  » » » » 2 years ago, # ^ | ← Rev. 2 →   +3 I suppose you can overload the comparison operator and still use less. Also, you can use functors and lambdas in a way similar to sets:

auto cmp = [](int a, int b){return a < b;};
tree<int, null_type, decltype(cmp)> x(cmp);
tree<int, null_type, function<bool(int, int)>> y([](int a, int b) { return a < b;});
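For the functor route, a small sketch with a named comparator struct may help. The struct descending and both helpers are hypothetical names; the point is only that the order statistics then follow the comparator's order, so position 0 is the largest key.

```cpp
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/tree_policy.hpp>
#include <cassert>
#include <cstddef>

using namespace __gnu_pbds;

// Hypothetical comparator: orders integers from largest to smallest.
struct descending {
    bool operator()(int a, int b) const { return a > b; }
};

typedef tree<int, null_type, descending, rb_tree_tag,
             tree_order_statistics_node_update> descending_set;

// Hypothetical helper: k-th largest key, 0-based; k must be less than s.size().
int kth_largest(const descending_set& s, std::size_t k) {
    assert(k < s.size());
    return *s.find_by_order(k);
}

// Hypothetical helper: number of keys strictly greater than x,
// because "before x" in this order means "greater than x".
std::size_t count_greater(const descending_set& s, int x) {
    return s.order_of_key(x);
}
```

With the keys 1, 2, 4, 8, 16 inserted, kth_largest(s, 0) should be 16 and count_greater(s, 4) should be 2.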
•  » » » » » 2 months ago, # ^ |   0 Why does it only work with lambdas and not functions? On doing tree, null_type, decltype(&comp), rb_tree_tag, tree_order_statistics_node_update> ordered_set(&comp); Where comp is a comparator function, I get an error.
 » 2 years ago, # |   0 How To Solve This Problems. Plz Help Me.
 » 2 years ago, # |   0 Is it possible to search for the order of a non-key value by passing a comparator/function ? I sometimes find myself have to use my own tree templates instead because of having to write a search function that can cope with this task.
 » 22 months ago, # |   +6 What is the constant factor of order_statistics_tree though it executes in logarithmic complexity ? I think it's constant factor is very high.
 » 20 months ago, # |   0 could anyone write the exact thing to use this data structure as map..pls.I'm not able do so.
•  » » 20 months ago, # ^ |   0 And what exactly do you want from it? You can use something like that. #include using namespace __gnu_pbds; template> using ordered_map = tree; ordered_map my_map; 
•  » » » 20 months ago, # ^ |   0 as it was mentioned in the article that we can use it as map by defining mapped type .So I tried to do that by couldn't. that's all;)
•  » » » 18 months ago, # ^ |   0 is this thing a order statistics tree
•  » » » 5 months ago, # ^ |   0 Can you write the multiset implementation of ordered_set. When I use less_equal then I'm not able to erase from the ordered_set. And when I use the pair for including duplicates I'm not able to use find_by_order({x,0}).
•  » » » » 5 months ago, # ^ |   0 What's wrong with find_by_order({x, 0})?
•  » » » » » 5 months ago, # ^ |   0 It gives an error. It says no known conversion. error: no matching function for call to '__gnu_pbds::tree, __gnu_pbds::null_type, std::less >, __gnu_pbds::rb_tree_tag, __gnu_pbds::tree_order_statistics_node_update>::find_by_order()
•  » » » » » » 5 months ago, # ^ |   0 Wait, find_by_order takes number $k$ and returns $k$-th element. What exactly do you expect it to return?..
•  » » » » » » » 5 months ago, # ^ |   0 A number .
•  » » » » » » » » 5 months ago, # ^ |   0 I think you should use order_of_key then
•  » » » » » » » » » 5 months ago, # ^ |   0 Thanks adamant I did that.
•  » » » » » 5 months ago, # ^ | ← Rev. 2 →   0 I have been searching for that for a long time.Can u please provide us with a multiset implementation. It would be really helpful if u provide us with erase() and find_by_order() functions. And I believe this would be useful to a lot of coders.
•  » » » » » 5 months ago, # ^ | ← Rev. 2 →   0 Actually find_by_order( x) takes in an integer input because it tells us the element present in the position x. Whereas find_by_order({x, 0}) is a syntax error and it wont work.
•  » » » » » » 2 months ago, # ^ |   0 So basically it is not possible to find the K-th smallest element while we are using pair ?
 » 18 months ago, # |   0 can we use it as multiset?
•  » » 16 months ago, # ^ |   0 u can use pair for manage the duplicate values..
•  » » 11 months ago, # ^ |   0 typedef tree indexed_multiset;
•  » » » 11 months ago, # ^ | ← Rev. 2 →   0 It is:typedef tree, rb_tree_tag, tree_order_statistics_node_update> indexed_multiset;
 » 11 months ago, # |   0 How can I erase element from order_set by it's value?
•  » » 11 months ago, # ^ |   +1 just do ordered_set st; st.erase(key);
•  » » » 5 months ago, # ^ |   0 This doesn't work .
•  » » » » 5 months ago, # ^ |   0 ordered_set :: iterator it; it=st.upper_bound(key); st.erase(it); it works;
•  » » » » » 5 months ago, # ^ |   0 Thanks bro
 » 8 months ago, # |   0 (Sorry for necroposting) Does anyone know how to compile with the pbds header files on Mac OS X ? I think the g++ command is configured to use clang by default, and so it is not directly available. I've tried adding ext/pb_ds into the include folder (the same way you would enable bits/stdc++.h) but instead new dependencies come up.
 » 8 months ago, # | ← Rev. 2 →   0 Note that: dereferencing the end iterator usually gives 0. There's no error message even with -D_GLIBCXX_DEBUG and -fsanitize=undefined. The returned object is not constructed properly (this can be tested with a class that logs all of its constructions and destructions). Erasing the end iterator is usually a no-operation (unlike set in debug mode), although I think this is undefined behavior.
 » 8 months ago, # | ← Rev. 2 →   +6 Notes on using less_equal as the comparison function in order to use the tree as a multiset: _GLIBCXX_DEBUG must not be defined, otherwise some internal check will fail; find will always return end; lower_bound works like upper_bound in a normal set (it returns the first element > the key); upper_bound works like lower_bound in a normal set (it returns the first element >= the key); find_by_order and order_of_key work properly (unlike the two functions above). Some code to verify the points above: Try it online!
 » 7 months ago, # | ← Rev. 2 →   0 adamant, while discussion, someone suggested that the time complexity of using it as a set is amortized log(n), and this post says that it means that in some cases that can be O(n). I wonder if that is true ?? If yes, is there an alternative to policy based data structures ?? Here is one solution
•  » » 7 months ago, # ^ |   0 It shouldn't be. And even if so, what's the deal? It will always be O(n log n +q log n) if you use set of numbers of size n and run q queries.
•  » » » 7 months ago, # ^ |   0 but this link line 4 says : > while having an amortized complexity of O(lgn) means that some (very few) operator calls can take O(n) time. and if that's the case, won't the complexity be q*n instead of qlog(n) ?? which I suspect might be the reason of my solution getting TLE using policy based data structure while the editorial using treap and getting accepted (having same time complexity ). Please guide me through it as I use this data structure very frequently.
•  » » » » 7 months ago, # ^ |   0 It can't be. By definition, amortized complexity means that the algorithm is guaranteed to have that running time when the total is divided by the number of queries. When they say "few", they mean it.
•  » » » » » 7 months ago, # ^ |   0 So, I should treat it as the worst time complexity of this data structure ?
•  » » » » » » 7 months ago, # ^ |   +1 If you don't revert operations and don't need it persistent then basically yes. In your case it is likely to be too large constant factor. But I'll look into it later.
•  » » » » » » » 7 months ago, # ^ | ← Rev. 2 →   0 Thanks a lot for taking this into consideration.
 » 6 weeks ago, # |   0 Thanks,it just got me an AC.