Hi everyone! After a relatively long lull, I decided that ~~my contribution growing too slowly~~ the hour has come to please you with another article in the blog *:)*

2 months ago user Perlik wrote an article, in which he described a very interesting STL implemented data structure that allows you to quickly perform various operations with substrings. Some time after I tested it on various tasks and, unfortunately, tend to get a negative result — rope was too slow, especially when it came to working with individual elements.

For some time, I forgot about that article. Increasingly, however, I was faced with problems in which it was necessary to implement set with the ability to know ordinal number of item and also to get item by its ordinal number (ie, order statistic in the set). And then I remembered that in the comments to that article, someone mentioned about the mysterious data structure order statistics tree, which supports these two operations and which is implemented in STL (unfortunately only for the GNU C++). And here begins my fascinating acquaintance with policy based data structures, and I want to tell you about them *:)*

Let's get started. In this article I will talk about IMO the most interesting of the implemented structures — tree. We need to include the following headers:

```
#include <ext/pb_ds/assoc_container.hpp> // Common file
#include <ext/pb_ds/tree_policy.hpp> // Including tree_order_statistics_node_update
```

After closer inspection you may find that the last two files contained in the library

```
#include <ext/pb_ds/detail/standard_policies.hpp>
```

Namespace, which we will have to work in newer versions of C++ is called `__gnu_pbds;`

, earlier it was called `pb_ds;`

Now let's look at the concrete structure.

The tree-based container has the following declaration:

```
template<
typename Key, // Key type
typename Mapped, // Mapped-policy
typename Cmp_Fn = std::less<Key>, // Key comparison functor
typename Tag = rb_tree_tag, // Specifies which underlying data structure to use
template<
typename Const_Node_Iterator,
typename Node_Iterator,
typename Cmp_Fn_,
typename Allocator_>
class Node_Update = null_node_update, // A policy for updating node invariants
typename Allocator = std::allocator<char> > // An allocator type
class tree;
```

Experienced participants may have already noticed that if initialize the template only the first two types, we obtain almost exact copy of the container `map`

. Just say, that this container can be `set`

, for this you just need to specify the second argument template type as `null_type`

( in older versions it is `null_mapped_type`

).

By the way `Tag`

and `Node_Update`

are missing in `map`

. Let us examine them in more detail.

`Tag`

— class denoting a tree structure, which we will use. There are three base-classes provided in STL for this, it is `rb_tree_tag`

(red-black tree), `splay_tree_tag`

(splay tree) and `ov_tree_tag`

(ordered-vector tree). Sadly, at competitions we can use only red-black trees for this because splay tree and OV-tree using linear-timed split operation that prevents us to use them.

`Node_Update`

— class denoting policy for updating node invariants. By default it is set to `null_node_update`

, ie, additional information not stored in the vertices. In addition, C++ implemented an update policy `tree_order_statistics_node_update`

, which, in fact, carries the necessary operations. Consider them. Most likely, the best way to set the tree is as follows:

```
typedef tree<
int,
null_type,
less<int>,
rb_tree_tag,
tree_order_statistics_node_update>
ordered_set;
```

If we want to get map but not the set, as the second argument type must be used mapped type. Apparently, the tree supports the same operations as the `set`

(at least I haven't any problems with them before), but also there are two new features — it is `find_by_order()`

and `order_of_key()`

. The first returns an iterator to the k-th largest element (counting from zero), the second — the number of items in a set that are strictly smaller than our item. Example of use:

ordered_set X;
X.insert(1);
X.insert(2);
X.insert(4);
X.insert(8);
X.insert(16);
cout<<*X.find_by_order(1)<<endl; // 2
cout<<*X.find_by_order(2)<<endl; // 4
cout<<*X.find_by_order(4)<<endl; // 16
cout<<(end(X)==X.find_by_order(6))<<endl; // true
cout<<X.order_of_key(-5)<<endl; // 0
cout<<X.order_of_key(1)<<endl; // 0
cout<<X.order_of_key(3)<<endl; // 2
cout<<X.order_of_key(4)<<endl; // 2
cout<<X.order_of_key(400)<<endl; // 5

Finally I would like to say about the performance of order_statistics_tree in STL. For this, I provide the following table.

Solution\Problem | 1028 | 1090 | 1521 | 1439 |

order_statistics_tree, STL | 0.062 | 0.218 | 0.296 | 0.468 |

Segment tree | 0.031 | 0.078 | 0.171 | 0.078 0.859* |

Binary Indexed Tree | 0.031 | 0.062 | 0.062 | |

* The final task requires direct access to the nodes of the tree for the implementation of solutions for O (mlogn). Without it, the solution works in O (mlogn*logn).

As you can see from all this , order_statistics_tree relatively little behind handwritten structures, and at times ahead of them in execution time. At the same time the code size is reduced considerably. Hence we can conclude is that order_statistics_tree — it is good and it can be used in contests.

Besides tree, I also wanted to describe here trie. However , I was confused by some aspects of its implementation, greatly limiting its usefulness in programming olympiads, so I decided not to talk about it. If anyone want he is encouraged to try to learn more about this structure by himself.

Useful links:

— Documentation of pb_ds

— Testing of pb_ds

— Using of pb_ds

— Demonstration of order_statistics_tree

— Demonstration of trie with prefix search

— Operations with intervals with handwritten update policy class

— More examples from that site

P.S. Sorry for my poor English *:)*

Example of trie with search of prefix range.

Problem: 1414

Solution: http://ideone.com/6VFNZl

Is there a way of counting number of strings in the trie with a certain prefix without iterating through them all?

Возможно, вам покажутся слегка нетривиальными решения деревом отрезков и деревом Фенвика, особенно, задач 1521 и 1439. Скорее всего, позже я также предоставлю статью, в которой опишу некоторые интересные способы использования этих структур, которые редко встречаются.

======================================================================================= You may be wondered about how I use segment tree and binary indexed tree in my solutions, especially for problems 1521 and 1439. Most likely, later I'll provide an entry about some interesting ways of using this structures, which are quite rare.

Here it is :)

This is really useful. Thanks a lot!

Very useful article! I need order-statistics on a multiset. How should I define the tree as?

As I know, there is no implemented tree multiset in STL. However you can use pair<T,int> as a key where the second element in pair is the time when item has been added.

Apparently, you can. Once I tried to write less_equal instead of less and it started to work as multiset, I even got AC using it in region olympiad)

I can't erase elements with less_equal comparator, e.g. this code output "1"

codeSo I guess it's not very useful thing (or I do something wrong).

UPD: I can delete iterator which I got with lower_bound. But it works incorrectly. This code erase 1, not 0

CodeWow. then it really sucks. Seems like I only used it with insert operations and strangely enough it worked)

Well, actually it works fine and exactly does what you want! The issue is that you're passing

`less_equal<int>`

as the tree comparator. Therefore it uses the same function for`lower_bound()`

. By definition of lower_bound function (according to cplusplus.com) it finds the first element not compared`true`

. Thus returns the first element greater than`val`

which is`1`

in your example.In order to make sure you may even test

`set<int,less_equal<int> >`

which results the same.What if I want to calculate the index of upper_bound of a particular element? Suppose we have: 1 1 2 3 4 then how to find

index(upper_bound(2))?UPDATE: Maybe it is =

order_of_key(number+1)?Another drawback of using

`less_equal`

instead of`less`

is that`lower_bound`

works as`upper_bound`

and vice-versa. CodeWhat about the comparator i.e. less<int>

typedef tree<int, null_type, less_equal, rb_tree_tag, tree_order_statistics_node_update> indexed_multiset;

Can we use this in this question? or we can't use it, as I am not able to implement the multiset part

just use the fucking binary search on this problem

The 3rd template argument must be

`less_equal<int>`

. But adamant, is it the correct way to do this ? Since as far as I know, most of the STL containers require a comparator that offers a strict weak ordering (Not sure of the exact reasons though). So, will there be some drawbacks of trying to construct a multiset this way?can't we just use it as multiset with less_equal and then assume that lower_bound and upper_bound are exchanged. I mean that's not right to do but will get u AC i think :p

.(same comment as above)

what will be the complexity of erase operation? O(logn) or O(n)

O(logn)

Really??

Yes.

Is there any efficient function to merge 2 trees ?

You can do in log(n) if the greatest element of 1 tree is smaller than smallest of other. Otherwise, I don't you have a better option. Tell me as well if you have found something interesting.

How do you merge two non-intersected rbtrees (as in the article) in O(lg n) time? I find that the default join() function takes linear time...

Have you found something interesting about merge ? Im trying to do .join but it throws error.

Surely worth looking into :) Thanks

how can i use it like multiset ?

Main idea is to keep pairs like {

elem,id}.thanks a lot :)

How will find_by_order be used in this case?

like me.order_of_key({x, 0}) me.find_by_order({x,0}) dose not work.. why??

*me.find_by_order({x,0})

still it does not work.

sure it does work, but you cannot print a pair so you have to do it like this

`cout << me.find_by_order(1)->first ;`

wtf, find_by_order takes number

how to use find_by_order if I'm using ordered_set with pairs. ~~~~~ typedef tree< pair<int, int>, null_type, less<pair<int, int>>, rb_tree_tag, tree_order_statistics_node_update> ordered_set; ~~~~~

what is x here

typedef tree<int, null_type, less_equal, rb_tree_tag, tree_order_statistics_node_update> indexed_multiset;

Hi, adamant, the code files in Useful Links don't seem to work. Could you fix them?

Thanks for this great post. I am looking forward to your next and next next posts.

Can you elaborate please?

For example, the code in "Demonstration of trie with prefix search" cannot run on my computer. I saw that there was some old syntax like the namespace pb_ds. I changed it, then it returned a new error in another place. The truth is I am not good enough to change things any more. I hope that you can update it. (I know that I can use the trie code in one of your comments, but this post would be even better if the cost in Useful Links were also updated)

Thank you.

I can't edit the original files — they're not mine. But here is the correct version: http://ideone.com/BpZlYO

Thank you.

https://www.e-olymp.com/ru/problems/2961 Is it possible to solve this problem using this algorithm?

Can anybody share a Java equivalent class for this type of set or a code which acts according to above data structure?

It does not exist.

You may use instead:

I thought of number compression + fenwick tree, but this solution will work for only offline queries. I want to handle online queries. The best I can think of now is Treap + Segment Tree or Treap + Fenwick Tree. But here again is the problem of implementation of mixed data structure, I am unable to think how to implement that. Can you please help me?

Any idea on how to use this pre C++11?

How can I use a custom compare function in the "Key comparison functor" section for custom data types?

Just like for regular set..

I can use custom compare function for a set by using operator overloading. I want to know is there any other way to do this for both set and ordered set using lambda expression or just using a compare bool function?

Thank adamant you very much for your nice post.

I suppose you can overload operator and still use

`less<T>`

. Also you can use functors and lambdas in the way similar as for sets:Why does it only work with lambdas and not functions? On doing

Where

`comp`

is a comparator function, I get an error.How To Solve This Problems. Plz Help Me.

Is it possible to search for the order of a non-key value by passing a comparator/function ? I sometimes find myself have to use my own tree templates instead because of having to write a search function that can cope with this task.

What is the constant factor of order_statistics_tree though it executes in logarithmic complexity ? I think it's constant factor is very high.

could anyone write the exact thing to use this data structure as map..pls.I'm not able do so.

And what exactly do you want from it? You can use something like that.

as it was mentioned in the article that we can use it as map by defining mapped type .So I tried to do that by couldn't. that's all;)

is this thing a order statistics tree

Can you write the multiset implementation of ordered_set. When I use less_equal then I'm not able to erase from the ordered_set. And when I use the pair for including duplicates I'm not able to use find_by_order({x,0}).

What's wrong with find_by_order({x, 0})?

It gives an error. It says no known conversion. error: no matching function for call to '__gnu_pbds::tree<std::pair<int, int>, __gnu_pbds::null_type, std::less<std::pair<int, int> >, __gnu_pbds::rb_tree_tag, __gnu_pbds::tree_order_statistics_node_update>::find_by_order()

Wait, find_by_order takes number $$$k$$$ and returns $$$k$$$-th element. What exactly do you expect it to return?..

A number .

I think you should use order_of_key then

Thanks adamant I did that.

I have been searching for that for a long time.Can u please provide us with a multiset implementation. It would be really helpful if u provide us with erase() and find_by_order() functions. And I believe this would be useful to a lot of coders.

Actually find_by_order( x) takes in an integer input because it tells us the element present in the position x. Whereas find_by_order({x, 0}) is a syntax error and it wont work.

So basically it is not possible to find the K-th smallest element while we are using pair ?

can we use it as multiset?

u can use pair<int,int> for manage the duplicate values..

typedef tree<int, null_type, less_equal, rb_tree_tag, tree_order_statistics_node_update> indexed_multiset;

It is:

`typedef tree<int, null_type, less_equal<int>, rb_tree_tag, tree_order_statistics_node_update> indexed_multiset;`

How can I erase element from order_set by it's value?

just do

`ordered_set<T> st; st.erase(key);`

This doesn't work .

ordered_set :: iterator it; it=st.upper_bound(key); st.erase(it); it works;

Thanks bro

(Sorry for necroposting) Does anyone know how to compile with the pbds header files on Mac OS X ? I think the g++ command is configured to use clang by default, and so it is not directly available. I've tried adding ext/pb_ds into the include folder (the same way you would enable bits/stdc++.h) but instead new dependencies come up.

Note that:

`-D_GLIBCXX_DEBUG`

and`-fsanitize=undefined`

.Note on using

`less_equal`

as comparison function to use it as a multiset:`_GLIBCXX_DEBUG`

must not be defined, otherwise some internal check will fail.`find`

will always return`end`

.`lower_bound`

works like`upper_bound`

in normal set (to return the first element > it)`upper_bound`

works like`lower_bound`

in normal set (to return the first element >= it)`find_by_order`

and`order_of_key`

works properly (unlike the 2 functions above).Some code to verify the points above: Try it online!

adamant, while discussion, someone suggested that the time complexity of using it as a set is amortized log(n), and this post says that it means that in some cases that can be O(n). I wonder if that is true ?? If yes, is there an alternative to policy based data structures ?? Here is one solution

It shouldn't be. And even if so, what's the deal? It will always be O(n log n +q log n) if you use set of numbers of size n and run q queries.

but this link line 4 says :

and if that's the case, won't the complexity be q*n instead of qlog(n) ?? which I suspect might be the reason of my solution getting TLE using policy based data structure while the editorial using treap and getting accepted (having same time complexity ).

Please guide me through it as I use this data structure very frequently.

It can't be. By definition amortized complexity means that algorithm is guaranteed to have such executing time if it's divided By the numbers of queries. When they say "few" they mean it

So, I should treat it as the worst time complexity of this data structure ?

If you don't revert operations and don't need it persistent then basically yes. In your case it is likely to be too large constant factor. But I'll look into it later.

Thanks a lot for taking this into consideration.

Thanks,it just got me an AC.