Hi everyone! After a relatively long lull, I decided that ~~my contribution growing too slowly~~ the hour has come to please you with another article in the blog *:)*

2 months ago user Perlik wrote an article, in which he described a very interesting STL implemented data structure that allows you to quickly perform various operations with substrings. Some time after I tested it on various tasks and, unfortunately, tend to get a negative result — rope was too slow, especially when it came to working with individual elements.

For some time, I forgot about that article. Increasingly, however, I was faced with problems in which it was necessary to implement set with the ability to know ordinal number of item and also to get item by its ordinal number (ie, order statistic in the set). And then I remembered that in the comments to that article, someone mentioned about the mysterious data structure order statistics tree, which supports these two operations and which is implemented in STL (unfortunately only for the GNU C++). And here begins my fascinating acquaintance with policy based data structures, and I want to tell you about them *:)*

Let's get started. In this article I will talk about IMO the most interesting of the implemented structures — tree. We need to include the following headers:

```
#include <ext/pb_ds/assoc_container.hpp> // Common file
#include <ext/pb_ds/tree_policy.hpp> // Including tree_order_statistics_node_update
```

After closer inspection you may find that the last two files contained in the library

```
#include <ext/pb_ds/detail/standard_policies.hpp>
```

Namespace, which we will have to work in newer versions of C++ is called `__gnu_pbds;`

, earlier it was called `pb_ds;`

Now let's look at the concrete structure.

The tree-based container has the following declaration:

```
template<
typename Key, // Key type
typename Mapped, // Mapped-policy
typename Cmp_Fn = std::less<Key>, // Key comparison functor
typename Tag = rb_tree_tag, // Specifies which underlying data structure to use
template<
typename Const_Node_Iterator,
typename Node_Iterator,
typename Cmp_Fn_,
typename Allocator_>
class Node_Update = null_node_update, // A policy for updating node invariants
typename Allocator = std::allocator<char> > // An allocator type
class tree;
```

Experienced participants may have already noticed that if initialize the template only the first two types, we obtain almost exact copy of the container `map`

. Just say, that this container can be `set`

, for this you just need to specify the second argument template type as `null_type`

( in older versions it is `null_mapped_type`

).

By the way `Tag`

and `Node_Update`

are missing in `map`

. Let us examine them in more detail.

`Tag`

— class denoting a tree structure, which we will use. There are three base-classes provided in STL for this, it is `rb_tree_tag`

(red-black tree), `splay_tree_tag`

(splay tree) and `ov_tree_tag`

(ordered-vector tree). Sadly, at competitions we can use only red-black trees for this because splay tree and OV-tree using linear-timed split operation that prevents us to use them.

`Node_Update`

— class denoting policy for updating node invariants. By default it is set to `null_node_update`

, ie, additional information not stored in the vertices. In addition, C++ implemented an update policy `tree_order_statistics_node_update`

, which, in fact, carries the necessary operations. Consider them. Most likely, the best way to set the tree is as follows:

```
typedef tree<
int,
null_type,
less<int>,
rb_tree_tag,
tree_order_statistics_node_update>
ordered_set;
```

If we want to get map but not the set, as the second argument type must be used mapped type. Apparently, the tree supports the same operations as the `set`

(at least I haven't any problems with them before), but also there are two new features — it is `find_by_order()`

and `order_of_key()`

. The first returns an iterator to the k-th largest element (counting from zero), the second — the number of items in a set that are strictly smaller than our item. Example of use:

ordered_set X;
X.insert(1);
X.insert(2);
X.insert(4);
X.insert(8);
X.insert(16);
cout<<*X.find_by_order(1)<<endl; // 2
cout<<*X.find_by_order(2)<<endl; // 4
cout<<*X.find_by_order(4)<<endl; // 16
cout<<(end(X)==X.find_by_order(6))<<endl; // true
cout<<X.order_of_key(-5)<<endl; // 0
cout<<X.order_of_key(1)<<endl; // 0
cout<<X.order_of_key(3)<<endl; // 2
cout<<X.order_of_key(4)<<endl; // 2
cout<<X.order_of_key(400)<<endl; // 5

Finally I would like to say about the performance of order_statistics_tree in STL. For this, I provide the following table.

Solution\Problem | 1028 | 1090 | 1521 | 1439 |

order_statistics_tree, STL | 0.062 | 0.218 | 0.296 | 0.468 |

Segment tree | 0.031 | 0.078 | 0.171 | 0.078 0.859* |

Binary Indexed Tree | 0.031 | 0.062 | 0.062 | |

* The final task requires direct access to the nodes of the tree for the implementation of solutions for O (mlogn). Without it, the solution works in O (mlogn*logn).

As you can see from all this , order_statistics_tree relatively little behind handwritten structures, and at times ahead of them in execution time. At the same time the code size is reduced considerably. Hence we can conclude is that order_statistics_tree — it is good and it can be used in contests.

Besides tree, I also wanted to describe here trie. However , I was confused by some aspects of its implementation, greatly limiting its usefulness in programming olympiads, so I decided not to talk about it. If anyone want he is encouraged to try to learn more about this structure by himself.

Useful links:

— Documentation of pb_ds

— Testing of pb_ds

— Using of pb_ds

— Demonstration of order_statistics_tree

— Demonstration of trie with prefix search

— Operations with intervals with handwritten update policy class

— More examples from that site

P.S. Sorry for my poor English *:)*

Example of trie with search of prefix range.

Problem: 1414

Solution: http://ideone.com/6VFNZl

Is there a way of counting number of strings in the trie with a certain prefix without iterating through them all?

You augment the trie node to also contain a number. Update this number everytime you insert a string into the trie. To get the number of strings which share the prefix, Just traverse the prefix and output the num in the ending node.

Возможно, вам покажутся слегка нетривиальными решения деревом отрезков и деревом Фенвика, особенно, задач 1521 и 1439. Скорее всего, позже я также предоставлю статью, в которой опишу некоторые интересные способы использования этих структур, которые редко встречаются.

======================================================================================= You may be wondered about how I use segment tree and binary indexed tree in my solutions, especially for problems 1521 and 1439. Most likely, later I'll provide an entry about some interesting ways of using this structures, which are quite rare.

Here it is :)

This is really useful. Thanks a lot!

Very useful article! I need order-statistics on a multiset. How should I define the tree as?

As I know, there is no implemented tree multiset in STL. However you can use pair<T,int> as a key where the second element in pair is the time when item has been added.

Apparently, you can. Once I tried to write less_equal instead of less and it started to work as multiset, I even got AC using it in region olympiad)

I can't erase elements with less_equal comparator, e.g. this code output "1"

codeSo I guess it's not very useful thing (or I do something wrong).

UPD: I can delete iterator which I got with lower_bound. But it works incorrectly. This code erase 1, not 0

CodeWow. then it really sucks. Seems like I only used it with insert operations and strangely enough it worked)

Well, actually it works fine and exactly does what you want! The issue is that you're passing

`less_equal<int>`

as the tree comparator. Therefore it uses the same function for`lower_bound()`

. By definition of lower_bound function (according to cplusplus.com) it finds the first element not compared`true`

. Thus returns the first element greater than`val`

which is`1`

in your example.In order to make sure you may even test

`set<int,less_equal<int> >`

which results the same.But still the erase function does not work for elements like in normal multiset...

What if I want to calculate the index of upper_bound of a particular element? Suppose we have: 1 1 2 3 4 then how to find

index(upper_bound(2))?UPDATE: Maybe it is =

order_of_key(number+1)?I think you can also solve it by the number of elements strictly less than your number can b e found by

order_of_key. let suppose it is x; then to find the elements strictly greater than your number you can dototal size of set-xthat is upper bound array indexing start from 0;So to erase an element from ordered_set with less_equal I used lower_bound on (element to be erased — 1) and then erased the iterator I got from lower_bound and it works only if the element to be erased is present in the set.

Another drawback of using

`less_equal`

instead of`less`

is that`lower_bound`

works as`upper_bound`

and vice-versa. CodeAn excellent deduction.

What about the comparator i.e. less<int>

typedef tree<int, null_type, less_equal, rb_tree_tag, tree_order_statistics_node_update> indexed_multiset;

Can we use this in this question? or we can't use it, as I am not able to implement the multiset part

just use the fucking binary search on this problem

The 3rd template argument must be

`less_equal<int>`

. But adamant, is it the correct way to do this ? Since as far as I know, most of the STL containers require a comparator that offers a strict weak ordering (Not sure of the exact reasons though). So, will there be some drawbacks of trying to construct a multiset this way?can't we just use it as multiset with less_equal and then assume that lower_bound and upper_bound are exchanged. I mean that's not right to do but will get u AC i think :p

## Here's a quick way using Policy-Based Data Structures in C++:

There exists something called as an Ordered Set, which lets you insert/remove elements in O(log n) time (and pretty much all other functions that std::set has to offer). It also gives 2 more features:

find the Kth elementandfind the rank of the Xth element. The problem is that this doesn't allow duplicates :(No Worries though!We will map duplicates with a separate index/priority, and define a new structure (call it Ordered Multiset)! I've attached my implementation below for reference.Finally, every time you want to find the no of elements greater than say x, call the function upper_bound (No of elements less than or equal to x) and subtract this number from the size of your Ordered Multiset!

Note: PBDS use a lot of memory, so that is a constraint, I'd suggest using a Binary Search Tree or a Fenwick Tree.

CodeUsage:Test Case: Input :

`6 10 1 3 3 2`

Output :

`0 1 1 1 3`

Referencesmochow13's GitHub

## Time Complexity : O(log n) per insert/query

.(same comment as above)

what will be the complexity of erase operation? O(logn) or O(n)

O(logn)

Really??

Yes.

Is there any efficient function to merge 2 trees ?

You can do in log(n) if the greatest element of 1 tree is smaller than smallest of other. Otherwise, I don't you have a better option. Tell me as well if you have found something interesting.

How do you merge two non-intersected rbtrees (as in the article) in O(lg n) time? I find that the default join() function takes linear time...

Have you found something interesting about merge ? Im trying to do .join but it throws error.

Surely worth looking into :) Thanks

how can i use it like multiset ?

Main idea is to keep pairs like {

elem,id}.thanks a lot :)

How will find_by_order be used in this case?

like me.order_of_key({x, 0}) me.find_by_order({x,0}) dose not work.. why??

*me.find_by_order({x,0})

still it does not work.

sure it does work, but you cannot print a pair so you have to do it like this

`cout << me.find_by_order(1)->first ;`

wtf, find_by_order takes number

how to use find_by_order if I'm using ordered_set with pairs. ~~~~~ typedef tree< pair<int, int>, null_type, less<pair<int, int>>, rb_tree_tag, tree_order_statistics_node_update> ordered_set; ~~~~~

what is x here

nice technique. worked fine! thanks a lot.

typedef tree<int, null_type, less_equal, rb_tree_tag, tree_order_statistics_node_update> indexed_multiset;

Hi, adamant, the code files in Useful Links don't seem to work. Could you fix them?

Thanks for this great post. I am looking forward to your next and next next posts.

Can you elaborate please?

For example, the code in "Demonstration of trie with prefix search" cannot run on my computer. I saw that there was some old syntax like the namespace pb_ds. I changed it, then it returned a new error in another place. The truth is I am not good enough to change things any more. I hope that you can update it. (I know that I can use the trie code in one of your comments, but this post would be even better if the cost in Useful Links were also updated)

Thank you.

I can't edit the original files — they're not mine. But here is the correct version: http://ideone.com/BpZlYO

Thank you.

https://www.e-olymp.com/ru/problems/2961 Is it possible to solve this problem using this algorithm?

Can anybody share a Java equivalent class for this type of set or a code which acts according to above data structure?

It does not exist.

You may use instead:

I thought of number compression + fenwick tree, but this solution will work for only offline queries. I want to handle online queries. The best I can think of now is Treap + Segment Tree or Treap + Fenwick Tree. But here again is the problem of implementation of mixed data structure, I am unable to think how to implement that. Can you please help me?

Any idea on how to use this pre C++11?

How can I use a custom compare function in the "Key comparison functor" section for custom data types?

Just like for regular set..

I can use custom compare function for a set by using operator overloading. I want to know is there any other way to do this for both set and ordered set using lambda expression or just using a compare bool function?

Thank adamant you very much for your nice post.

I suppose you can overload operator and still use

`less<T>`

. Also you can use functors and lambdas in the way similar as for sets:Why does it only work with lambdas and not functions? On doing

Where

`comp`

is a comparator function, I get an error.How To Solve This Problems. Plz Help Me.

Is it possible to search for the order of a non-key value by passing a comparator/function ? I sometimes find myself have to use my own tree templates instead because of having to write a search function that can cope with this task.

What is the constant factor of order_statistics_tree though it executes in logarithmic complexity ? I think it's constant factor is very high.

could anyone write the exact thing to use this data structure as map..pls.I'm not able do so.

And what exactly do you want from it? You can use something like that.

as it was mentioned in the article that we can use it as map by defining mapped type .So I tried to do that by couldn't. that's all;)

is this thing a order statistics tree

Can you write the multiset implementation of ordered_set. When I use less_equal then I'm not able to erase from the ordered_set. And when I use the pair for including duplicates I'm not able to use find_by_order({x,0}).

What's wrong with find_by_order({x, 0})?

It gives an error. It says no known conversion. error: no matching function for call to '__gnu_pbds::tree<std::pair<int, int>, __gnu_pbds::null_type, std::less<std::pair<int, int> >, __gnu_pbds::rb_tree_tag, __gnu_pbds::tree_order_statistics_node_update>::find_by_order()

Wait, find_by_order takes number $$$k$$$ and returns $$$k$$$-th element. What exactly do you expect it to return?..

A number .

I think you should use order_of_key then

Thanks adamant I did that.

I have been searching for that for a long time.Can u please provide us with a multiset implementation. It would be really helpful if u provide us with erase() and find_by_order() functions. And I believe this would be useful to a lot of coders.

Actually find_by_order( x) takes in an integer input because it tells us the element present in the position x. Whereas find_by_order({x, 0}) is a syntax error and it wont work.

So basically it is not possible to find the K-th smallest element while we are using pair ?

can we use it as multiset?

u can use pair<int,int> for manage the duplicate values..

typedef tree<int, null_type, less_equal, rb_tree_tag, tree_order_statistics_node_update> indexed_multiset;

It is:

`typedef tree<int, null_type, less_equal<int>, rb_tree_tag, tree_order_statistics_node_update> indexed_multiset;`

thank you bro :) it worked for duplicate elements too..

what to do if i want to merge two ordered multisets?

How can I erase element from order_set by it's value?

just do

`ordered_set<T> st; st.erase(key);`

This doesn't work .

ordered_set :: iterator it; it=st.upper_bound(key); st.erase(it); it works;

Thanks bro

(Sorry for necroposting) Does anyone know how to compile with the pbds header files on Mac OS X ? I think the g++ command is configured to use clang by default, and so it is not directly available. I've tried adding ext/pb_ds into the include folder (the same way you would enable bits/stdc++.h) but instead new dependencies come up.

For me, I installed the latest version of gcc (gcc 9.3.0_1) and compiled with gcc-9. It works on Mojave and Catalina and it should work on High Sierra (but I haven't tested it).

To install gcc-9, I used brew and the command

`brew install gcc@9`

(Again, sorry for necroposting) There is an important step that I've so-far seen all the mac g++ setup instructions ignore/skip. I also installed gcc through brew, but neither bits/stdc++.h header nor pbds data structures seemed to work. Putting the bits/stdc++.h file manually in the /usr/local/include folder allowed me to at least solve the header problem, but trying the same method for pbds spawned a lot of dependency issues.

The problem was that brew sets up the g++ command as g++-10 so that it doesn't clash with the default g++ command mac provides. So, alias-ing g++ to g++-10 in the .bashrc/.zshrc file would be enough for solving the issue if you compile using the terminal. But if you compile using an editor and the editor directly uses the /usr/bin/g++ binary for compilation, then alias-ing g++ wouldn't work anymore. For example, most of the popular VSCode extensions for CP I've seen use /usr/bin/g++ to compile. I wasn't aware that this was the root of the issue and had been missing pbds structures for a long time. The way to solve it is to simlink g++ to g++-10 and prepend it to the PATH variable so that the simlink is prioritized before the default g++.

Instructions for the simlink way(Replace the version names with your g++ versions)

`cd /usr/local/Cellar/gcc/10.2.0_4/bin`

`ln -s ./c++-10 ./c++`

`ln -s ./g++-10 ./g++`

Finally, put in your .bashrc/.zshrc file:

`export PATH=/usr/local/Cellar/gcc/10.2.0_4/bin:$PATH`

I might even find it useful myself if I ever need to setup a mac again in future.

Note that:

`-D_GLIBCXX_DEBUG`

and`-fsanitize=undefined`

.Note on using

`less_equal`

as comparison function to use it as a multiset:`_GLIBCXX_DEBUG`

must not be defined, otherwise some internal check will fail.`find`

will always return`end`

.`lower_bound`

works like`upper_bound`

in normal set (to return the first element > it)`upper_bound`

works like`lower_bound`

in normal set (to return the first element >= it)`find_by_order`

and`order_of_key`

works properly (unlike the 2 functions above).Some code to verify the points above: Try it online!

Also erase doesnt work for erasing elements rather that by using iterator...

valuable information guys upvote it ? i dig the docs for getting this information

adamant, while discussion, someone suggested that the time complexity of using it as a set is amortized log(n), and this post says that it means that in some cases that can be O(n). I wonder if that is true ?? If yes, is there an alternative to policy based data structures ?? Here is one solution

It shouldn't be. And even if so, what's the deal? It will always be O(n log n +q log n) if you use set of numbers of size n and run q queries.

but this link line 4 says :

and if that's the case, won't the complexity be q*n instead of qlog(n) ?? which I suspect might be the reason of my solution getting TLE using policy based data structure while the editorial using treap and getting accepted (having same time complexity ).

Please guide me through it as I use this data structure very frequently.

It can't be. By definition amortized complexity means that algorithm is guaranteed to have such executing time if it's divided By the numbers of queries. When they say "few" they mean it

So, I should treat it as the worst time complexity of this data structure ?

If you don't revert operations and don't need it persistent then basically yes. In your case it is likely to be too large constant factor. But I'll look into it later.

Thanks a lot for taking this into consideration.

Thanks,it just got me an AC.

If someone is having trouble to use these in windows with mingw compiler, try to find hash_standard_resize_policy_imp.hpp0000644 in MinGW\lib\gcc\mingw32\6.3.0\include\c++\ext\pb_ds\detail\resize_policy and rename it to hash_standard_resize_policy_imp.hpp. I dont know why it is named like this.

Thanks buddy, I was trying to find this from long time!

thanks bro ...do you know why it name like this ... why wrong extension is given to it..

I have defined a

`bool cmp(pair<int,int> a, pair<int, int> b)`

for comparing pairs. Is it possible to use that as the comparator for the`ordered_set`

?Thank you this really helped me.

Is this actually STL? I only see files with gcc's implementation of the c++ standard library. Actual STL is quite old (the linked post references SGI's docs, and SGI doesn't even exist any more)

Nope, this has nothing to do with the C++ standard library.

This is a GNU extension, so it has nothing to do with the STL which (incorrectly) refers to the C++ standard library.

STL doesn't even refer to the C++ standard library. STL is the Standard Template Library from long ago which heavily influenced the C++ standard library but is not the same thing. https://stackoverflow.com/a/5205571

I know, but here and in many other places "STL" is incorrectly used to refer to the C++ standard library.

`The first returns an iterator to the k-th largest element (counting from zero)`

Shouldn't it be

`the k-th smallest element`

? adamantHow to merge two ordered sets?

Iterate through every element in the smaller set and append it to the bigger one

thanks brother

If I change less to less_equal then it will works as multiset , can I use count function on that.

When to use BIT over PBDS other then memory limit

https://codeforces.com/contest/459/problem/D can we solve this question using pbds ? Its giving me runtime error — https://codeforces.com/contest/459/submission/81689503

Yes we can! My Submission The main key to solving this is handling duplicates!

adamant can you kindly give me a hint about how to solve above mentioned timus 1028 problem using pbds?

can we use pair in place of int ??

Sure, why not. We may use any type with operator < defined.

Is

`insert`

not too slow? I tried 10^7 insertions and it took over a minute.