### errorgorn's blog

By errorgorn, 5 weeks ago,

Selamat pagi (good morning), Codeforces!

We are excited to invite you to TLX Regular Open Contest #30!

Key details:

Many thanks to:

Please register for the contest or else rama_pang will drink all your milk 😋🥛.

UPD: the contest is over!

Congratulations to the top 5:

1. Benq (perfect score)
2. maroonrk
3. hos.lyric
4. hitonanode
5. nuip

Congratulations to our first solvers:

Also, congratulations to Phantom_Performer for getting the 400000-th submission!

You can upsolve the problems here. The editorial is available in the upsolve link!

Thank you for participating, and see you in the next contest!


By errorgorn, 5 weeks ago,

Goodbye 2022 has recently concluded (I hope you liked it), so let's share the best problems from all of 2022, instead of only the last 3 weeks (I am looking at you bIeahbIeah).

Btw, I have noticed a trend in these blog posts where many problems come from the last few months of the year. So I hope that we take a good look through the contests of 2022 before posting.

Also, I think it would be a good idea to share your favourite blogs of 2022 as well, in case we missed out on some genius blog!


By errorgorn, history, 6 months ago,

I have created a facebook account to compete in Meta Hacker Cup (I did not compete in previous editions for some reason). However, it has been suspended 2 times recently. I did not post anything on the account, so I don't think there was any reason to suspend it. I even uploaded a picture of my face to facebook when they wanted evidence that the account belongs to an actual person, so I don't understand the reason for the second suspension. Also, this is my second facebook account, because I did not appeal when my first one got suspended (again, I did not do anything on facebook, and I also did not see the email saying that my account got suspended).

Here is a screenshot of my email

Is anyone else facing similar issues?

SecondThread, is there any way that I can compete without a facebook account? It is really annoying to appeal these false suspensions.


By errorgorn, 6 months ago,

Note: I am not sponsored by the JOISC committee; if the committee is not ok with prizes, I will remove them.

I have managed to solve this problem with $n=45000$. As far as I know, the person with the highest $n$ other than me is amiya who has achieved $n=44000$ in this comment.

I think this problem is quite interesting to constant optimize, so I am offering prizes for people who can improve the solution. The prizes work as follows:

• For every increase of $\min(\lfloor \frac{n}{100} \rfloor,460)$ by $1$, I will give the person 5USD (sorry, I'm still a broke student).
• You must link an AC submission of your code on oj.uz.
• You must state the value of $n$, rounded down to the nearest $100$, for which your code works in under $1$ million queries.
• As a bonus, you can explain what your optimizations are (well, I would like to know how you were able to optimize this problem).
• If these limits on $n$ turn out to be too hard, I might make the conditions more lenient.

I'll start first.

This is my solution: https://oj.uz/submission/619999. It can barely solve $n=45000$. The maximum number of queries used in $10$ random tests is $999920$.

These are my optimizations
Here is the test case generator, if you want

Good luck!

UPD: The maximum value of $n$ has been capped at $46000$; I think $50000$ was probably too unrealistic a goal.


By errorgorn, 7 months ago,

This blog was going to be a translation of $[1]$ (which is in Chinese), but I think I ended up deviating so much from the source material that it would not be appropriate to call this a translation. Anyway, I am honestly amazed by how ahead of its time this paper is. In its abstract, it starts by saying that the state of the art for tree path queries is $O(n \log n + q \sqrt n \log n)$, which I honestly can't fathom how one would arrive at, then proceeds to claim that it has found an $O((n+q) \log n)$ algorithm. All the more impressive, in 2007.

While I was in the process of writing this blog, smax sniped me and posted $[9]$. However, we will focus on static trees only.

Thanks to oolimry, Everule and timreizin for proofreading.

Note to reader: I am expecting the reader to be familiar with what a splay tree and HLD are. You do not really need to know why splay trees are $O(\log n)$ amortised, but you should know what the zig, zig-zig and zig-zag operations are. Also, when I say some algorithm has some complexity, I usually mean that it has that complexity amortised; you can probably tell from context which ones are amortised complexities. Also, there is basically $0$ reason to use the techniques described in this blog in practice (except maybe in interactive problems?), as the constant factor is too high. You can scroll to the bottom to see the benchmarks. Then again, my implementation of normal HLD uses a really fast segment tree, and my implementation of the $O((n+q) \log n)$ algorithm might have a really big constant factor.

## Splay Trees

First, we want to talk about splay trees $[2]$, specifically why splaying some vertex $v$ has amortised $O(\log n)$ complexity.

Define:

• $s(v)$ as the size of the subtree of $v$
• $r(v)=\log s(v)$, the rank function

There are $3$ types of operations we can perform on a vertex when we are splaying it to the root — zig, zig-zig, zig-zag. Let $r$ and $r'$ be the rank functions before and after we perform the operation. The main results we want to show are that:

• zig has a time complexity of $O(r'(x)-r(x)+1)$ amortised
• zig-zig and zig-zag have a time complexity of $O(r'(x)-r(x))$ amortised

I will not prove these as you can find the proof in $[2]$. But the point is that through some black magic, the only operation that has a cost with a constant term is a zig operation, which we only perform once. Somehow, we manage to remove all constant terms from our zig-zig and zig-zag operations.
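To make the three cases concrete, here is a minimal sketch of a rotation-based splay (my own illustration, not taken from $[2]$; the array-based representation with `p` for parents and `ch` for children is an assumption):

```cpp
#include <bits/stdc++.h>
using namespace std;

// Array-based splay tree: p[x] = parent (0 = none), ch[x][0/1] = children.
const int N = 100;
int p[N], ch[N][2];

int side(int x) { return ch[p[x]][1] == x; } // is x a right child of its parent?

void rotate(int x) { // move x above its parent, preserving BST order
    int y = p[x], z = p[y], s = side(x);
    if (z) ch[z][side(y)] = x;
    p[x] = z;
    ch[y][s] = ch[x][!s]; if (ch[x][!s]) p[ch[x][!s]] = y;
    ch[x][!s] = y; p[y] = x;
}

void splay(int x) { // bring x to the root using zig / zig-zig / zig-zag
    while (p[x]) {
        int y = p[x];
        if (p[y]) {
            if (side(x) == side(y)) rotate(y); // zig-zig: rotate the parent first
            else rotate(x);                    // zig-zag: rotate x twice
        }
        rotate(x); // plain zig when y was the root
    }
}
```

The only non-mechanical part is the zig-zig case rotating the parent before the vertex itself; doing two bottom-up rotations there instead is exactly the "rotate to root" heuristic whose complexity blows up.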

Let $r_i$ be the rank function after performing the $i$-th operation. Suppose that we have performed $k$ operations to splay some vertex, then the total time complexity is $O((r_{k}(x)-r_{k-1}(x))+(r_{k-1}(x)-r_{k-2}(x))+\ldots+(r_{1}(x)-r_{0}(x))+1)=O(r_k(x)-r_0(x)+1)$. The black magic of removing all constant terms from our zig-zig and zig-zag operations is that no matter how many of them we perform, they have no effect on the final complexity, since all the terms cancel in a telescoping-sum manner (which also explains why performing only zigs would make the complexity blow up, as we should expect). For the case of splay trees, $r_k(x)=\log n$, since $x$ becomes the root of the tree, so we obtain a complexity of $O(\log n)$.

Now, let's discuss why link cut trees $[3]$ have a complexity of $O(\log n)$. Here is an image of link cut trees from $[4]$.

We split the tree into disjoint chains. Internally, each chain is a splay tree storing elements in the order of depth (lowest depth to the left and highest depth to the right). This way, when we splay $x$ in its splay tree, the vertices to its left are those with lower depth and the vertices to its right are those with higher depth. $2$ disjoint chains which are connected by an edge in the original tree would be connected by a path-parent edge in the new tree. If chains $a$ and $b$ were connected, with $a$ being the parent of $b$, then there would be a path-parent edge from the root of the splay tree containing $b$ to $a$. That is for each splay tree, only its root will have a path-parent edge. In the image, the tree on the left is the initial tree which is stored internally by the structure of splay trees on the right. It is good to mention now that we can actually view these disjoint splay trees as a single giant splay tree by not caring about which edge types join the vertices internally (and not caring about the fact that vertices can only have at most 2 children).

An important operation on link-cut trees is the $access(x)$ operation, which, given some $x$ that we have chosen, changes the chains in such a way that the chain containing the root of the original tree also contains the vertex $x$. This is done by repeatedly ascending to the highest vertex of the current chain and switching which child chain is attached to its parent. So the cost of this operation is related to the number of distinct chains on the path from $x$ to the root of the whole tree. Also, we will shorten the chain of $x$ so that its deepest vertex is $x$, for reasons that will become apparent later. In the above image, we have performed $access(l)$ to get to the state of the middle tree.

In pseudo-code, the access function would look something like this:

1. access(x):
2.     splay(x)
3.     if (right(x)!=NULL):                  //disconnect the current child
4.         path_parent(right(x))=x
5.         right(x)=NULL
6.     while (path_parent(x)!=NULL):        //it is now the root
7.         par=path_parent(x)
8.         splay(par)
9.         if (right(par)!=NULL):            //change both child chain's path_parent
10.            path_parent(right(par))=par
11.        path_parent(x)=NULL
12.        right(par)=x                      //change child chain to new one
13.        x=par

### $O(\log^2 n)$ Time Complexity Bound

First, let us show that the time complexity of access is bounded by $O(\log^2 n)$. We will do this by showing that the while loop (lines 6-13) only loops $O(\log n)$ times, using ideas from HLD. Label each edge as either light or heavy. The idea is that light edges are rarely changed to be in a chain.

Let the potential be the number of heavy edges that are not in a chain. Here is how the potential changes for an operation in our access function.

• If we change the edge in the chain to a heavy edge, our potential decreases by $1$.
• If we change the edge in the chain to a light edge, our potential increases by at most $1$ (at most, because the edge that gets kicked out of the chain could itself be a light edge, or there might be no such edge at all).

The number of times we change the edge in a chain to a light edge is $O(\log n)$, since any path to the root contains only $O(\log n)$ light edges. So the potential can only increase by $O(\log n)$ in each access, and the number of times the while loop runs is $O(\log n)$ amortised. Each time the while loop runs, the most time-consuming part is line 8, which we have shown earlier to have a time complexity of $O(\log n)$, so the entire access function has a complexity of $O(\log^2 n)$.

### $O(\log n)$ Time Complexity Bound

Here, we want to show that the total time spent in lines 6-13 is bounded by $O(\log n)$ amortised. Let's stop thinking about the splay trees as individual splay trees and imagine the entire structure as one giant splay tree instead. Recall our rank function from earlier. Since we want to regard the entire structure as a giant splay tree, $s(x)$ does not only refer to the size of the subtree of the splay tree describing the chain that contains $x$, but counts all vertices that are in the subtree of $x$ when considering path-parent edges too.

We have established earlier that if we perform a splay operation on $x$, the time complexity is $O(r_{final}(x)-r_{initial}(x)+1)$. Suppose our while loop runs $k$ times, and on the $i$-th iteration the variable par is vertex $par^i$ (for convenience, we let $par^0=x$). Our time complexity would look something like $O(r_{final}(par^k)-r_{initial}(par^k)+1+\ldots+r_{final}(par^1)-r_{initial}(par^1)+1+r_{final}(par^0)-r_{initial}(par^0)+1)$. Notice that $r_{initial}(par^{i+1}) \geq r_{final}(par^i)$, because when we begin splaying $par^{i+1}$, the final position of $par^i$ is in the subtree of $par^{i+1}$, so $par^{i+1}$ obviously has a bigger subtree size than $par^i$. Therefore, we are able to cancel quite a few terms from the time complexity, giving us a time complexity of $O(r_{final}(par^k)-r_{initial}(par^0)+k)$. We know that $r_{final}(par^k)=\log n$, since $par^k$ becomes the root of the tree, and from the previous analysis of the $O(\log^2 n)$ time complexity bound, $k = O(\log n)$ amortised. Therefore, $O(r_{final}(par^k)-r_{initial}(par^0)+k)=O(\log n)$.

## Tree Path Queries

### HLD + Segment Tree

This is the solution that I think everyone and their mother knows: perform HLD on the tree and make a segment tree on each heavy chain. But there are a few things I want to talk about here.

#### Worst Case for HLD

There are $2$ styles of implementing this "segment tree on each heavy chain" part. One way, which is more popular because it is easy to code, is to have a single segment tree that stores all heavy chains. Another way is to construct a new segment tree for each heavy chain.

Now, let's think about how we can actually force HLD into its worst case of $O(n+q \log^2 n)$ for both implementations. One can easily see that a balanced binary tree forces the implementation with a single segment tree into its worst case of $O(n+q\log^2 n)$, but it only forces the implementation with a new segment tree for each heavy chain into $O(n+q \log n)$. To force the worst case of $O(n+q \log^2 n)$, we will have to modify the binary tree. The problem with the balanced binary tree is that the chains are so short that the segment trees do not use $O(\log n)$ time per query. Ok, so let's make the chains long instead. Specifically, create a balanced binary tree of $\sqrt n$ vertices, then replace each vertex with a chain of $\sqrt n$ vertices.

It seems that our complexity would be something like $O(n+q \log(\sqrt n) \log(\sqrt n))$, which has quite a low constant compared to the implementation using a single segment tree on a normal balanced binary tree, since $\log^2(\sqrt n) = \frac{1}{4} \log^2(n)$. Is there a tree that can make the constant higher?
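For concreteness, a sketch of a generator for this family of trees (hypothetical code, not the blog's `gen_binary.cpp`; the names `buildTree`, `blocks` and `k` are my own): a balanced binary tree on `blocks` super-vertices, each expanded into a downward chain of `k` vertices, so `k` near $\sqrt n$ gives the construction above.

```cpp
#include <bits/stdc++.h>
using namespace std;

// Build a balanced binary tree on `blocks` super-vertices (heap numbering),
// then expand every super-vertex into a downward chain of k vertices.
// Returns the parent array (1-indexed, par[1] = 0 for the root).
vector<int> buildTree(int blocks, int k) {
    int n = blocks * k;
    vector<int> par(n + 1, 0);
    auto head = [&](int b) { return (b - 1) * k + 1; }; // top vertex of chain b
    auto tail = [&](int b) { return b * k; };           // bottom vertex of chain b
    for (int b = 1; b <= blocks; b++) {
        for (int i = head(b) + 1; i <= tail(b); i++) par[i] = i - 1; // the chain itself
        if (b > 1) par[head(b)] = tail(b / 2); // attach under the parent super-vertex
    }
    return par;
}
```

With `k = 1` this degenerates to a plain balanced binary tree, which is handy for comparing the two regimes.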

#### Really Fast Segment Trees

When looking for fast segment tree implementations, most people would probably direct you to the AtCoder Library $[5]$, which has a fast lazy segment tree template that maintains an array with elements in a monoid $S$ and is able to apply mappings from a set $F$ of functions $S \to S$ on an interval of the array. Although their code is very fast, we can speed it up by assuming that the functions commute, that is, for $f,g \in F$ and $x \in S$, $f(g(x))=g(f(x))$, which is usually true for the uses of segment trees in competitive programming.

There is actually a way to handle lazy tags that does not change the implementation of an iterative segment tree too much. The idea is pretty similar to what is mentioned in $[6]$: we do not propagate lazy tags. Instead, the true value of a node in the segment tree is $val_u + \sum\limits_{v \text{ is ancestor of } u} lazy_v$, and we apply the lazy tags in a bottom-up fashion when doing queries.

Below is code for segment tree that handles range increment updates and range max queries.

Code
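Since the code block above is collapsed in this export, here is a minimal sketch of the idea (my own reconstruction, not necessarily the blog's code): an iterative segment tree with range add and range max where tags are never pushed down; queries instead collect the tags of the endpoints' ancestors on the way up. The array is assumed to start as all zeros, as in the benchmark problem below.

```cpp
#include <bits/stdc++.h>
using namespace std;

// Range add + range max without propagation: the true value of a node is
// mx[x] + (sum of add[] over its ancestors). Updates tag the O(log n)
// canonical nodes; queries collect the endpoints' ancestor tags on the way up.
struct SegTree {
    int n;
    vector<long long> mx, add;
    SegTree(int n) : n(n), mx(2 * n, 0), add(2 * n, 0) {}
    long long val(int x) { return mx[x] + add[x]; }
    void update(int l, int r, long long v) { // add v on [l, r)
        int ll = l + n, rr = r + n - 1;
        for (int lo = l + n, hi = r + n; lo < hi; lo >>= 1, hi >>= 1) {
            if (lo & 1) add[lo++] += v;
            if (hi & 1) add[--hi] += v;
        }
        // recompute maxima along the two root paths
        for (int x = ll >> 1; x >= 1; x >>= 1) mx[x] = max(val(2 * x), val(2 * x + 1));
        for (int x = rr >> 1; x >= 1; x >>= 1) mx[x] = max(val(2 * x), val(2 * x + 1));
    }
    long long query(int l, int r) { // max on [l, r), assumes l < r
        long long resl = LLONG_MIN, resr = LLONG_MIN;
        int lo = l + n, hi = r + n;
        int la = l + n, ra = r + n - 1; // ancestor walkers of the two endpoints
        while (lo < hi) {
            if (lo & 1) resl = max(resl, val(lo++));
            if (hi & 1) resr = max(val(--hi), resr);
            lo >>= 1; hi >>= 1; la >>= 1; ra >>= 1;
            // tags one level up apply to everything gathered so far on that side
            if (resl != LLONG_MIN && la >= 1) resl += add[la];
            if (resr != LLONG_MIN && ra >= 1) resr += add[ra];
        }
        for (la >>= 1; la >= 1; la >>= 1) if (resl != LLONG_MIN) resl += add[la];
        for (ra >>= 1; ra >= 1; ra >>= 1) if (resr != LLONG_MIN) resr += add[ra];
        return max(resl, resr);
    }
};
```

Commutativity is what makes this legal: adding an ancestor's tag after taking a maximum gives the same answer as pushing the tag down first, since $\max(a,b)+c = \max(a+c,b+c)$.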

### Link-Cut Tree

Let's just maintain a link-cut tree with some modifications to the underlying splay trees. Specifically, we additionally store in each vertex the composition of the values in its subtree of the splay tree (we only consider vertices in the same chain). To perform a query on the path from $u$ to $v$ (WLOG $u$ has lower depth than $v$), perform $access(u)$ then $access(v)$. Let $w=lca(u,v)$; it is possible that $w=u$. The below image shows the state of the tree after performing $access(u)$ and $access(v)$ respectively.

After performing both accesses, $w$ is the root of the entire splay tree: since we accessed $u$ before $v$, $w$ would already have been in the same chain as the root before we started to access $v$, and at the same time, we must splay $w$ while accessing $v$. So $w$ is the last vertex we splay, making it the root of the entire splay tree. Now, it is actually easy to perform path queries from $u$ to $v$. If $u=w$, then we simply query $w$ and the right child of $w$ (which is the path to $v$). If $u \neq w$, then we additionally have to include the path from $u$ to $w$ (but not including $w$). Luckily, due to the way we have accessed the vertices, this path is exactly the chain containing $u$.

### HLD + Splay Tree

Let's replace the segment tree in the HLD solution with a splay tree, with the same modification to the underlying splay tree as in the earlier section. If we need to query a prefix of a splay tree, just splay the vertex, then additionally query its left child. If we need to query a subarray $[l,r]$ of the splay tree, splay $l$ to the root and $r$ to just below the root; then we only have to additionally query the right child of $r$. Remember that for an HLD query, we do some prefix queries and at most $1$ subarray query.

What would be the complexity? It seems like it would be the same $O(n + q \log^2 n)$. But no, it can be shown that it is actually $O(n+q\log n)$. Yes, it is not a typo. I still don't believe it myself even though the justification is below.

In the same way we showed that access works in $O(\log n)$ in link-cut trees, we will do the same here by imagining that there are fake edges from the root of each chain's splay tree to the parent of the chain, so that when we define the rank function and count the number of vertices in the subtree of $u$, we also take into account the vertices connected via fake edges. It is not too hard to see that the time complexity for the path query between $a$ and $b$ is a similar telescoping sum resulting in $O(r_{final}(a)+r_{final}(b)-r_{initial}(a)-r_{initial}(b)+k_a+k_b)=O(\log n)$.

### Balanced HLD

Although the previous $2$ algorithms have $O(n+q\log n)$ complexity, they are extremely impractical, as splay trees (or any dynamic tree) carry a huge constant factor. Is it possible to transfer the essence of what the splay tree is doing into a static data structure?

What was so special about our HLD+splay tree solution that it is able to cut one log when compared to HLD+any other data structure? It's splaying! That is, if a vertex was recently accessed, it is pushed near the root of the tree even though it may have many light edges on its path to the root. This property of caring about the entire tree structure is unique to splay trees and isn't found in any other method mentioned here, as the other data structures treat each heavy chain independently.

So, we need to create a data structure that is similar to HLD+segment tree, but instead of dividing each chain based on the number of (unweighted) nodes, which is what a segment tree does, let's give each node a weight and split based on those weights. Wait, isn't that centroid decomposition?

Indeed, let us first do HLD, then take the heavy chain containing the root of the tree. Give each vertex in this heavy chain a weight equal to the number of vertices in its connected component after removing all edges of the heavy chain. And that's pretty much the entire idea: divide based on these weights. More specifically, given a heavy chain with total weight $W$, choose a vertex $x$ such that the total weights of the left and right sides of the chain (excluding $x$) are both $\leq \frac{W}{2}$. Such a vertex always exists (it's the weighted centroid). We set $x$ as the root of the binary tree and recurse on the left and right children. For the subtrees that are children of the heavy chain, repeat this process, and we have constructed our desired tree.
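As an illustration, here is how one heavy chain could be turned into such a balanced binary tree (a sketch under assumed interfaces, not the blog's code: `w[i]` is the weight of the $i$-th chain vertex, i.e. $1$ plus the sizes of the subtrees hanging off it via light edges):

```cpp
#include <bits/stdc++.h>
using namespace std;

// Build the balanced binary tree for one heavy chain. Returns par[], the
// parent of each chain index in the built binary tree (-1 for its root).
// The split vertex m is the weighted centroid of the interval: both sides
// of the chain get weight at most W/2.
vector<int> buildChainTree(const vector<long long>& w) {
    int n = w.size();
    vector<long long> pre(n + 1, 0);
    for (int i = 0; i < n; i++) pre[i + 1] = pre[i] + w[i];
    vector<int> par(n, -1);
    function<void(int, int, int)> rec = [&](int l, int r, int p) { // chain interval [l, r)
        if (l >= r) return;
        long long W = pre[r] - pre[l];
        int m = l; // first index whose cumulative weight (inclusive) exceeds W/2
        while (2 * (pre[m + 1] - pre[l]) <= W && m + 1 < r) m++;
        par[m] = p;
        rec(l, m, m);
        rec(m + 1, r, m);
    };
    rec(0, n, -1);
    return par;
}
```

Running this on every heavy chain (with each light-edge subtree's root tree attached under the corresponding chain vertex) yields the whole structure.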

Now, we need to show that queries here are indeed $O(\log n)$. First, we need to think about how we actually perform queries in our HLD structure. We know from HLD+segment tree that the query loop for the path between $a$ and $b$ is this:

• if $a$ and $b$ are in the same heavy chain, query $in[a]$ to $in[b]$
• if $a$ is deeper than $b$, query $in[hpar[a]]$ to $in[a]$
• if $b$ is deeper than $a$, query $in[hpar[b]]$ to $in[b]$

Of these $3$ query types, only the first is a sub-array query on a heavy chain; the rest are queries on prefixes of heavy chains. Furthermore, the first query type is only done once. Now, what is the time complexity of a prefix query on a binary tree? It may be tempting to say it is "just $O(\log n)$, duh", but we can do better.

For example, if we want to query the values on the prefix whose last vertex is the one labelled $x$, we simply walk from vertex $x$ to the root of the tree and add the values of those vertices and of their left children where appropriate (if we walked up from a left child, we don't add anything). Walking from vertex $x$ to the root of the tree takes $O(d_x-d_{root}+1)$ time. For a sub-array query on $[x,y]$, we walk from $x$ and $y$ respectively to the root of the tree, which takes $O(d_x+d_y-2d_{root}+2)$.
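In code, the prefix walk might look like this (hypothetical arrays of my own: `par` for the parent, `lc` for the left child, `val` for a vertex's value, `sum` for its subtree aggregate; sums are used instead of a general monoid to keep it short):

```cpp
#include <bits/stdc++.h>
using namespace std;

// Sum of values on the chain prefix ending at x, in O(depth of x).
// par[x] = parent in the balanced binary tree (-1 for the root),
// lc[x] = left child (-1 if none), val[x] = value, sum[x] = subtree aggregate.
long long prefixQuery(int x, const vector<int>& par, const vector<int>& lc,
                      const vector<long long>& val, const vector<long long>& sum) {
    long long res = val[x];
    if (lc[x] != -1) res += sum[lc[x]]; // everything before x inside x's own subtree
    for (int u = x; par[u] != -1; u = par[u])
        if (lc[par[u]] != u) { // walked up from the right: parent and its left side are on the prefix
            res += val[par[u]];
            if (lc[par[u]] != -1) res += sum[lc[par[u]]];
        }
    return res;
}
```

The work is exactly one constant-time step per edge on the root path, which is what makes the telescoping argument below go through.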

Let's change the definition of $d_a$ slightly to be the depth when we consider the entire structure of our tree (so we consider light edges too when calculating the depth). Let $x_{root}$ be the root of the heavy chain containing vertex $x$. The time complexity for prefix or suffix queries becomes $O(d_x-d_{x_{root}}+1)$, and for sub-array queries it becomes $O(d_x+d_y-d_{x_{root}}-d_{y_{root}}+2)$. Then we can see that the time complexity forms a telescoping sum here too, since when we traverse a light edge, the depth decreases. Actually, we don't even need the telescoping-sum justification, as querying the simple path from $x$ to $y$ only requires us to move $x$ and $y$ upwards (and never downwards). In the end, the time complexity only depends on the depths of the endpoints of our queries. So, the only thing we need to prove is that the depth of the entire tree is bounded by $O(\log n)$.

But the proof of that is exactly the proof that centroid decomposition gives us a centroid tree of depth at most $O(\log n)$. Maybe I should elaborate more. Let's consider the new tree structure we have built. There are $2$ types of edges, heavy edges and light edges. When we traverse down a heavy edge, the size of the subtree is at least halved due to how we have chosen to split the heavy chain, so there are at most $O(\log n)$ heavy edges on the path from any vertex to the root of our constructed tree. However, when we traverse down a light edge, there is no guarantee about what happens to the size of the subtree, except that it decreases by at least $1$, which is pretty useless. Luckily for us, we know that every vertex has at most $O(\log n)$ light edges on its path to the root, because that's how HLD works. So the depth of the tree is $O(\log n)+O(\log n)=O(\log n)$, and we have shown that our complexity for querying is $O(\log n)$. Also, it is not too hard to show that the complexity of constructing this structure is $O(n \log n)$, since constructing the tree from a single heavy chain is literally centroid decomposition.

The depth of the tree is $O(\log n)$, but what is the constant? The numbers of heavy and light edges are both at most $\log n$, so our analysis from earlier gives a $2 \log n$ bound on the height of the tree. Can we do better? It turns out no; this bound is sharp (up to $\pm 1$ maybe). Here is how we can construct a tree that forces the depth to be $2 \log n$ under our construction.

Let $T(s)$ be our constructed tree that has $s$ vertices. It is defined recursively by the below image.

Let $d(s)$ denote the depth of $T(s)$. As a base case, $d(0)=0$. We also have $d(s)=2+d(\frac{s-2}{2})$, since the heavy chain of $T(s)$ would be on the right side of the tree, so $a$ would be connected to its left child by a light edge in the original tree. We can have the root of the heavy chain be vertex $b$ in our constructed tree (it could be vertex $a$, but we want to assume the worst case), so that in our constructed tree, $a$ has a depth of $2$, requiring us to traverse $2$ edges to get to $T(\frac{s-2}{2})$. Therefore, it is easy to see that we can make $T(s)$ have a height of about $2 \log n$.

#### Path and Subtree Queries

With our normal HLD+segment tree query, we can easily handle both path and subtree queries $[7]$.

Can we do it for our new structure? Yes.

Firstly, one of the problems with subtree operations is that if the number of children is very large, it is hard to compute the aggregate values of the children. This is the reason for the difficulty of $[9]$. But since we are not doing operations on a dynamic tree, we can simply augment our tree to make the number of children small.

As in $[9]$, the idea is to binarize the tree, however since we do not have to care about the depth of the augmented tree, we can simply augment it into a caterpillar graph.

Subtree operations are the same on the original tree and the augmented tree. The only case we have to handle differently is path operations. For example, the path $A \to B$ may pass through $X$ in the original tree but not in the augmented tree. However, we can solve this by checking if the lca of $A$ and $B$ is a fake vertex. If so, we separately process the actual lca in the original tree.

The original lazy tag we used when we only had path queries only applies the value to the children in the real tree. However with subtree queries, we need a new lazy tag that applies the value to all children, which includes children connected via light edges. Modifying the code to add another lazy tag is not hard, just very tedious.

More specifically, we have $2$ lazy tags, $lazy1$ and $lazy2$. $lazy1$ is applied to the children in the same heavy chain while $lazy2$ is applied to all children regardless of whether or not they are in the same heavy chain.

Then, the true value of a vertex $u$ in the balanced binary tree is $val_u + \sum\limits_{\substack{v \text{ is ancestor of } u \\ v \text{ and } u \text{ same heavy chain}} }lazy1_v + \sum\limits_{v \text{ is ancestor of } u }lazy2_v$.

Making these changes to the original algorithm is not difficult, just very tedious.

### Benchmarks

Here are the benchmarks for the various implementations of tree path queries, so that you have a better idea of the practical performance of the things described above and realize that the algorithm is practically pretty useless (except maybe for some interactive problems similar to $[8]$).

The problem is: given a tree with $n=10^6$ vertices where all vertices initially have weight $0$, handle the following $q=5 \cdot 10^6$ operations of $4$ types:

• 1 u v w increase the weights of all vertices on the simple path from $u$ to $v$ by $w$
• 2 u v find the maximum weight of any vertex on the simple path from $u$ to $v$
• 3 u w increase the weights of all vertices on the subtree of $u$ by $w$
• 4 u find the maximum weight of any vertex on the subtree of $u$

It is benchmarked on my desktop with a Ryzen 7 3700X CPU, with compilation command `g++ xxx.cpp -O2`.

Note that the difference between balanced HLD 1 and 2 is that balanced HLD 2 is able to handle all types of queries, while balanced HLD 1 is only able to handle the first $2$.

Benchmarks when there are only query types $1$ and $2$.

| | HLD + segment tree (single segment tree) | HLD + segment tree (many segment trees) | HLD + segment tree (many segment trees, ACL) | link-cut tree | HLD + splay tree | Balanced HLD 1 | Balanced HLD 2 |
|---|---|---|---|---|---|---|---|
| Time Complexity | $O(n + q \log ^2 n)$ | $O(n + q \log ^2 n)$ | $O(n + q \log ^2 n)$ | $O(n+q \log n)$ | $O(n + q \log n)$ | $O((n + q) \log n)$ | $O((n+q) \log n)$ |
| Random ($wn=0$) | $14.31~s$ | $8.91~s$ | $10.89~s$ | $8.66~s$ | $9.77~s$ | $7.90~s$ | $13.38~s$ |
| Random ($wn=-10$) | $10.84~s$ | $6.65~s$ | $6.97~s$ | $5.78~s$ | $5.18~s$ | $4.54~s$ | $11.17~s$ |
| Random ($wn=10$) | $15.14~s$ | $10.73~s$ | $12.62~s$ | $10.69~s$ | $13.25~s$ | $10.04~s$ | $13.74~s$ |
| Binary tree ($k=1$) | $21.45~s$ | $13.09~s$ | $17.49~s$ | $12.40~s$ | $13.62~s$ | $10.59~s$ | $13.96~s$ |
| Binary tree ($k=5$) | $20.48~s$ | $14.64~s$ | $18.12~s$ | $11.82~s$ | $15.16~s$ | $11.80~s$ | $14.90~s$ |

Benchmarks when there are all $4$ query types.

| | HLD + segment tree (single segment tree) | Balanced HLD 2 |
|---|---|---|
| Time Complexity | $O(n + q \log ^2 n)$ | $O((n+q) \log n)$ |
| Random ($wn=0$) | $9.06~s$ | $10.61~s$ |
| Random ($wn=-10$) | $7.12~s$ | $8.86~s$ |
| Random ($wn=10$) | $9.92~s$ | $11.15~s$ |
| Binary tree ($k=1$) | $13.47~s$ | $11.80~s$ |
| Binary tree ($k=5$) | $11.67~s$ | $11.63~s$ |

I am unsure why a value of $k$ closer to $\sqrt n$ made the first $3$ codes faster. Maybe there is something wrong with my generator, or is the segment tree just too fast? Also, IO takes about $1~s$.

Here are my codes (generators + solutions). They have not been stress tested and are not guaranteed to be correct. They are only here for reference.

gen_random.cpp
gen_binary.cpp
hld_single.cpp
hld_many.cpp
hld_many_ACL.cpp
hld_splay.cpp
binary_tree.cpp
binary_tree2.cpp
script.py

## 1/3 Centroid Decomposition

When I was writing this blog, I was wondering whether we could cut the $\log$ factor from some centroid decomposition problems. Thanks to balbit for telling me about this technique.

Consider the following problem: You are given a weighted tree of size $n$ whose edges may have negative weights. Each vertex may either be toggled "on" or "off". Handle the following $q$ operations:

1. Toggle vertex $u$.
2. Given vertex $u$, find the maximum value of $d(u,v)$ over all vertices $v$ that is toggled "on". It is guaranteed that at least one such $v$ exists.

It is well-known how to solve this in $O(n \log n + q \log^2 n)$ using centroid decomposition + segment trees. But can we do better?

The reason segment trees have to be used is that when we query for the longest path ending at some centroid ancestor, we have to ignore the contribution of our own subtree. An obvious way to solve this is to decompose the centroid tree in such a way that each vertex has at most $2$ children. Unfortunately, I do not know a way to do this such that the depth of the tree is bounded by $\log_2 n$, but there is a way to make the depth of the tree bounded by $\log_{\frac{3}{2}} n$.

Instead of doing centroid decomposition on vertices, let us consider doing it on edges. The top image is the usual centroid decomposition, while the bottom image is the one we want to use here. That is, the centroid gets passed down to its children when recursing.

Ok, so now we want to figure out the largest possible number of edges in the smaller partition. Remember, we want to make this value as large as possible to get a split that is as even as possible.

Firstly, the bound cannot be better than $\frac{1}{3}m$, where $m$ is the number of edges. This is seen when the tree is a star graph with $3$ leaves: the numbers of edges in the subtrees are $[1,1,1]$, and it is clear that the best way to partition the subtrees is $[1]$ and $[1,1]$, which gives us our desired bound.

Now, we will show that this bound is always achievable. Let $A$ be an array containing elements in the interval $[0,\frac{1}{2}]$ such that $\sum A=1$. This array $A$ describes the ratios between the number of edges in each subtree and the total number of edges. The elements are bounded above by $\frac{1}{2}$ due to the properties of the centroid.

Then, the following algorithm surprisingly finds a subset $S$ whose sum lies in the range $[\frac{1}{3},\frac{2}{3}]$.

    tot = 0
    S = {}
    for x in [1, n]:
        if (tot + A[x] <= 2/3):
            tot += A[x]
            S.insert(x)
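An integer version of this greedy, working directly on edge counts rather than ratios (a sketch with names of my own; `cnt[i]` is the number of edges in the $i$-th subtree, each at most half the total):

```cpp
#include <bits/stdc++.h>
using namespace std;

// Greedily keep the running total within 2/3 of all edges; the returned
// indices then have total edge count in [m/3, 2m/3], where m = sum of cnt.
vector<int> pickSubset(const vector<long long>& cnt) {
    long long m = accumulate(cnt.begin(), cnt.end(), 0LL), tot = 0;
    vector<int> S;
    for (int i = 0; i < (int)cnt.size(); i++)
        if (3 * (tot + cnt[i]) <= 2 * m) { tot += cnt[i]; S.push_back(i); }
    return S;
}
```

The integer comparison `3 * (tot + cnt[i]) <= 2 * m` avoids any division, so no rounding issues arise.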

It is clear that $\sum\limits_{s \in S} A[s] \leq \frac{2}{3}$, so it suffices to show that $\sum\limits_{s \in S} A[s] \geq \frac{1}{3}$.

Let $P[x]=A[1]+A[2]+\ldots+A[x]$, that is $P$ is the prefix sum array.

Consider the first index $x$ such that $P[x]>\frac{2}{3}$. We will split into $2$ cases.

• $A[x]<\frac{1}{3}$: when we have completed iteration $x-1$ (note that every index before $x$ was added to $S$), $\sum\limits_{s \in S} A[s] = P[x-1] = P[x]-A[x] > \frac{2}{3}-\frac{1}{3} = \frac{1}{3}$.
• $A[x] \geq \frac{1}{3}$: it is easy to see that the final $S=[1,n] \setminus \{x\}$. So $\sum\limits_{s \in S} A[s] = 1-A[x] \geq \frac{1}{2}$.

Therefore, we are able to obtain a centroid tree which is a binary tree and has depth at most $\log_{\frac{3}{2}} n$.

Returning back to the original problem, we are able to solve it in $O(n \log n + q_1 \log^2 n + q_2 \log n)$ where $q_1$ and $q_2$ are the number of queries of type $1$ and $2$ respectively.

## References

• +245

By errorgorn, 7 months ago,

Hi, this is going to be a solution to a "harder" version of the last problem of the recent div 2 round that I coordinated. We can actually solve the problem for $|s| \leq 60$ but we decided to lower the constraint as forcing this solution is not interesting and also way too hard for the last problem of a div 2. I would like to thank the authors for allowing me to write this blog and shamelessly farm contribution from their contest.

Firstly, I would like to share a solution (credits to Ari) which solves $|s| \leq 60$ without explicitly factoring the entire polynomial.

spoiler

Now, let's get into the algorithm to factorize polynomials. We will use Berlekamp's algorithm, which factorises polynomials whose coefficients are elements of the finite field $GF(q)$, where $q=p^k$. I am writing this blog because the original paper was pretty hard for me to follow, so I hope this blog makes the algorithm easier to understand from a competitive programming standpoint.

Before we get to the algorithm we have to prove $2$ things first:

$f(x)^q-f(x)=\prod\limits_{u \in GF(q)} (f(x)+u)$

proof

$f(x)^q=f(x^q)$

proof

### The actual algorithm

Let's say we have found some polynomial $h(x)$ such that $h(x)^q-h(x) = 0 \pmod {f(x)}$, where $1 \leq \deg h<\deg f$ (ignore how we found it for now).

We know that $h(x)^q-h(x)= \prod\limits_{u \in GF(q)} (h(x)+u)$ which is a multiple of $f(x)$.

$\gcd$ is multiplicative in the sense that for coprime $a,b$, we have $\gcd(a,c) \cdot \gcd(b,c) = \gcd(a \cdot b,c)$. It is obvious that $h(x)+a$ is coprime to $h(x)+b$ when $a \neq b$, since their difference is the non-zero constant $a-b$. So $\prod\limits_{u \in GF(q)} \gcd(h(x)+u,f(x)) = f(x)$.

We are almost done; we just need to show that the product above gives us a useful factorization of $f(x)$, that is, it doesn't just tell us that $f(x)=f(x) \cdot 1 \cdot 1 \cdots$. But we have $1 \leq \deg h<\deg f$, so $h(x)+a$ is a non-zero polynomial of degree less than $\deg f$, and it is impossible that $\gcd(h(x)+a,f(x))=f(x)$.

### Finding $h$

We have $h(x)^q-h(x) = h(x^q)-h(x) = 0 \pmod{f(x)}$. Therefore, we can reduce this to finding coefficients $h_0,h_1,\ldots$ such that $h_0(x^{0q}-x^0)+h_1(x^{1q}-x^1)+\ldots = 0 \pmod{f(x)}$. This is linear algebra: we want the null space spanned by the coefficient vectors of the polynomials $x^{iq}-x^i \bmod f(x)$, which can be found with Gaussian elimination (an XOR basis in the case of $GF(2)$). Since we are finding $h(x)$ under modulo $f(x)$, we can reduce any non-trivial solution to $h(x)$ to one with $1 \leq \deg h < \deg f$.

Now, the only problem is figuring out when such $h$ exists. Suppose that $f(x)=f_0(x)f_1(x)$ for some coprime non-constant polynomials $f_0(x)$ and $f_1(x)$.

By Chinese remainder theorem, we know that for any $(c_0(x),c_1(x))$ with $\deg c_i < \deg f_i$, we can find a $h(x)$ that satisfies $h(x) \pmod{f_i(x)}=c_i(x)$.

Let us look at the particular case of $(c_0(x),c_1(x))=(0,1)$. The resulting $h(x)$ is non-constant, since a constant cannot be $0 \pmod{f_0(x)}$ and $1 \pmod{f_1(x)}$ at the same time.

We have $h(x) \equiv c_i = c_i^q \equiv h(x)^q \pmod{f_i(x)}$, since $c^q=c$ for every $c \in GF(q)$.

Since we have $h(x)=h(x)^q \pmod{f_i(x)}$ for both $i$, it follows that $h(x)=h(x)^q \pmod{f(x)}$.

However, a shortcoming of this process is that $h(x)$ only exists when we can find appropriate $f_0(x)$ and $f_1(x)$. In particular, we will have to separately handle the case where $f(x)=g(x)^k$ where $g(x)$ is an irreducible polynomial and $k>1$.

### Handling Repeated Powers

Notice that $f'(x)=k \, g(x)^{k-1} g'(x)$, so $\gcd(f(x),f'(x)) = \gcd(g(x)^k,k \, g(x)^{k-1} g'(x))$.

• If $k$ is a multiple of $p$, then $f'(x)=0$ and $f$ is a polynomial in $x^p$, so we can use the identity $f(x)^p=f(x^p)$ (taking $p$-th roots of the coefficients, which exist uniquely in $GF(q)$) to recurse on a polynomial of degree $\frac{\deg f}{p}$.
• Otherwise, $\gcd(f(x),f'(x))=g(x)^{k-1}$ and we immediately have $g(x)=\frac{f(x)}{\gcd(f(x),f'(x))}$.

## Time Complexity

Here is my implementation: 162171925. Note that it does not actually solve $|s| \leq 60$ quickly, because it does not factorize integers quickly.

Let the degree of the polynomial we want to factorize be $d$. Each call of factor requires us to build an XOR basis and also find the $\gcd$ of $q$ polynomials. These respectively take $O(d^2 \cdot A(d))$ and $O(q \cdot d \cdot A(d))$ time, where $O(A(d))$ is the time complexity required to add two polynomials. factor is called at most $d$ times.

In $GF(2)$, we can assume that $A(d)=1$ because it is literally XOR, so the time complexity is $O(d^3)$.

In $GF(q)$, it does not make sense to assume that $A(d)=1$, so we will use $A(d)=d$. The time complexity balloons to $O(d^4 + q d^2)$.
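To make the structure of the algorithm concrete, here is a minimal Python sketch of the $GF(2)$ case, representing a polynomial as a Python integer whose bit $i$ is the coefficient of $x^i$. It assumes the input is square-free (the repeated-power handling above is omitted), and all helper names are mine, not from the linked submission.

```python
def deg(p):                       # degree of a non-zero polynomial
    return p.bit_length() - 1

def pmod(a, b):                   # a mod b in GF(2)[x]
    while a and deg(a) >= deg(b):
        a ^= b << (deg(a) - deg(b))
    return a

def pmulmod(a, b, m):             # a * b mod m in GF(2)[x]
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a = pmod(a << 1, m)
    return r

def pgcd(a, b):
    while b:
        a, b = b, pmod(a, b)
    return a

def berlekamp(f):
    """Irreducible factors of a square-free f in GF(2)[x]."""
    d = deg(f)
    # row i is the coefficient vector of (x^{2i} - x^i) mod f
    rows, xi, x2i = [], 1, 1
    x2 = pmod(1 << 2, f)
    for _ in range(d):
        rows.append(xi ^ x2i)
        xi = pmod(xi << 1, f)
        x2i = pmulmod(x2i, x2, f)
    # XOR basis with combination tracking: combinations of rows that
    # cancel to 0 are exactly the valid polynomials h
    basis, nullspace = {}, []
    for i, r in enumerate(rows):
        comb = 1 << i             # comb doubles as the polynomial h
        while r:
            p = deg(r)
            if p not in basis:
                basis[p] = (r, comb)
                break
            br, bc = basis[p]
            r, comb = r ^ br, comb ^ bc
        if r == 0:
            nullspace.append(comb)
    # each non-trivial h splits every factor g into gcd(g,h) * gcd(g,h+1)
    factors = [f]
    for h in nullspace:
        if h == 1:                # the trivial constant solution
            continue
        new = []
        for g in factors:
            a = pgcd(g, pmod(h, g))
            b = pgcd(g, pmod(h, g) ^ 1)
            new.extend([a, b] if a != 1 and b != 1 else [g])
        factors = new
    return factors
```

For example, `berlekamp(0b1001)` factors $x^3+1$ into $x+1$ and $x^2+x+1$. Since $q=2$, the residues of $h$ modulo distinct factors lie in $\{0,1\}$, so the basis vectors alone are enough to fully separate all factors.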

• +75

By errorgorn, 8 months ago,

Hi everyone! Today nor sir and I would like to talk about generating uniform bracket sequences. Over the past few years when preparing test cases, I have had to generate a uniform bracket sequence a few times. Unfortunately I could not figure out how, so I would like to write a blog about this now. Hopefully this blog would be useful to future problem setters :) Scroll down to the end of the blog to copy our generators.

First, let's define some things. A bracket sequence is a string with $n$ $\texttt{(}$ s and $n$ $\texttt{)}$ s. A balanced bracket sequence is a bracket sequence with the additional constraint that it becomes a valid arithmetic expression when we add some $\texttt{1}$ s and $\texttt{+}$ s. Let $p$ be the prefix sum array of a bracket sequence (we assign $1$ to $\texttt{(}$ and $-1$ to $\texttt{)}$ and take the prefix sum). For example, if our bracket sequence is $\texttt{())()((())}$, then $p=[1,0,-1,0,-1,0,1,2,1,0]$.

An alternate way to think about the prefix sum is to imagine the bracket sequence as a sequence of increasing and decreasing lines.

We can see that the necessary and sufficient condition for a bracket sequence to be balanced is that all elements of the prefix sum array are non-negative.

Let's define the badness of a bracket sequence as the number of lines in the above diagram that are below the $0$-line. So, the badness of our bracket sequence is $4$.

Let's call a bracket sequence irreducible if the line touches the $0$-line exactly twice (once at the start and once at the end). So, $\texttt{(())}$ and $\texttt{))()((}$ are both irreducible but $\texttt{()()}$ is not. It is natural to also define an irreducible decomposition of a bracket sequence: $\texttt{())()((())}$ gets decomposed into $\texttt{()}$, $\texttt{)(}$, $\texttt{)(}$, $\texttt{(())}$.

Now, we want to note that all irreducible bracket sequences are either balanced or are some sort of "inverse" of a balanced bracket sequence. By inverse, we can either reverse the entire string or flip every bracket in the string. Please note that these are different operations.

For example, $\texttt{)))(()((}$ can be turned into $\texttt{((())())}$ by flipping every bracket or $\texttt{(()(()))}$ by reversing the string.

Firstly, let's start with finding a recursive formula for the Catalan numbers. One of the basic recurrences when dealing with bracket sequences is to decompose them into the form $\texttt{(}X\texttt{)}Y$, where $X$ and $Y$ are both balanced bracket sequences (they might be empty). This decomposition is actually unique, since $\texttt{(}X\texttt{)}$ can only be the first term in our irreducible decomposition (for us competitive programmers, think of it as a dp where our transition is removing the first irreducible substring). We immediately have the recurrence $C_{n+1} = \sum\limits_{i=0}^n C_i C_{n-i}$ for $n \geq 0$, with the base case $C_0=1$. After some manipulation with formal power series $[1,2 \text{ example } 2.3.3]$, we get the familiar $C_n = \frac{1}{n+1} \binom{2n}{n} = \frac{1}{2n+1} \binom{2n+1}{n}$.

## Mapping 1

Let's take a look at $C_n = \frac{1}{2n+1} \binom{2n+1}{n}$ first. What other sequence can we get the $\binom{2n+1}{n}$ term from? Well, the number of binary strings with $n$ $\texttt{(}$ s and $n+1$ $\texttt{)}$ s. And nicely enough, $2n+1$ is the length of said binary string. Let's define the equivalence relation $\sim_R$ where $s \sim_R t$ when $s$ is a cyclic shift of $t$. Generally for binary strings, it is not so easy to count the number of elements in an equivalence class when we define it this way. But since $\gcd(2n+1,n)=1$ here, all cyclic shifts of a string are distinct and there are exactly $2n+1$ strings in every equivalence class. We are almost at a bijection to the Catalan numbers already! We just need to show that there is a bijection between equivalence classes and balanced bracket sequences.

#### Lemma

Given a string of $n$ $\texttt{(}$ s and $n+1$ $\texttt{)}$ s, there exists exactly $1$ cyclic shift whose first $2n$ elements form a balanced bracket sequence.

As an example, consider the bracket sequence $\texttt{)())(()}$. The only cyclic shift that gives us a prefix sum with our condition is $\texttt{(())())}$.

The proof of this is not too hard and you should prove it yourself, but for completeness, I will put it here.

proof

We can see that in each equivalence class, there is exactly a single element whose first $2n$ elements are a balanced bracket sequence. Let this be the representative of the equivalence class. We can obtain balanced bracket sequences from representatives by removing the last character and vice versa by appending a $\texttt{)}$. Therefore, there is a bijection between the equivalence classes and balanced bracket sequences of length $2n$.
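This lemma directly yields a linear-time uniform generator: shuffle a string with $n$ $\texttt{(}$ s and $n+1$ $\texttt{)}$ s, rotate it to the representative of its class (which starts right after the first prefix-sum minimum), and drop the last character. A Python sketch (function names are mine):

```python
import random

def rotate_to_representative(s):
    """Rotate a string with n '('s and n+1 ')'s so that its first 2n
    characters form a balanced bracket sequence (cycle lemma)."""
    best, cur, pos = 0, 0, 0
    for i, c in enumerate(s):
        cur += 1 if c == '(' else -1
        if cur < best:            # first position attaining a new minimum
            best, pos = cur, i + 1
    return s[pos:] + s[:pos]

def random_balanced(n):
    """Uniform balanced bracket sequence of length 2n, in O(n)."""
    s = ['('] * n + [')'] * (n + 1)
    random.shuffle(s)             # uniform over all strings of this multiset
    return ''.join(rotate_to_representative(s)[:-1])
```

Since every equivalence class has exactly $2n+1$ members and exactly one representative, a uniform random string mapped this way gives a uniform balanced bracket sequence.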

There is an interesting problem related to this $[4]$ (in Chinese).

translation
solution

## Mapping 2

Thank you nor sir for writing this part and independently discovering it.

The mapping mentioned here was, unfortunately, already present in literature $[5]$ after we checked, and my old proof was quite similar to the one mentioned in the paper. It used the Chung-Feller theorem, which says that the badness of bracket sequences is uniformly distributed (i.e., there are an equal number of strings for each value of badness), and then showed that the mapping was in fact a bijection from the set of strings of a fixed badness to the set of balanced bracket sequences.

However, I came up with a much simpler proof which we will present below along with the intuition behind the construction (we could not find this proof anywhere, so if anyone finds a reference, please let us know in the comments).

The main idea behind what we will do below is the identity $C_n = \frac{1}{n + 1} \binom{2n}{n}$. We try to construct a balanced bracket sequence from a bracket sequence with $n$ opening brackets and $n$ closing brackets, and then try to show that this construction is in fact an $n + 1$ to $1$ mapping from $S_n$ (the set of bracket sequences with $n$ opening brackets and $n$ closing brackets) to $S^*_n$ (the set of balanced bracket sequences of length $2n$).

The first idea is to note that, as mentioned earlier, if a string $S = AB$, where $A$ is the irreducible prefix of $S$, then we can make $A$ an irreducible balanced bracket sequence by either flipping all brackets of $A$ (which corresponds to flipping the corresponding diagram in the $x$-axis) or by reversing the string $A$ (which corresponds to rotating the diagram $180^\circ$).

The first attempt at finding a mapping was to do this: set $f(S) = A'f(B)$, where $A'$ is the string obtained by flipping all brackets in $A$ (or just reversing $A$). However, this will not give us a uniformly random way of constructing balanced bracket sequences, since $\texttt{((()))}$ is the image of $2$ strings under this mapping, and $\texttt{()()()}$ is the image of $8$ strings under this mapping.

So here was the crucial idea: we try to offset this asymmetry by modifying how we handle the case when $A \ne A'$. In this case, $A =\; \texttt{)}C\texttt{(}$ for some $C$, i.e., $S = \texttt{)}C\texttt{(}B$. So in order to try to compensate for the imbalance in distribution, we use $\texttt{(}f(B)\texttt{)}C'$ instead (where $A' = \texttt{(}C'\texttt{)}$). As it turns out, this works out perfectly.

Here's the formal construction:

Definition: $f: \cup_{n = 0}^\infty S_n \to \cup_{n = 0}^\infty S^*_n$ is defined as

1. $f(\varepsilon) = \varepsilon$, where $\varepsilon$ is the empty string.
2. $f(AB) = Af(B)$, if $A$ is irreducible and balanced.
3. $f(AB) = \texttt{(}f(B)\texttt{)}C'$, if $A$ is irreducible and not balanced, and as a corollary, $A =\; \texttt{)}C\texttt{(}$ for some $C$.

Claim: For any string $S^* \in S^*_n$, there are exactly $n + 1$ solutions to $f(S) = S^*$ for $S \in S_n$.

Proof: Let $S^* = A^*B^*$, where $A^*$ is the irreducible prefix of $S^*$. Let $S \in S_n$ be a string satisfying the equation $f(S) = S^*$. We prove this by induction. For $n = 0$, there is nothing to prove. Let's suppose $n > 0$. Then $S$ can be mapped to $S^*$ by using one of the two rules, depending on whether the irreducible prefix of $S$ is a balanced bracket sequence or not:

1. Let $S = AB$, where $A$ is a balanced irreducible prefix of $S$. Then we must have $S^* = f(S) = A f(B)$. Since $A, A^*$ are both balanced irreducible prefixes of $S^*$, they must be equal, and $B^* = f(B)$. The number of solutions in this case is hence equal to $\frac{|B^*|}{2} + 1$ by the induction hypothesis.
2. Let $S = AB$, where $A$ is not balanced, but an irreducible prefix of $S$. Then $A =\; \texttt{)}C\texttt{(}$, and $A' = \texttt{(}C'\texttt{)}$ (as mentioned above). By definition, $S^* = f(S) = \texttt{(}f(B)\texttt{)} C'$. By an analogous argument as above, since $f(B)$ is balanced, $A^* = \texttt{(}f(B)\texttt{)}$ and $B^* = C'$. There are exactly $\frac{|A^*|}{2}$ solutions as earlier.

Overall, there are $\frac{|A^*| + |B^*|}{2} + 1 = n + 1$ solutions to $f(S) = S^*$, as needed.

In fact, if, in the proof, we also keep track of the badness of the strings that generate a given balanced bracket sequence, we can see that there is one input string for each possible value of badness (and this gives an alternative proof of the Chung-Feller theorem).
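Here is a small Python sketch of $f$ (names are mine), with a brute-force check of the $(n+1)$-to-$1$ claim for small $n$:

```python
from collections import Counter
from itertools import combinations

def split_irreducible(s):
    """Split s = A + B where A is the irreducible prefix of s."""
    cur = 0
    for i, c in enumerate(s):
        cur += 1 if c == '(' else -1
        if cur == 0:
            return s[:i + 1], s[i + 1:]

def flip(s):
    return ''.join('(' if c == ')' else ')' for c in s)

def f(s):
    if not s:
        return s
    a, b = split_irreducible(s)
    if a[0] == '(':                            # rule 2: A balanced, irreducible
        return a + f(b)
    return '(' + f(b) + ')' + flip(a[1:-1])    # rule 3: A = ')' C '('

def all_bracket_strings(n):
    for pos in combinations(range(2 * n), n):
        s = [')'] * (2 * n)
        for p in pos:
            s[p] = '('
        yield ''.join(s)

# every balanced sequence of length 2n should have exactly n+1 preimages
for n in range(1, 6):
    cnt = Counter(f(s) for s in all_bracket_strings(n))
    assert all(v == n + 1 for v in cnt.values())
```

For instance, both $\texttt{()}$ and $\texttt{)(}$ map to $\texttt{()}$, matching $n+1=2$ preimages for $n=1$.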

# Generating Balanced Bracket Sequences

It has been asked before on CodeForces $[6]$ how to generate a random bracket sequence. Judging from the comments, no one had been able to figure out how to generate one quickly (i.e. in linear time). So I will provide the code to generate a uniform balanced bracket sequence. Additionally, I will provide code that generates a random binary tree of $n$ vertices via the bijection from balanced bracket sequences of size $2n$.

I know that there is also a bijection between bracket sequences and ordered trees, but there are better ways to generate a tree $[7]$. There is also a bijection to full binary trees of $2n+1$ vertices, but there is an easy bijection from those to binary trees of $n$ vertices: noting that a full binary tree of $2n+1$ vertices has $n+1$ leaf nodes, we simply remove all leaves, which leaves (pun not intended) us with a binary tree of $n$ vertices. So the binary tree is the "internal nodes" of the full binary tree.

Also, there is a (rather inelegant) method of generating balanced bracket sequences without using any of the techniques discussed earlier.

One of the first ideas that anyone attempting this task might have is to randomly place $\texttt{(}$ or $\texttt{)}$ in a string unless it is impossible to do so. However, this does not give a uniform distribution. Instead, let us vary the probability that we place $\texttt{(}$ or $\texttt{)}$ in accordance with the actual probability that we will see in our required distribution.

Let $C_n^k$ denote the number of balanced bracket sequences of size $2n$ whose first $k$ elements are $\texttt{(}$. From $[8]$, we know that $C_n^k = \frac{k+1}{n+1} \binom{2n-k}{n-k} = \frac{k+1}{2n-k+1} \binom{2n-k+1}{n-k}$. We can actually prove it using the same technique of adding a single $\texttt{)}$ and performing rotations.

proof

Ok, so if we want to make a balanced bracket sequence of length $2n$ and we have currently placed $a$ $\texttt{(}$ s and $b$ $\texttt{)}$ s, then there are $C_{n-b}^{a-b}$ possible balanced bracket sequences that have our current string as a prefix. Out of them, $C_{n-b}^{a+1-b}$ have $\texttt{(}$ as the next character. So we simply calculate the probability that we should put $\texttt{(}$ as $\frac{C_{n-b}^{a+1-b}}{C_{n-b}^{a-b}} = \frac{n-a}{2n-a-b} \cdot \frac{a-b+2}{a-b+1}$. Notice that $n-a$ and $n-b$ are the remaining numbers of $\texttt{(}$ s and $\texttt{)}$ s respectively. The $\frac{n-a}{2n-a-b}$ term nicely corresponds to the ratio of $\texttt{(}$ s in the remaining characters.
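The resulting character-by-character generator can be sketched as follows (a Python sketch, names mine). Note that the formula self-corrects at the boundaries: when $a=b$ the probability evaluates to $1$, and when $a=n$ we force a $\texttt{)}$.

```python
import random

def gen_balanced(n):
    """Uniform balanced bracket sequence of length 2n, one char at a time."""
    s, a, b = [], 0, 0            # a = '('s placed, b = ')'s placed
    for _ in range(2 * n):
        p_open = 0.0
        if a < n:
            # probability that the next character is '(' given the prefix
            p_open = (n - a) / (2 * n - a - b) * (a - b + 2) / (a - b + 1)
        if random.random() < p_open:
            s.append('(')
            a += 1
        else:
            s.append(')')
            b += 1
    return ''.join(s)
```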

As for the bijection from balanced bracket sequences to binary trees, recall that earlier we showed the recurrence $C_{n+1} = \sum\limits_{i=0}^n C_i C_{n-i}$ by bijecting into pairs $(X,Y)$ such that the string is of the form $\texttt{(}X\texttt{)}Y$. This bijection extends recursively to binary trees $[\text{folklore}]$: $\texttt{()}$ bijects to the binary tree with a single node, and for any other bracket sequence, $X$ describes the left child and $Y$ describes the right child.
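The recursive bijection can be written directly; representing a tree as a nested tuple is my choice here, not the blog's:

```python
def to_tree(s):
    """Map a balanced bracket sequence s = (X)Y to a binary tree.
    A tree is a pair (left, right); None is the empty tree."""
    if not s:
        return None
    cur = 0
    for i, c in enumerate(s):
        cur += 1 if c == '(' else -1
        if cur == 0:   # s[:i+1] = '(' + X + ')' is the first irreducible part
            return (to_tree(s[1:i]), to_tree(s[i + 1:]))
```

For example, $\texttt{(())}$ maps to a root with only a left child, while $\texttt{()()}$ maps to a root with only a right child.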

Ok, enough talking, here are the codes (I used testlib $[7]$). They were written by both nor sir and me.

generator
generator (C_n^k)

# A Surprisingly Fast Generation

Thanks to pajenegod for pointing out that this method might be faster than $O(n^2)$.

Let us return to the original recurrence of $\texttt{(}X\texttt{)}Y$. This gives us a natural DnC-esque recurrence if we are able to determine how to uniformly choose the correct sizes for $X$ and $Y$. Recall that $C_{n+1} = \sum\limits_{i=0}^n C_i C_{n-i}$ and that we have the approximation $C_n \approx \frac{4^n}{\sqrt{\pi n^3}}$ $[1]$.

For some bracket sequence of size $2(n+1)$, the probability we recurse into $X$ of size $s$ is $P_s = \frac{C_{s}C_{n-s}}{C_{n+1}}$. Of course, we are not able to store $C_n$ directly, but using the approximation above, it makes sense to store $C'_n = \frac{C_n}{4^n}$, which should be at a reasonable size to use doubles. We can calculate $C'_{n+1} = \frac{n+\frac{1}{2}}{n+2} \cdot C'_n$ in linear time. Nicely, $P_s = \frac{C'_{s}C'_{n-s}}{4 C'_{n+1}}$.

Here, we can already bound these types of recurrences by $O(n \log n)$ if we don't care about the distribution of $s$ that we recurse to. We can do this by considering $P_s$ in the order $P_0,P_n,P_1,P_{n-1},\ldots$. That is, we generate a random number $Z \in [0,1)$. If $Z < P_0$, we recurse with $s=0$; if $Z < P_0+P_n$, we recurse with $s=n$; etc.

With this method, our time complexity is actually $T(n)=O(\min(a,b))+T(a)+T(b)$. So, we have $T(n)=O(n \log n)$.
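A Python sketch of this divide-and-conquer generator, storing $C'_n=C_n/4^n$ in doubles and trying sizes in the interleaved order $0, n-1, 1, n-2, \ldots$ (all names are mine):

```python
import random

def catalan_scaled(N):
    """cp[n] = C_n / 4^n, via cp[n+1] = (n + 1/2) / (n + 2) * cp[n]."""
    cp = [1.0]
    for n in range(N):
        cp.append(cp[-1] * (n + 0.5) / (n + 2))
    return cp

def gen(n, cp, out):
    """Append a uniform balanced bracket sequence of length 2n to out."""
    if n == 0:
        return
    order = []                    # sizes in the order 0, n-1, 1, n-2, ...
    i, j = 0, n - 1
    while i <= j:
        order.append(i)
        if i != j:
            order.append(j)
        i, j = i + 1, j - 1
    z, acc, s = random.random(), 0.0, order[-1]
    for k in order:
        acc += cp[k] * cp[n - 1 - k] / (4 * cp[n])   # P_k = C_k C_{n-1-k} / C_n
        if z < acc:
            s = k
            break
    out.append('(')
    gen(s, cp, out)               # X has size s
    out.append(')')
    gen(n - 1 - s, cp, out)       # Y has size n-1-s
```

Usage: `cp = catalan_scaled(100); out = []; gen(50, cp, out); ''.join(out)`. The fallback `s = order[-1]` guards against floating-point rounding when the accumulated probabilities fall just short of $1$.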

But maybe we can do better. Note that the true complexity is $T(n+1) = \sum\limits_{s=0}^{n} P_s \cdot (T(s) + T(n-s) + O(\min(s,n-s)))$.

Let's look at the distribution of $P_s$ for $n=8$.

| $P_0$ | $P_1$ | $P_2$ | $P_3$ | $P_4$ | $P_5$ | $P_6$ | $P_7$ |
|---|---|---|---|---|---|---|---|
| $0.3$ | $0.092$ | $0.059$ | $0.049$ | $0.049$ | $0.059$ | $0.092$ | $0.3$ |
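This table can be reproduced from the $C'$ recurrence (a quick numerical check; here $P_s = C_s C_{7-s} / C_8$ for a sequence with $8$ pairs):

```python
cp = [1.0]
for n in range(8):
    cp.append(cp[-1] * (n + 0.5) / (n + 2))   # cp[n] = C_n / 4^n

# P_s = cp[s] * cp[7-s] / (4 * cp[8]) = C_s * C_{7-s} / C_8
P = [cp[s] * cp[7 - s] / (4 * cp[8]) for s in range(8)]
print([round(p, 3) for p in P])
# rounds to [0.3, 0.092, 0.059, 0.049, 0.049, 0.059, 0.092, 0.3]
```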

We see that the distribution is symmetric (which isn't that important) and that larger values appear on the extreme ends (which is important). This suggests that in most cases, our recursion will make one side much smaller than the other.

Let us try to find an approximation for expected value of $\min(s,n-s)$ when drawn over the probability distribution of $P_s$. Note that $P_s = \frac{C'_{s}C'_{n-s}}{4 C'_{n+1}} \approx \frac{\sqrt{n^3}}{4 \sqrt{\pi s^3(n-s)^3}}$.

When $s \leq \frac{1}{2} n$, we get $P_s \lessapprox \frac{\sqrt{8}}{4 \sqrt{\pi s^3}} = \frac{1}{\sqrt{2 \pi}} \cdot \frac{1}{\sqrt{s^3}}$.

\begin{align}E(\min(a,b)) &= 0 \cdot P_0 + 0 \cdot P_n + 1 \cdot P_1 + 1 \cdot P_{n-1} + \ldots \\&= 2 \cdot (\sum_{s=0}^{\frac{n}{2}} s \cdot P_s)\\ &\lessapprox \sqrt{\frac{2}{\pi}} \cdot (\sum_{s=0}^{\frac{n}{2}} \frac{s}{\sqrt{s^3}}) \\ &< \sqrt{\frac{2}{\pi}} \cdot (\int_{0}^{\frac{n}{2}} \frac{1}{\sqrt{s}} \, ds) \\ &= \sqrt{\frac{2}{\pi}} \cdot \sqrt{2n} \\ &= 2 \sqrt{\frac{n}{\pi}} \end{align}

So, I thought the recurrence was going to look like $T(n)=O(\sqrt n) + T(\sqrt n) + T (n - \sqrt n)$, which gives us $T(n) = O(n \log \log n)$. But unfortunately, this turned out to not be true.

After empirical testing and explicitly calculating the theoretical value of $T(n)$, I have observed that the actual complexity is $O(n \log n)$. I think it is quite interesting that despite our recurrence usually cutting at values of around $\sqrt n$, we end up having a total complexity of $O(n \log n)$ anyways, but probably with a significantly large logarithm base.

I'm using a Ryzen 7 3700X CPU with the compile command g++ xxx.cpp -O2.

code (theoretical value)
code (empirical value)
output (empirical testing)

# References

• +205

By errorgorn, 10 months ago,

On Apr/23/2022 17:05 (Moscow time), we will host Codeforces Global Round 20.

Note the unusual timing, it is 30 minutes earlier.

This is the second round of the 2022 series of Codeforces Global Rounds. The rounds are open and rated for everybody.

The prizes for this round:

• 30 best participants get a t-shirt.
• 20 t-shirts are randomly distributed among those with ranks between 31 and 500, inclusive.

The prizes for the 6-round series in 2022:

• In each round top-100 participants get points according to the table.
• The final result for each participant is equal to the sum of points he gets in the four rounds he placed the highest.
• The best 20 participants over all series get sweatshirts and place certificates.

Thanks to XTX, which in 2022 supported the global rounds initiative!

All problems were written and prepared by errorgorn, maomao90 and oolimry.

We would like to thank the following people who made this round possible:

You will have 3 hours to solve 9 problems, one of which is divided into two subtasks.

The scoring distribution is 250-500-750-1000-1500-(1250+1250)-2750-3000-4000. GLHF!

Also, please upvote the blog so that I can get more contribution than antontrygubO_o.

PS: We would like to take this opportunity to congratulate rotavirus or antontrygubX_y on getting married. Wishing you lots of love and happiness.

Edit: Editorial is released. Even if you have solved a problem, we encourage you to read the editorial as you might learn something new from it.

Announcement of Codeforces Global Round 20

• +861

By errorgorn, 10 months ago,

Disclaimer: This blog is entirely my own opinion, please do not get mad at the authors from round 779. If you did not enjoy that round, please do not blame the authors. Personally, I felt that the authors overall did a wonderful job (SPyofgame's div 2F was honestly one of my favourite problems in 2022 so far).

Last week round 779 was held, a common feedback that people seemed to be quite vocal about was that the round was "PermutationForces".

If we look at the actual contest, we do see that problems B, C, D, E all contain the word permutation inside, so it is natural to think that problems are all repetitive. But saying a round is "PermutationForces" is like saying a round is "ArrayForces". Like arrays, permutations are very basic objects that are one of the basic building blocks of competitive programming, claiming that a round has too many problems using permutations is pretty baseless to me.

According to the definition of permutations, a permutation is an array consisting of $n$ distinct integers from $1$ to $n$ in arbitrary order. For example, $[2,3,1,5,4]$ is a permutation, but $[1,2,2]$ is not a permutation ($2$ appears twice in the array) and $[1,3,4]$ is also not a permutation ($n=3$ but there is $4$ in the array).

Sometimes when encountering permutation problems, we only use the definitions of permutation at "face value".

• 1658B - Marin and Anti-coprime Permutation — a permutation of length $n$ consists of distinct integers from $1$ to $n$ in arbitrary order, so there are at most $\frac{n}{k}$ elements which are a multiple of $k$.
• IZhO 2015 Day 1 A — a permutation of length $n$ consists of distinct integers from $1$ to $n$ in arbitrary order, so a straightforward approach is just to do multiset comparison using hashing, since the multiset of any permutation of length $n$ is unique.

However, in most permutation problems, we do not have to explicitly care that these values are from $1$ to $n$; if we instead said that they are $n$ distinct real numbers, the problem would not change. In fact, in most problems, we do not have to do any arithmetic operations on the elements of the array. For example, given the permutation $[1,2]$, I can completely ignore the fact that $1+1=2$. I only need to know that $1=1$ and $1<2$. Using this, we can replace the array with something like $[e,\tau]$, and although it is false that $e+e=\tau$ (well, unless you are an engineer), it still holds true that $e < \tau$. In fact, in many problems where we only care about the relative position of elements, there is a trick known as "discretization" where we compress an array of large values into an array only containing values in $[1,n]$. I feel that for problems that clearly only care about the relative ordering of elements, problem setters should default to using values in $[1,n]$ except for special cases.
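The discretization trick mentioned above fits in a few lines (a sketch assuming distinct values; the helper name is mine):

```python
def discretize(a):
    """Replace each value by its rank in [1, n], preserving relative order
    (assumes all values are distinct)."""
    rank = {v: i + 1 for i, v in enumerate(sorted(a))}
    return [rank[v] for v in a]
```

For example, `discretize([2.718, 3.141, 1.414])` gives `[2, 3, 1]`: any comparison-based solution behaves identically on the compressed array.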

Some examples of problems to illustrate what I am talking about:

• 1658C - Shinju and the Lost Permutation — power is defined using the $\max$ function, which only cares about the relative ordering of elements. We could possibly have used some array $a$ of length $n$ where all elements are in $[1,10^9]$ instead of a permutation here to prevent the corner case of having only a single $1$ in the array $c$ from happening. If you think about this problem more, it is actually about min stacks (wow! data structure in div 2 C? real?)
• 1658E - Gojou and Matrix Game — a not so obvious example where we only care about the relative orderings of the elements. I suggested that we change the problem to $1 \leq V_{i,j} \leq 10^9$, but I suggested it too late and so we left $V$ as a permutation in the end. I had suggested that we make it in the range $[1,n^2]$ because of my philosophy regarding not requiring people to discretize and also allowing for $O(n^2)$ solutions, but I think setting larger limits would have been better here because the fact that we only care about relative orderings is completely non-trivial. Again, permutations play a very small role here and should be treated as "an array with distinct elements".
• 1641D - Two Arrays — notice that the condition for a good pair $(i,j)$ is that $a_{i,1},a_{i,2},\ldots,a_{i,m},a_{j,1},a_{j,2},\ldots,a_{j,m}$ are distinct. That is, we only need the inequality operator — we can discretize without caring about the relative orders of elements as long as our discretization function is a bijection. If you looked at the solution of anyone with $O(\frac{n^2\cdot m}{w})$ complexity, most likely they would have discretized all values in $a$.

Another aspect of permutations is studying cycles. Since the permutation is some arbitrary arrangement of integers from $1$ to $n$, a common way to view it is to think about the graph with edges of the form $(i,p_i)$. This graph very nicely has the property that each vertex has in-degree and out-degree $1$. Also, this gives us a very intuitive sense of what the inverse permutation is — we just flip the edges of the graph we are looking at. It is very common to think about such a graph when the problem asks for something like "sort this sequence, and your operation is swapping $2$ elements". The general approach for such problems is to draw the graph of the permutation, completely forget that the permutation ever existed, and proceed to solve the problem on the graph — the permutation is just a compact way to represent our graph.
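For instance, the cycle view, and the classic fact that sorting with arbitrary swaps takes $n$ minus the number of cycles swaps, can be sketched as follows (names are mine; the permutation holds $1$-indexed values):

```python
def cycles(p):
    """Cycles of the functional graph with edges (i, p_i),
    for p a permutation of 1..n given as a list."""
    n, seen, res = len(p), [False] * len(p), []
    for i in range(n):
        if not seen[i]:
            cyc, j = [], i
            while not seen[j]:
                seen[j] = True
                cyc.append(j + 1)     # report 1-indexed positions
                j = p[j] - 1
            res.append(cyc)
    return res

def min_swaps_to_sort(p):
    """Each cycle of length L needs L-1 swaps, so n - (#cycles) in total."""
    return len(p) - len(cycles(p))
```

Using the permutation $[2,3,1,5,4]$ from earlier, the graph splits into the cycles $(1,2,3)$ and $(4,5)$, so sorting it needs $5-2=3$ swaps.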

Finally, permutations may be used when we need to define how to "shuffle" a sequence. These problems usually have nothing related to permutations other than the fact that a permutation is a useful way to define "shuffling" (see 1658D2 - 388535 (Hard Version) and 1526D - Kill Anton).

I hope this blog can convince people that just because many problems in a contest reference the notion of permutations, that does not mean that the round is all about a single concept. Although it is valid criticism to say a round has too much of some idea, it simply does not work when you only say "permutation". Just look at how many different ways we can view permutations!

## Random Sidetrack

Many times in problems, when we use a well-known object like a permutation, a bitwise operator or a subsequence, we have to paste some definition into the statement. Of course, I feel it is kind of funny that the definition of a permutation had to appear in statements $3$ times in round 779. Furthermore, it is kind of weird to put the definition of a permutation on something like a div 1C. Do you really think the people who are attempting a div 1C need to be told what a permutation is?

I suggest that we instead have a glossary on CodeForces that defines all common terms, and then, like the guide to interactive problems, we can just paste that glossary link into all announcement blogs so beginners can read up on those "technical terms" that frequently appear. We don't need to paste lengthy definitions in statements; less work for everyone!

• +197

By errorgorn, 10 months ago,

2 years ago, antontrygubO_o wrote a blog about div2ABs where he expressed his opinion that d2ABs should not be about "here is a statement, please implement what is written there". Thanks to him, the quality of d2ABs (and problem quality in general) has certainly improved. However, I still believe that there are quite large differences between how coordinators/problemsetters view d2ABs and how the intended participants view them.

From the survey made by kpw29, we can see that most people agree that we should primarily consider the target audience when proposing a task. I too think that if a task is boring to div 1 contestants, we should not treat that as a reason to immediately disqualify it from being a d2A.

I think when people judge the interestingness of d2As, they try too hard to make them pleasing to div 1 contestants by having short implementations, short statements (no reading comprehension) and some cute ideas.

While this mindset works very well for combined rounds, where div 1 contestants are forced to solve d2ABs and want to quickly get ABCD over with, I think we should have a different mindset when considering div 2 problems.

We should not make setting interesting d2ABs our top priority, but instead prioritize setting friendly d2ABs. By friendly d2ABs, I mean that the solution is easy to justify (there is no difficult constructive proof of why some random condition is sufficient) and that the statement avoids heavy mathematical constructs; it would be best if a 5 year old could read the statement and understand the problem.

Elaborating on my point about d2ABs containing non-trivial ideas: sometimes these ideas take a form that is downright unapproachable to newer contestants, for whom it feels like a "guess and check" game where they cannot prove anything they submit and just hope that their guess is good.

I believe that this makes beginners think that competitive programming is just about having good guessing skills, and that such observations come from nothing other than divine sources. I recognize that having good intuition is a valid strategy and I do guess observations all the time when solving problems. But we should not conflate not having a proof with the complete inability to justify why something should be true (I am looking at you, 1672F1 - Array Shuffling).

We should still strive to make observations on d2ABs manageable enough that when one correctly guesses the observation, it would not be too hard for them to figure out why the code they submitted is correct if they wanted to. For some constructive d2ABs where the output is yes/no, it might make sense to ask participants to print their construction too. Although this would make coding more involved than it has to be, I believe this would be fine for most codeforces rounds.

Furthermore, when people set d2ABs, they sometimes use mathematical constructs that not all beginners would know of. It is not uncommon to see bitwise operations in the statement of a d2A, followed by an explanation of what said bitwise operator is. If people see the need to link to an explanation of what a bitwise operator is, then why set a d2A that requires understanding a bitwise operator just to comprehend the statement?

Imagine you are a beginner in competitive programming and you only know some basic language syntax. Would you really be interested in a random problem where someone calculates some stupid value using some random function? There are billions of d2ABs with $\text{gcd}$, $\text{mex}$ or some bitwise functions. But how many times do you see such functions outside of d2ABs? With $\text{gcd}$ or $\text{mex}$, I think it is very rare, because most problems involving them just use the "fun facts" of these functions.

Let's look at 1325A - EhAb AnD gCd as an example (which antontrygubO_o used as an example of a good d2A). The problem just asks if you know the fun fact that $\gcd(x,1)=1$ and $\text{lcm}(x,1)=x$. Despite belonging to the camp of formal statements myself, I believe that d2ABs should have as natural a statement as possible, in the sense that, as much as possible, random mathematical constructs should not be the basis of d2ABs. I don't want to solve d2ABs where I have to recall the fun fact that $\gcd(x,1)=1$.

At the end of the day, there are many "factors" that affect the suitability of d2ABs for Codeforces contests. I don't think we should sacrifice the enjoyability of d2ABs for their intended contestants for the sake of making them "interesting".

With the goal of setting d2ABs with the intended audience in mind, I decided to ask a few coordinators and users who are specialist and below to answer questions about their thoughts on d2ABs. Since I was already asking people their opinions on d2ABs, I decided to ask some additional questions that felt interesting to me and somewhat related to this topic.

Thank you to Monogon, Um_nik, dario2994, antontrygubO_o, isaf27, 74TrAkToR, darkkcyan, -is-this-fft-, sus, tdpencil, kIee, SPyofgame and ak2006 for spending their time answering these questions and giving me permission to post their answers here.

Note: These responses were from a month ago but I procrastinated writing this blog.

#### What is your philosophy on d2ABs?

1. In general I believe the easiest problem should be completely trivial for the intended audience of the contest. A gift, so to speak. I have heard this principle from several seasoned contest organizers. This has several reasons.
   a. Someone is much more likely to do another contest if they solved at least one problem, even if it was completely trivial to them.
   b. A psychological thing: it builds engagement and commits you to the contest. If you are in a contest and can't solve the first problem relatively quickly, then you will kinda get bored and drift away. Once you have solved the easiest problem, you are more committed to the contest and ready to spend more time on the rest.
   c. On Codeforces, it warps ratings if people don't submit to the easiest problem because they don't have ideas for it.
2. I'm not a fan of "troll" div2ABs, that is, problems that have some weird restriction but a very simple construction. All it does is teach beginners that cp is about guessing.
   a. Also, people are often misled into thinking that if a problem's solution is simple, it must also be easy to solve. Solution length, unless very excessive, is no argument as to the difficulty of a problem. I'm reminded of a certain local contest where the easiest problem turned out to be pretty hard for the participants, for exactly this reason.
3. Easiest problems should not be annoying or tedious, for the same reason as 1b.
4. Making the problem interesting should not come at the expense of any of the above.

Monogon: A div2AB problem should be possible for a programmer with minimal knowledge of algorithms to solve, but it should still present some kind of challenge. Maybe some simple observation or a slightly nontrivial implementation will be necessary, but it shouldn't feel like you're mindlessly doing exactly what the statement says if you're a beginner.

dario2994: Good very easy problems are not so different from good hard problems; they are just easier. They should be interesting for everyone, but this is often an impossible requirement to satisfy. If one cannot find an easy problem interesting for everyone, then the priority should be to make it interesting for the target audience, i.e., newcomers to competitive programming. A good very easy problem does not require any knowledge, is not "implement what is described in the statement", and is solvable in any language (Python, I am looking at you). I prefer to give very easy problems involving some computer science instead of something which is solved with a sequence of if-checks or in $O(1)$. I don't have any strong opinion about whether a very easy problem should be trivial to implement or not.

antontrygubO_o: They shouldn't be painful (in the sense that they shouldn't have long statements, be caseworky, or have long implementation), and shouldn't be "implement what you read". The natural statement is preferred. I prefer when there is some cute idea.

isaf27: My philosophy is:

• A perfect d2A should be a problem that will be solved by all participants of the contest. On the other hand, this problem should contain some idea, maybe a very, very simple and obvious one, but an idea. I don't like "just write what is asked in the statement" problems, even for d2A.
• d2b should be simple, but not obvious.

74TrAkToR: These are tasks for beginners. I think they should be about the idea, with minimal implementation.

darkkcyan: I think that a round on Codeforces should be fun, and maybe educational for newcomers.

The word "difficulty" is relative. A problem can be very hard for one person but trivial for another. When judging a problem myself, I often measure it by the time I spend thinking, as well as implementation time. For A, the idea should be found in a very short amount of time, if not instantly, at least at my level; for B it might be a bit longer, but not toooo long. But a coordinator is only one barrier; the next barrier is the testing phase, which should be more accurate if we invite a good spectrum of testers.

About the idea: first of all, I don't think div2A should be like AtCoder ABC A, where you can just do what is asked in the statement. Again, I want the round to be fun. The joy of solving a puzzle is when you find the right idea and think about how smart you were. So a broad answer is just: don't make it too trivial and standard, and the rest is determined by the difficulty. The same goes for B, but yes, we must make B somewhat harder. This is still an open question, but I think I should leave it there, because creating a problem is truly a hard task.

#### Why should we care about the quality of d2AB if most div 1 competitors forget them after 5 mins anyways?

antontrygubO_o: Question is weird: most Div1 users forget all problems after 5 minutes. But we should just aim for a better quality of problems.

Still, for me personally, boring/stupid D2A-D2C problems may ruin the round, as I then don't expect anything good in the harder problems. I understand that many Div1 users don't care, but many Div1 users would also be happy with standard problems in the harder spots, so what?

darkkcyan: I think this is an odd question to ask about the easiest problems of a div 2 round, made for div 2 competitors. Well, they should be fun for div 1 competitors too, and for me, they are warm-up problems. And for div 2 competitors, they should be fun, again. For newbies, there is educational value as well. In all cases, these problems are the introduction to the way of thinking and observing when doing contests, and I still think that they are doing a good job at that. I do see that nowadays d2ABs require more thinking than in previous years, but that is also true for div2C and above. Thus, because they are the very first problems of a round, acting as the gateway to the other problems, their quality should be cared for as well.

Because the bulk of the participants only solve d2ABs, and div 1 participants are not the target audience.

#### What things do you like/dislike about d2ABs on codeforces?

sus: I like the simplicity and typicality of div2 As and Bs. Even though they only cover simple topics like math, brute force, and simulation, they allow any beginner cper to practice their problem solving skills. One drawback of ABs is that they only cover a small number of topics, so they might start to get repetitive/boring if a problem setter is feeling lazy, and you can't learn much from them after you get good at solving them. People get carried away trying to get 800 problems solved and only solve ABs trying to reach this goal.

Whenever I need to get ready for a contest, I warm up by quickly solving an AB. ABs serve as a stepping stool for beginners and that's what I like about them.

ak2006: I like the fact that div 2 ABs don't require much DSA knowledge and can be solved by someone who hasn't done much CP. I dislike that they might use too much math (number theory, combinatorics), which only math majors/minors or math olympiad people may be acquainted with well enough to solve them.

kIee: I do not like ABs at all. For contestants who can't really tackle harder problems or spend much more time on them, AB is purely a speed contest. In addition, it's largely affected by internet speed, submission speed, etc., and their weightage is very high, especially when trying to open up problems (e.g., solving AB slower by 1-2 mins can lose to another person even though I solve C faster by 15 mins).

SPyofgame: Simple to read, simple to understand statement and testcases, interesting idea, not too focused on a specific type of problem, easy to solve but still requiring a bit of thinking instead of mindless doing.

#### Do you think d2ABs must necessarily be short implementation?

tdpencil: I believe d2A should. d2B should probably be more implementation based or should permit longer implementation. Newbies and beginners are going to try div 2 A, and if they don't know how to code it up fast or how to write such long code, then it's going to be challenging for them.

sus: I think I don't care. There are 1-liners, there are 30-liners and there are those in between. It's just another problem; it doesn't make a difference to me.

#### Some high rated users have difficulty differentiating d2As from d2Cs. What would be a good rule of thumb to determine the correct position of an easy problem?

dario2994: Gauging the difficulty of problems is hard. It is hard to distinguish d2As from d2Bs as it is hard to distinguish d2Ds from d2Es. The difficulty is on a spectrum and there is not a cheap trick to distinguish. Experience helps and I don't think that high rated users find it harder to distinguish d2As from d2Bs than low rated users. But it might be that high rated users are more aware of their inability to make the distinction compared to low rated users.

antontrygubO_o: It's hard for me to imagine that someone may actually have difficulty with this, to be honest. Look at D2A-Bs of recent contests, you will most likely agree that B is harder than A in most of them.

tl;dr testers exist for a reason

#### Do you prefer div 2 rounds or educational rounds more? Why?

tdpencil: I prefer educational rounds. This is because I can pretty much depend on the difficulty of educational rounds and expect to learn something. Most edus are made by the same people, so the quality of problems is about the same. For div2s however, a variety of people make the problems, and it can be argued that some are much easier than others, meaning not all div2s are at the same difficulty (on average). I would argue that having div2s be of varying difficulties is an issue, but that's just my opinion.

ak2006: I prefer regular rounds since they use much less math for some reason; edu rounds aren't very educational and aren't as DSA-based as they should be.

sus: I prefer div 2 rounds because educational rounds are the same as any other round for beginners: every round is an educational opportunity when you are new. The only reason I prefer div2s is because it is easier to gain rating in them. Sometimes I solve C in div2s, which is worth a lot more than AB, but in edu rounds C is worth the same.

# Ratings of d2ABs

I decided to ask people to rate some d2ABs from 1 to 10. These d2ABs were chosen to be a hopefully diverse sample of the current meta of d2ABs. But take the numerical ratings with a giant pinch of salt.

In the words of dario2994, "it seems to me as "rank from 1 to 10 these apples and these cars according to their color, their speed, their consistency, and your overall feeling". It would be a random number."

Anyways, here are the assigned scores in table form. I have bolded the scores which were assigned by the coordinator of the respective contest (you can see that I'm pretty biased).

| | 1266A | 1281A | 1339A | 1391A | 1581A | 1583B | 1598A | 1610A | 1634B | 1642A |
|---|---|---|---|---|---|---|---|---|---|---|
| -is-this-fft- | 10 | 8 | 9 | 3 | 5 | 3 | 8 | 2 | 5 | 6 |
| antontrygubO_o | 9 | 2 | 7 | 5 | 7 | 10 | 7 | 6 | 10 | 7 |
| Um_nik | 3 | 0 | 7 | 4 | 4 | 6 | 4 | 5 | 2 | 3 |
| Monogon | 5 | 7 | 10 | 6 | 3 | 5 | 9 | 7 | 4 | 6 |
| darkkcyan | 7 | 8 | 9 | 5 | 7 | 7.5 | 8 | 9 | 9 | 7 |
| isaf27 | 10 | 2 | 6 | 5 | 8 | 6 | 9 | 6 | 10 | 4 |
| errorgorn | 6 | 1 | 10 | 4 | 9 | 7 | 10 | 5 | 3 | 4 |
| kIee | 1 | 8 | 10 | 2 | 7 | 3 | 8 | 4 | 2 | 9 |
| sus | 3 | 3 | 8 | 4 | 2 | 7 | 6 | 4 | 1 | 9 |
| mean | 6 | 4.33 | 8.44 | 4.22 | 5.78 | 6.06 | 7.67 | 5.33 | 5.11 | 6.11 |
| stddev | 3.28 | 3.35 | 1.51 | 1.20 | 2.39 | 2.21 | 1.80 | 2 | 3.62 | 2.15 |

From this, we can see what is pretty universally agreed to be good d2ABs (the comments are my opinion):

We can also see what the "controversial" problems are:

Here are the full text comments for each problem if you want to read them.

1266A
1281A
1339A
1391A
1581A
1583B
1598A
1610A
1634B
1642A

# Poll

I think it would be interesting to see how users from various ratings rate each of the 10 d2ABs stated earlier in the blog. So I have created a google form where you can rate these d2ABs. I will release the results after a couple of days.

Maybe I will do some data analysis on the results. I tried to do some data analysis on the scores assigned by coordinators (after normalization) and got this:

This chart looks very dumb by all standards but hopefully we can see better trends after I get more data.

Anyways, I encourage members of the community to share their thoughts in the comments about the current direction that d2ABs are heading towards, especially for people who are specialist or below since d2ABs are targeted towards them.

Also, to future problem setters, maybe this will allow you to get more d2ABs accepted by seeing the preferences of your coordinator lol

# Results of Poll

You can access the results of the poll here. I have manually added the scores of those people who I personally asked to send scores and the names of people who have posted their scores in the comments.

I have done some basic data analytics and I believe that the results are quite interesting. For the data analytics part, I used mostly the same procedures as those described in my comment. The only difference was that I only normalized everyone's scores to have mean $0$, because it does not make sense that "1 5 9" would be normalized to the same thing as "8 9 10". Also, I removed entry 5 when I did the analytics because it had an extremely low mean score; when I checked it, it turned out that the person had given $7$ problems a score of $1$, so I just removed it. You can check my jupyter notebook file here.

Anyways, here is the plot on the average scores given to each problem.

We can see that 1281A actually has a bimodal distribution: either you love it or you hate it. From manually looking at the scores assigned to it, I don't think there is any clear correlation between rating and whether you would like this problem, suggesting that the idea that "do what the author tells you = bad problem" is not agreed upon even among most div 1 users.

Here is the plot from using PCA to squish $10$ dimensions ($1$ dimension for each score) down to $3$ dimensions so that we can visualise it better. The third dimension is represented by the size of the point (I hope that this will allow you to imagine the point as being nearer or further away). The points are colored by the rank (or rating) of the user.

One of the striking things is that there is a vague correlation between rating and position on this chart, going from the bottom right corner to the top right corner. I know that I do not have sufficient data from div 2 users, but I think we can kind of conclude that the opinions of most coordinators on d2ABs differ quite a bit from the cluster of div 2 users on the middle right. It seems that -is-this-fft-'s and Monogon's philosophies on d2ABs are better suited for making d2ABs that are targeted towards their intended audience.

Thanks to everyone who submitted to the poll!

• +330

By errorgorn, 12 months ago,

Recently, there was a local contest which I set a problem for (task 4 in this pdf).

The abridged problem statement is that you choose an array $A$ of length $N$ where the elements are $<2^X$. The grader will pick $[L,R]$ and return $A_L \oplus A_{L+1} \oplus \ldots \oplus A_R$, where $\oplus$ is the bitwise xor operation. You need to find any integer $z$ such that $L \leq z \leq R$.

I set this in contest with the constraints of $N=2^{19}$ and $X=192 \approx \frac{(\log N)^2}{2}$ where you also had to answer queries quickly.

Solution

During testing, oolimry and icypiggy decided it would be funny to write solutions with $X \approx c \cdot \log N$ (of course, they also had to ensure they could answer queries quickly, so these are more descriptions of speedup techniques than techniques for reducing the number of bits).

oolimry's solution
icypiggy's solution

Anyways, this raises some interesting questions about the minimal $X$ we need if we completely do not care about answering queries quickly. A trivial lower bound is $X \approx \log N$, since the grader could query $[pos,pos]$ for every value of $pos$, which forces all $A_{pos}$ to be distinct. An upper bound comes from randomly choosing bits: I would think it is reasonable to assume that the xor-sums of subarrays behave independently, so we should require somewhere around $X \approx 4 \cdot \log N$ bits.
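As a quick sanity check of the randomized upper bound, we can verify that with $X$ around $4 \log_2 N$ (plus some slack) random bits, all subarray xor-sums already come out distinct, so the grader's reply pins down $[L,R]$ exactly. This is only my own illustration (the function name and parameters are mine, not from the contest):

```python
import random

def all_subarray_xors_distinct(A):
    # Prefix xors: the xor of A[l..r] equals P[r + 1] ^ P[l].
    P = [0]
    for a in A:
        P.append(P[-1] ^ a)
    seen = set()
    for l in range(len(A)):
        for r in range(l, len(A)):
            x = P[r + 1] ^ P[l]
            if x in seen:
                return False  # two subarrays share a xor-sum
            seen.add(x)
    return True

random.seed(0)
N, X = 64, 40  # X = 4 * log2(N) + slack, so collisions are very unlikely
A = [random.getrandbits(X) for _ in range(N)]
```

With roughly $N^2/2$ subarrays, a birthday-style estimate gives about $N^4 / 2^{X+3}$ expected collisions, which is where the $X \approx 4 \log N$ guess comes from.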

Are there better lower and upper bounds on $X$ for this problem?

• +117

By errorgorn, 13 months ago,

I would like to thank:

• tfg for his helpful discussions, digging through papers and implementing SMAWK.

## Prerequisites

Let us first define the classical knapsack, unbounded knapsack and subset sum problems.

#### Knapsack

There are $N$ items. The $i$-th item has weight $w_i$ and value $v_i$. Find a set $S$ such that $\sum\limits_{i \in S} w_i \leq C$ and $\sum\limits_{i \in S} v_i$ is maximized.

#### Unbounded Knapsack

There are $N$ items. The $i$-th item has weight $w_i$ and value $v_i$. Find a multiset $S$ such that $\sum\limits_{i \in S} w_i \leq C$ and $\sum\limits_{i \in S} v_i$ is maximized.

#### Subset Sum

There are $N$ items. The $i$-th item has weight $w_i$. Find a set $S$ such that $\sum\limits_{i \in S} w_i = C$.

You should know how to do both versions of knapsack in $O(NC)$ and subset sum in $O(\frac{NC}{32})$ before reading this blog.

In this blog post, I will just show some results that speed up the above problems for special cases.

## Subset Sum Speedup 1

There are $N$ items. The $i$-th item has weight $w_i$. Let $\sum\limits_{i=1}^N w_i=C$. For each $k \in [1,C]$, can we find a set $S$ such that $\sum\limits_{i \in S} w_i = k$?

It turns out we can do this in $O(\frac{C \sqrt C}{32})$.

Let us first consider an algorithm that seems like it is $O(\frac{C \sqrt C \log C}{32})$. We will group items of equal weights together, so we will get $M$ tuples of the form $(w_i,occ_i)$ where there are $occ_i$ occurrences of $w_i$ in the original items. For each tuple, we will decompose it into powers of $2$. For example, for $(w_i,15)$, we will decompose it into $\{w_i,2w_i,4w_i,8w_i\}$, for $(w_i,12)$, we will decompose it into $\{w_i,2w_i,4w_i,5w_i\}$. If you think about these things as binary numbers, it is not too difficult to see that we will get the same answers when we use these new weights instead of the original weights.

Furthermore, it is well known that if you have some weights that sum to $C$, then there are only $O(\sqrt C)$ distinct weights, so we can already determine that $M = O(\sqrt{C})$. Since $occ_i \leq C$, each tuple contributes $O(\log C)$ elements, giving us a simple upper bound of $O(\frac{C \sqrt C \log C}{32})$.

However, this is actually bounded by $O(\frac{C \sqrt C}{32})$ $^{[1]}$. Consider the set of tuples that contribute a weight of the form $k \cdot w_i$; we will call this set $S_k$. We want to show that $\sum\limits_{k \geq 1} |S_k| = O( \sqrt C)$. Firstly, note that most of the new weights are $2^z$ multiples of the original weight: each tuple contributes at most $1$ new weight that is not a power-of-two multiple. Therefore, if we can show that $\sum\limits_{k=2^z} |S_k| = O(f(C))$, then $\sum\limits_{k \geq 1} |S_k| = O(f(C)+\sqrt C)$.

It is obvious that $\sum\limits_{i \in S_k} w_i \leq \frac{C}{k}$ and we can conclude that $|S_k| \leq \sqrt\frac{C}{k}$.

Therefore, $\sum\limits_{k=2^z} |S_k| \leq \sum\limits_{z \geq 1} \sqrt \frac{C}{2^z} = \sqrt C \left (\sum\limits_{z \geq 1} \frac{1}{\sqrt {2^z}} \right) = O(\sqrt C)$.

However, there is a simpler way to do this $^{[2]}$. Consider processing the weights from smallest to largest, with an initially empty list $W'$ as the list of our new weights. Suppose there are $occ$ occurrences of the smallest weight $w$. If $occ$ is odd, we add $\frac{occ-1}{2}$ occurrences of $2w$ to the original weights and $1$ occurrence of $w$ to $W'$. The case where $occ$ is even is similar: we add $\frac{occ-2}{2}$ occurrences of $2w$ to the original weights and $2$ occurrences of $w$ to $W'$.

It is not hard to see that $|W'| \leq 2 \sqrt C = O(\sqrt C)$, since we add at most $2$ occurrences of each distinct weight to $W'$.
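A sketch of this second reduction in Python (the function name is mine): the weights are reduced to $O(\sqrt C)$ values as described, after which a bitset (here a Python big integer) computes every achievable sum.

```python
import heapq
from collections import Counter

def reachable_sums(weights):
    # Reduce: process weights from smallest to largest; keep 1 or 2
    # copies of w (odd/even count) and merge the rest into copies of 2w.
    cnt = Counter(weights)
    heap = list(cnt)
    heapq.heapify(heap)
    reduced = []
    while heap:
        w = heapq.heappop(heap)
        occ = cnt.pop(w, 0)
        if occ == 0:
            continue
        keep = 1 if occ % 2 == 1 else 2
        reduced += [w] * keep
        pairs = (occ - keep) // 2
        if pairs:
            if 2 * w not in cnt:
                heapq.heappush(heap, 2 * w)
            cnt[2 * w] += pairs
    # Bitset subset-sum over the O(sqrt C) reduced weights.
    bits = 1
    for w in reduced:
        bits |= bits << w
    C = sum(weights)
    return {k for k in range(C + 1) if (bits >> k) & 1}
```

The reduction preserves the set of achievable sums, so the returned set matches a brute-force enumeration of all subsets.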

Problems:

## Subset Sum Speedup 2

There are $N$ items. The $i$-th item has weight $w_i \leq D$. Can we find a set $S$ such that $\sum\limits_{i \in S} w_i = C$?

We will solve this in $O(ND)$ $^{[3]}$. Firstly, if $\sum\limits w_i<C$, the answer is obviously no, so we will ignore that case.

Let us find the maximal $k$ such that $\sum\limits_{i=1}^k w_i < C$. The basic idea is that we initially have an answer of $w_1+w_2+\ldots+w_k$; then we can either subtract $w_i$ for $i \leq k$ or add $w_i$ for $i>k$, in some order such that the running sum stays in the range $[C-D,C+D]$, which is not too hard to show constructively. The proof sketch is just: if the current sum is at least $C$, remove something; otherwise, add something.
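The constructive argument can be illustrated directly (this is my own toy demonstration, not the dp itself): start from the maximal prefix below $C$, then greedily remove a prefix item when the running sum is at least $C$, and otherwise add (or, when no removal is possible, skip) a suffix item. Every intermediate sum stays inside $[C-D, C+D]$.

```python
def band_trace(w, C):
    """Assuming sum(w) >= C, order the +/- moves from the proof sketch
    and return every intermediate sum; all stay within [C - D, C + D]."""
    total, k = 0, 0
    while k < len(w) and total + w[k] < C:
        total += w[k]
        k += 1
    taken = list(range(k))            # prefix items, may only be removed
    pending = list(range(k, len(w)))  # suffix items, may be added or skipped
    trace = [total]
    while pending or taken:
        if total >= C:
            if taken:
                total -= w[taken.pop()]
            else:
                pending.pop()  # decide to skip this item; sum unchanged
                continue
        else:
            if pending:
                total += w[pending.pop()]
            else:
                break          # sum has settled below C
        trace.append(total)
    return trace
```

Additions start below $C$ and change the sum by at most $D$, and removals start at or above $C$, so no step can leave the band.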

Let us define $can(total,l,r)$ as a dp function that returns true iff there exist $\lambda_l,\lambda_{l+1}, \ldots, \lambda_r \in \{0,1\}$ such that $\sum\limits_{i=1}^{l-1}w_i + \sum\limits_{i=l}^{r} \lambda_i w_i = total$, where $total \in [C-D,C+D]$.

Notice that $can(total,l,r) = true$ implies that $can(total,l-1,r)=true$, this means that $can$ is monotone on the dimension $l$. Therefore, let us define a new dp function $dp(total,r)$ that stores the maximal $l$ such that $can(total,l,r)=true$.

Furthermore, $can$ is monotone on the dimension $r$, so $dp(total,r) \leq dp(total,r+1)$.

Let us consider the transitions.

From $dp(total,r)=l$, we can extend $r$, transitioning to $dp(total+w_{r+1},r+1)$ or $dp(total,r+1)$. We can also extend $l$ and transition to $dp(total-w_{l'},r)=l'$ for $l'<l$. However, this transition would be $O(N)$ per state, which is quite bad.

However, it would only make sense to transition to $dp(total-w_{l'},r)=l'$ for $dp(total,r-1) \leq l' < dp(total,r)=l$, otherwise this case would have been covered by another state and there would be no need for this transition. Since $dp(total,r) \leq dp(total,r+1)$, the total number of transitions by extending $l$ is actually bounded by $O(ND)$ using amortized analysis.

Therefore, our dp will have $O(ND)$ states with a total of $O(ND)$ transitions.

Actually there is a scammy way to do this. Similarly, we find the maximal $k$ such that $\sum\limits_{i=1}^k w_i < C$, then we want to solve the subset sum problem with the weights $\{-w_1,-w_2,\ldots,-w_k,w_{k+1},\ldots,w_N\}$ and sum $C- \sum\limits_{i=1}^k w_i$.

Let us randomly shuffle the list of weights and pretend $C- \sum\limits_{i=1}^k w_i$ is very small. Then this can be viewed as a random walk with $0$ sum. The length of each step is bounded by $D$, so we can expect $O(D \sqrt N)$ deviation from the origin (my analysis is very crude). Using bitsets, we can obtain a complexity of $O(\frac{ND \sqrt N}{32})$ which is probably good enough.

The paper also mentions a way to generalize this technique to knapsack where weights are bounded by $D$ and values are bounded by $V$ in $O(NDV)$ but I think it is not a significant speedup compared to the usual knapsack to be used in competitive programming.

Problems:

## Knapsack Speedup 1

Consider SG NOI 2018 Prelim knapsack.

The editorial solution is to only consider $O(S \log S)$ items, since only the $\frac{S}{w}$ most valuable items of any given weight $w$ can ever be used, giving a complexity of $O(S^2 \log S)$. But I will discuss a slower $O(NS)$ solution.

Let $dp(i,j)$ be the maximum value of items if we only consider the first $i$ types using a weight of $j$.

Then our transition would be $dp(i,j)=\max\limits_{0 \leq k \leq K_i}(dp(i-1,j-k\cdot W_i)+k\cdot V_i)$. If we split each $dp(i,j)$ into the residue classes of $j \% W_i$, it is easy to see how to speed this up using the sliding deque trick or using an RMQ.
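A sketch of the deque version in Python (names are mine; items are given in the general form $(W_i, V_i, K_i)$): on each residue class we keep a monotone deque of the transformed values $dp(i-1,j) - t \cdot V_i$ over the window of the last $K_i+1$ positions $t$.

```python
from collections import deque

def bounded_knapsack(items, S):
    """items: (W, V, K) = weight, value, copy limit per type.
    Returns dp with dp[j] = best value at total weight exactly j
    (-inf if unreachable), in O(len(items) * S) overall: each residue
    class modulo W is swept once with a max-deque of window size K + 1."""
    NEG = float("-inf")
    dp = [0] + [NEG] * S
    for W, V, K in items:
        for r in range(min(W, S + 1)):
            dq = deque()  # (t, dp_old[r + t*W] - t*V), values decreasing
            for t, j in enumerate(range(r, S + 1, W)):
                val = dp[j] - t * V  # push the OLD dp[j] before updating
                while dq and dq[-1][1] <= val:
                    dq.pop()
                dq.append((t, val))
                while dq[0][0] < t - K:  # drop entries using > K copies
                    dq.popleft()
                dp[j] = dq[0][1] + t * V
            del dq
    return dp
```

The deque front always holds $\max_{t-K \leq t' \leq t} (dp_{old}[r+t'W] - t'V)$, so adding back $t \cdot V$ yields the transition with at most $K$ copies.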

Now, let us talk about a more general speedup. But first, we will have to introduce a convolution that appears frequently in dp.

## (max,+) convolution

The problem statement is this: Given $2$ arrays $A$ and $B$ of length $N$, find an array $C$ of length $2N-1$ such that $C_i = \max\limits_{j+k=i}(A_j+B_k)$.

Doing this in $O(N^2)$ is easy. Can we do better? Doing this faster seems to be a really hard problem; so far the fastest known algorithms are a very scary $O(\frac{n^2}{\log n})$ and $O(\frac{n^2 (\log \log n)^3}{(\log n)^2})$, which do not seem practical for competitive programming.

Note that $(\max,+)$ and $(\min,+)$ convolution are not too different, we can just flip the sign. In the rest of this section, I will use $(\min,+)$ convolution to explain things.

First, we note that if both arrays $A$ and $B$ are convex, then we can do this convolution in $O(N)$ time $^{[5],[6]}$. The basic idea is that we can take the union of the slopes of the lines $\{A_i - A_{i-1} \} \cup \{B_i - B_{i-1} \}$ and sort them to get the slopes of $C$.

Another way to reason about this is using epigraphs (a fancy word for the region on and above the graph), which I find more intuitive: if we take the epigraphs of $A$ and $B$, we get $2$ convex polygons; taking their Minkowski sum gives us the epigraph of $C$, and the Minkowski sum of convex polygons is well known to be formed by taking the union of their edges, which is why this is also known as the Minkowski sum trick.
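Flipping signs, the same merge handles $(\max,+)$ convolution of two concave arrays: their difference sequences are decreasing, and merging them in decreasing order is exactly the Minkowski sum of edges. A minimal sketch (my code):

```python
def max_plus_concave(A, B):
    """(max,+) convolution of two concave arrays in linear time:
    merge their decreasing difference sequences in decreasing order
    (the Minkowski sum of the hypograph edges)."""
    C = [A[0] + B[0]]
    i = j = 0
    while i < len(A) - 1 or j < len(B) - 1:
        take_a = j == len(B) - 1 or (
            i < len(A) - 1 and A[i + 1] - A[i] >= B[j + 1] - B[j]
        )
        if take_a:
            C.append(C[-1] + A[i + 1] - A[i])
            i += 1
        else:
            C.append(C[-1] + B[j + 1] - B[j])
            j += 1
    return C
```

For example, `max_plus_concave([0, 5, 8, 9], [0, 4, 6])` merges the slopes `5, 3, 1` and `4, 2` into `5, 4, 3, 2, 1`.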

This speedup can be used in several dp problems such as ABC218H and JOISC18 candies. Furthermore, this can be abused in several dp problems where we can think of slope trick as $(min,+)$ convolution so we can store a priority queue of slopes such as in SG NOI 2021 Finals password. Another more direct application is CF1609G.

Anyways, we can actually solve $(\min,+)$ convolution quickly with a weaker constraint that only $B$ is convex.

First, let us discuss a $O(N \log N)$ method.

Define an $i$-chain as the line segment with points $\{(i+1,A_i+B_1),(i+2,A_i+B_2),(i+3,A_i+B_3),\ldots \}$. Then I claim that any $2$ chains cross at most once, which is a sufficient condition for using a Li Chao tree to store these lines $^{[7]}$.

Let us prove that this is actually true. Let the equation for the $i$-chain be $f_i(x)$, so $f_i(x)$ is a convex function. We want to show that for $g(x)=f_i(x)-f_j(x)$, there exists $k$ such that $g(x)<0$ for $x<k$ and $0 \leq g(x)$ for $k \leq x$ whenever $i<j$.

$g(x)=(A_i+B_{x-i})-(A_j+B_{x-j})=(B_{x-i}-B_{x-j})+(A_i-A_j)$.

Consider $g(x+1)-g(x)=(B_{x+1-i}-B_{x-i})-(B_{x+1-j}-B_{x-j})$. Recall that $B$ is convex (i.e. $B_{x+1}-B_x \geq B_{x}-B_{x-1}$). Since we have assumed that $i<j$, we can conclude that $(B_{x+1-i}-B_{x-i}) \geq (B_{x+1-j}-B_{x-j})$. Therefore, $g(x+1)-g(x) \geq 0$, i.e. $g$ is a (non-strict) increasing function.

Therefore, we can insert all chains into a Li Chao Tree and find the convolution in $O(N \log N)$.

Now, we will consider an $O(N)$ method. We will have to use the SMAWK algorithm $^{[8]}$, which I will not explain because the reference does a better job of it than I would. Here I will actually use $(\max,+)$ convolution to explain.

Anyways, the result we will be using from SMAWK is that given an $N \times M$ matrix $X$ that is totally monotone, we can find the minimum value in each row.

A matrix is totally monotone if each of its $2 \times 2$ sub-matrices is monotone, i.e. for $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$:

• If $c<d$, then $a<b$
• If $c = d$ then $a \leq b$

The way I think about this is to consider the function $\sigma(x)=\begin{cases} 1, & \text{if } x > 0 \\ 0, & \text{if } x = 0 \\ -1 , & \text{if } x < 0 \end{cases}$. We pick $2$ columns $1 \leq i < j \leq M$ and consider writing down $[\sigma(X_{1,i}-X_{1,j}), \sigma(X_{2,i}-X_{2,j}), \ldots, \sigma(X_{N,i}-X_{N,j})]$. Then this sequence should be non-decreasing when $i<j$.

Now given the arrays $A$ and $B$, we define a $2N-1 \times N$ matrix $X$ such that $X_{i,j}=A_{j}+B_{i-j}$. It is clear that the row minima of $X$ are the answers we want. It can be shown that if $B$ is convex, $X$ is totally monotone $^{[9]}$. I would just remark that the proof is basically the same as the proof written for the case of Li Chao trees.

Here is a template for SMAWK written by tfg for $(\max,+)$ convolution with a single concave array.

code

## Knapsack Speedup 2

Now, we can talk about which special case of knapsack we can speed up.

There are $N$ items. The $i$-th item has weight $w_i$ and value $v_i$. Find a set $S$ such that $\sum\limits_{i \in S} w_i \leq C$ and $\sum\limits_{i \in S} v_i$ is maximized.

Furthermore, the number of distinct weights is $D$. Then we have an $O(DC)$ solution $^{[9]}$. Usually $D=O(N)$, but there may be special conditions, such as the $w_i$ being small, that let us bound $D$.

We will start with $D=1$. This has an obvious greedy solution of putting the largest value first. Let's suppose the values are $v_1,v_2,\ldots$ with $v_1 \geq v_2 \geq \ldots$ and they all have weight $w$. To make the sum of items become $kw$, the answer is $\sum\limits_{i=1}^k v_i$.

Therefore, it is easy to extend this to $O(DC)$ by performing $(\max,+)$ convolution with $B=[0,v_1,v_1+v_2,\ldots]$ on each residue class modulo $w_i$. We will perform $w_i$ convolutions and each convolution will take $O(\frac{C}{w_i})$ time since $B$ is concave and we are doing $(\max,+)$ convolutions. So each distinct weight we only need $O(C)$ time to process it.

Consider there to be $5$ items, $w=\{2,2,3,3,3\}$ and $v=\{5,1,7,10,2\}$. Initially $A=[0,-\infty,-\infty,\ldots]$.

We process $w_i=2$, $B=[0,5,6]$. Then $A=[0,-\infty,5,-\infty,6,-\infty,\ldots]$.

We process $w_i=3$, $B=[0,10,17,19]$. Then $A=[0,-\infty,5,10,6,15,17,16,22,19,\ldots]$.

Please note that we cannot assume that the sequence $[A_i,A_{i+w_i},A_{i+2w_i},\ldots]$ is convex which is shown by the above example.
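The worked example above can be reproduced with a short sketch (my code; the inner convolution is deliberately naive for clarity, so this is not yet $O(DC)$: to reach that bound, replace the inner loop with SMAWK or a Li Chao tree, since only $B$ is concave):

```python
from collections import defaultdict

def knapsack_distinct_weights(weights, values, C):
    """A[c] = best value with total weight exactly c.  Groups items by
    weight; for each distinct weight w, convolves every residue class
    mod w with the concave prefix array B of the sorted values."""
    NEG = float("-inf")
    groups = defaultdict(list)
    for w, v in zip(weights, values):
        groups[w].append(v)
    A = [0] + [NEG] * C
    for w, vs in groups.items():
        vs.sort(reverse=True)
        B = [0]
        for v in vs:
            B.append(B[-1] + v)  # concave: increments are decreasing
        for r in range(min(w, C + 1)):
            idx = list(range(r, C + 1, w))
            old = [A[j] for j in idx]
            for t, j in enumerate(idx):
                # naive (max,+) step: take k copies of weight w, k <= len(vs)
                A[j] = max(old[t - k] + B[k]
                           for k in range(min(t, len(B) - 1) + 1))
    return A
```

Running it on the example $w=\{2,2,3,3,3\}$, $v=\{5,1,7,10,2\}$ with $C=9$ gives exactly the array $[0,-\infty,5,10,6,15,17,16,22,19]$ from the text.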

Problems:

## Knapsack Speedup 3

There are $N$ items. The $i$-th item has weight $w_i \leq D$ and value $v_i$. Find a multiset $S$ such that $\sum\limits_{i \in S} w_i \leq C$ and $\sum\limits_{i \in S} v_i$ is maximized.

We can solve this in $O(D^2 \log C)$ $^{[9]}$. Note that when we talk about convolution here, we are referring to $(\max,+)$ convolution in $O(D^2)$ complexity. (I believe Algorithm 2 in the reference has a slight error and I have emailed the paper's authors.)

Let us define $ans_i$ as the optimal answer with sum of weights being $i$. The main observation is that $ans_i=\max\limits_{j+k=i}(ans_j+ans_k)$, and we can actually impose the constraint $|j-k| \leq D$ without losing correctness. Suppose that $j-k > D$; then we can "move" an item from $ans_j$ to $ans_k$, and this process can be repeated until $|j-k| \leq D$. This can actually be generalized to $|j-k-\alpha| \leq D$ for an arbitrary $\alpha$ with the exact same constructive proof (of course, we will assume reasonable values of $\alpha$).

Therefore, we can convolute $[ans_{i-\frac{D}{2}}, ans_{i-\frac{D}{2}+1}, \ldots, ans_{i+\frac{D}{2}}]$ with $[ans_{j-\frac{D}{2}}, ans_{j-\frac{D}{2}+1}, \ldots, ans_{j+\frac{D}{2}}]$ to get the value of $ans_{i+j}$.

From this, we have several extensions. We can convolute $[ans_{T-D}, ans_{T-D+1}, \ldots, ans_{T}]$ with $[ans_0,ans_1,\ldots,ans_{2D}]$ to get the values for $[ans_{T+1}, ans_{T+2}, \ldots, ans_{T+D}]$ (there are probably many off by one errors so just pad them with $\pm 5$ or something). We can also convolute $[ans_{T-D}, ans_{T-D+1}, \ldots, ans_{T+D}]$ with itself to get the values for $[ans_{2T-D}, ans_{2T-D+1}, \ldots, ans_{2T}]$.

We can get the array $[ans_0,ans_1,\ldots,ans_{2D}]$ in $O(D^2)$ by using the normal knapsack dp.

This gives us a method of "doubling", which allows us to obtain a complexity of $O(D^2 \log C)$ to compute the array $[ans_{C-D},ans_{C-D+1},\ldots,ans_{C}]$, which is sufficient to find our answer.
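The core step above can be sketched in Python. This is a hedged illustration (the function names and the exact window bounds are my own, and the driver that actually doubles up to $C$ is omitted): `full_dp` is the plain unbounded knapsack, and `double_window` recovers $ans_{2T-D},\ldots,ans_{2T}$ from the window around $T$ alone, using the $|j-k| \leq D$ observation.

```python
NEG = float("-inf")

def full_dp(weights, values, C):
    # plain unbounded knapsack: ans[i] = best value with weights summing to exactly i
    ans = [NEG] * (C + 1)
    ans[0] = 0
    for i in range(1, C + 1):
        for w, v in zip(weights, values):
            if w <= i and ans[i - w] != NEG:
                ans[i] = max(ans[i], ans[i - w] + v)
    return ans

def double_window(ans, T, D):
    # recover ans[2T-D .. 2T] from the window ans[T-D .. T+D] alone:
    # some optimal split j + k = i has |j - k| <= D, so both halves lie in the window
    out = {}
    for i in range(2 * T - D, 2 * T + 1):
        best = NEG
        for j in range(max(T - D, i - (T + D)), min(T + D, i - (T - D)) + 1):
            if ans[j] != NEG and ans[i - j] != NEG:
                best = max(best, ans[j] + ans[i - j])
        out[i] = best
    return out
```

Comparing `double_window` against the full dp on a small instance confirms that the windowed convolution loses nothing.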

Problems:

# References

• +345

By errorgorn, 13 months ago,

As part of the graduation requirements for my school, I have to complete a simple research project, so I decided to do something related to data structures and algorithms. I believe I have come up with a data structure that maintains the basis of vectors in $(\mathbb{Z}/m\mathbb{Z})^d$, where $m$ may not be prime. Since this is related to competitive programming, I think it is a good idea to share it here. Hopefully, this algorithm is actually novel :P

I would like to thank:

• icypiggy for being my research mentor and tolerating my dumb questions

Please comment under the blog or message me on codeforces if any parts are unclear or wrong. Also, I hope that some LGMs can help solve the open problems in this blog.

# Introduction

Maintaining the basis of vectors in $(\mathbb{Z}/2 \mathbb{Z})^d$, also known as the xor basis algorithm, is a well-studied algorithm in competitive programming. The standard xor basis algorithm supports $2$ operations. Given a set $S$ that is initially empty, you can add a vector to $S$ or query whether a vector is representable as a linear combination of vectors in $S$. Using the word RAM model, the basis can be maintained in $O(nd)$ as the addition of vectors in $(\mathbb{Z}/2 \mathbb{Z})^d$ can be done in $O(1)$ time, where $n$ is the number of vectors we add. This algorithm can also be easily modified to handle $2$ additional queries: the lexicographically largest representable vector and the number of representable vectors. This data structure is used in various problems such as AGC45A, UER3C, 1299D - Around the World, 1336E1 - Chiori and Doll Picking (easy version), 1427E - Xum and 1466F - Euclid's nightmare.

An extended version of the xor basis algorithm allows for querying the basis set of vectors for a given subarray of an array of vectors. More formally, given an array $A$ where $A_i \in (\mathbb{Z}/2 \mathbb{Z})^d$, we can check if a vector is representable as a linear combination of vectors in $A_l,A_{l+1},\ldots,A_r$. It has appeared in 1100F - Ivan and Burgers and ABC223H which can be solved in $O(nd)$ offline or $O(nd \log n)$ online.

Another known extension of the xor basis algorithm allows one to maintain the basis of vectors in $(\mathbb{Z}/p\mathbb{Z})^d$ where $p$ is prime. It is used in XXII Opencup GP of Siberia F with $p=7$.

In this blog, we will propose an algorithm that can maintain the basis for a set of vectors in $(\mathbb{Z}/m\mathbb{Z})^d$.

GIANT DISCLAIMER: In this blog, the words "vector'' and "basis'' are used very loosely. The word vector is used to refer to an element of a module as $(\mathbb{Z}/m\mathbb{Z})^d$ is not necessarily a vector space. Furthermore, the notion of linear independence is not very well-defined when we are considering a ring. Consider the ring $\mathbb{Z}/4\mathbb{Z}$: the set $\{2\}$ is not linearly independent because $\lambda=2$ satisfies $\lambda \cdot 2 = 0$ but $\lambda \neq 0$. But the term basis is used because, after some restrictions on the set of scalars that are applied to the vectors we store, each set of scalars corresponds to a unique "vector'', a fact that will be proven in theorem 5.

More specifically, given an initially empty set $S$ of vectors in $(\mathbb{Z}/m\mathbb{Z})^d$, we can add $n$ vectors with $O(n \cdot d^2 + \Omega(m) \cdot d^3 + d \cdot \log(m))$ pre-processing, where $\Omega(m)$ is the prime omega function that counts the total number of prime factors of $m$, and check online in $O(d^2)$ time whether some vector in $(\mathbb{Z}/m\mathbb{Z})^d$ can be represented as a linear combination of vectors in $S$. Additionally, we can also query other things, such as the number of distinct vectors that can be represented as a sum of vectors in $S$ and the lexicographically largest vector that can be represented.

Later, we will extend this idea to handle elements of an arbitrary finite abelian group $G$.

# The Algorithm

The main idea of constructing the linear basis is an extension of the xor basis algorithm where we maintain a set of $d$ vectors whose span is equal to the span of the basis. Similar to the xor basis algorithm we impose an additional constraint over our vectors such that the $i$-th vector we store will have the first $i-1$ dimensions to be $0$.

Although this problem may seem like a trivial extension of the xor basis algorithm, it is not. Consider the ring $(\mathbb{Z}/6\mathbb{Z})^2$ and the vector $(3~1)$. In the xor basis algorithm, checking if a vector is representable is done by performing Gaussian elimination and greedily choosing scalars to multiply the stored vectors by. So we would expect our basis to store $\begin{pmatrix} 3 & 1 \\ 0 & 0 \end{pmatrix}$. However, if we were to check whether the vector $(0~2)$ is in the span, we would fail to produce it.

The way that the linear basis algorithm handles this is to allow a single vector to be propagated down to multiple rows. In the example mentioned above, we would realize that $(3~1) \cdot 2 = (0~2)$. So we would store $\begin{pmatrix} 3 & 1 \\ 0 & 2 \end{pmatrix}$ as our matrix. It should be noted that the rows of the matrix we store may not be linearly independent.

Another notable case to consider is $\mathbb{Z}/6\mathbb{Z}$ with the vectors $(2)$ and $(3)$: together they span the entire ring, since $\gcd(2,3)=1$.

It should be noted that in the case where $m$ is prime, the algorithm behaves exactly like the generalized version of the xor basis algorithm for prime moduli.

Define $V,W$ as $1 \times d$ vectors of elements in $\mathbb{Z}/m\mathbb{Z}$ and $A$ as a $d \times d$ matrix of elements in $\mathbb{Z}/m\mathbb{Z}$ where initially all elements are set to $0$.

With such a construction of the basis, we can easily check if a vector is representable using a simple greedy algorithm.

Note that in the above $2$ algorithms, we are doing all operations modulo $m$ with the exception of $\frac{a}{b}$ and $a \% b$. In the above $2$ pseudo-codes and our proofs below, $\frac{a}{b}$ is used to denote integer division, it is always guaranteed that $b|a$ when $\frac{a}{b}$ is written. $a \% b$ is also used to denote the remainder after integer division of $a$ by $b$. We will also use the notation $a<b$ for elements in $\mathbb{Z}/m\mathbb{Z}$, this should be understood as comparing the integer values of the elements.
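Since the pseudocode is hidden in the spoilers above, here is a hedged Python sketch of how I understand $\texttt{Add-Vector}$ and $\texttt{Inside-Basis}$. The class layout and the names `add`, `contains` and `count` are my own, and the exact recursion is my reconstruction from the lemmas below, so treat it as an illustration rather than the reference implementation.

```python
def ext_gcd(a, b):
    # returns (g, s, t) with s*a + t*b == g == gcd(a, b)
    if b == 0:
        return a, 1, 0
    g, s, t = ext_gcd(b, a % b)
    return g, t, s - (a // b) * t

class ModBasis:
    """Basis-like structure for vectors in (Z/mZ)^d."""

    def __init__(self, m, d):
        self.m, self.d = m, d
        self.A = [[0] * d for _ in range(d)]  # row i is zero in the first i coordinates

    def _comb(self, s, V, t, W):
        # the row s*V + t*W reduced modulo m
        return [(s * v + t * w) % self.m for v, w in zip(V, W)]

    def add(self, V):
        m, A = self.m, self.A
        V = [x % m for x in V]
        for i in range(self.d):
            if V[i] == 0:
                continue
            if A[i][i] == 0:
                # make the leading entry a divisor g of m, then push the
                # "torsion" multiple (m/g)*V down to later rows
                g, s, _ = ext_gcd(V[i], m)
                A[i] = self._comb(s, V, 0, V)
                self.add(self._comb(m // g, V, 0, V))
                return
            if V[i] % A[i][i] == 0:
                V = self._comb(1, V, -(V[i] // A[i][i]), A[i])
                continue
            # unimodular row operation: (V, A[i]) -> (R, R2) with the same span,
            # where R has leading entry g = gcd(V[i], A[i][i])
            g, s, t = ext_gcd(V[i], A[i][i])
            R = self._comb(s, V, t, A[i])
            R2 = self._comb(-(A[i][i] // g), V, V[i] // g, A[i])
            A[i] = R
            self.add(R2)
            self.add(self._comb(m // g, R, 0, R))  # analogue of line 22
            return

    def contains(self, V):
        # greedy check, mirroring Inside-Basis
        m, A = self.m, self.A
        V = [x % m for x in V]
        for i in range(self.d):
            if V[i] == 0:
                continue
            if A[i][i] == 0 or V[i] % A[i][i] != 0:
                return False
            V = self._comb(1, V, -(V[i] // A[i][i]), A[i])
        return all(x == 0 for x in V)

    def count(self):
        # number of representable vectors, as in theorem 5
        res = 1
        for i in range(self.d):
            if self.A[i][i] != 0:
                res *= self.m // self.A[i][i]
        return res
```

On the $(3~1)$ example over $\mathbb{Z}/6\mathbb{Z}$ this sketch stores exactly the matrix $\begin{pmatrix} 3 & 1 \\ 0 & 2 \end{pmatrix}$ from the discussion above.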

Let us first list out some properties of $A$ and $V$.

Lemma 1: Consider the vector $V$ after the $i$-th iteration of the loop at line 5 of Algorithm 1, then for all $1 \leq j \leq i$, $V_j=0$.

proof

Lemma 2: $A_{i,j}=0$ if $i>j$

Lemma 3: If $A_{i,i}=0$, then $A_{i,j}=0$ for all $j$.

Lemma 4: If $A_{i,i} \neq 0$, then $\gcd(A_{i,i},m)=A_{i,i}$.

Let $S$ be a set of vectors in $(\mathbb{Z}/m\mathbb{Z})^d$. Then $span(S)$ is the set of vectors that can be formed by a linear combination of vectors in $S$. More formally, let $S=\{s_1,s_2, \ldots, s_w\}$. Then $V \in span(S)$ iff there exists $\lambda_1, \lambda_2, \ldots,\lambda_w \in (\mathbb{Z}/m\mathbb{Z})$ such that $V=\sum\limits_{k=1}^w \lambda_k s_k$.

Theorem 1: Suppose there exists $\mu_1, \mu_2, \ldots,\mu_w \in (\mathbb{Z}/m\mathbb{Z})$ such that $V=\sum\limits_{k=1}^w \mu_ks_k$; then there exists $\lambda_1, \lambda_2, \ldots, \lambda_d \in (\mathbb{Z}/m\mathbb{Z})$ such that $V=\sum\limits_{k=1}^d \lambda_k A_k$, i.e. the rows of $A$ span $S$.

proof

Corollary 1: Suppose for some vector $W$, $V=W$ at the $i$-th iteration of the for loop. Then there exists $\lambda_i,\lambda_{i+1}, \ldots,\lambda_d \in (\mathbb{Z}/m\mathbb{Z})$ such that $W=\sum\limits_{k=i}^d \lambda_k A_k$ after $\texttt{Add-Vector}$ has returned.

Theorem 2: After each call to $\texttt{Add-Vector}$, the following holds: for all $i$ such that $A_{i,i} \neq 0$, there exists $\mu_{i+1}',\mu_{i+2}',\ldots,\mu_d' \in (\mathbb{Z}/m\mathbb{Z})$ such that $\frac{m}{A_{i,i}} A_i = \sum\limits_{k=i+1}^d \mu_k' A_k$.

proof

Theorem 3: Suppose for all $V \in span(S)$ there exist $\mu_1, \mu_2, \ldots,\mu_d \in (\mathbb{Z}/m\mathbb{Z})$ with $V=\sum\limits_{k=1}^d \mu_k A_k$; then for all $V' \in span(S)$ there exist $\lambda_1, \lambda_2, \ldots, \lambda_d \in (\mathbb{Z}/m\mathbb{Z})$ that satisfy $V'=\sum\limits_{k=1}^d \lambda_k A_k$ with $\lambda_i=0$ if $A_{i,i}=0$ and $0 \leq \lambda_i < \frac{m}{A_{i,i}}$ otherwise.

proof

Theorem 4: $\texttt{Inside-Basis}(V)$ returns true iff $V \in span(S)$.

proof

It is also straightforward to adapt the algorithm $\texttt{Inside-Basis}$ to find the lexicographically largest vector. We will now show that the matrix $A$ also maintains the number of representable vectors.

Define a function $c(z)=\begin{cases} \frac{m}{z}, & \text{if } z \neq 0 \\ 1, & \text{otherwise} \end{cases}$

Theorem 5: The number of representable vectors is $\prod\limits_{k=1}^d c(A_{k,k})$.

proof

# Complexity

Suppose there are $n$ vectors being added into the basis. We will consider the number of times $\texttt{Add-Vector}$ is called. In lines 15 to 23, $\texttt{Add-Vector}$ is called twice; let us show that the number of times lines 15 to 23 are executed is bounded by $O(\Omega(m))$.

Let $A_i$ and $A_i'$ be the values of $A_i$ before and after the execution of lines 15 to 23. $A_{i,i}'=\gcd(V_i,A_{i,i})$, so it is easy to see that $A_{i,i}'|A_{i,i}$. Moreover, $A_{i,i}' \neq A_{i,i}$, since equality would require $V_i = kA_{i,i}$ for some $k$, in which case lines 12 to 14 would have been executed instead. Therefore, $A_{i,i}'|A_{i,i}$ and $A_{i,i}' \neq A_{i,i}$, implying that $\Omega(A_{i,i}')<\Omega(A_{i,i})$. Since $\Omega(A_{i,i})$ can only decrease at most $O(\Omega(m))$ times, the number of times that lines 15 to 23 are executed is bounded by $O(\Omega(m))$.

Therefore, there will only be $O(n+\Omega(m) \cdot d)$ calls to $\texttt{Add-Vector}$. It is easy to see that the time complexity excluding the calls to $\texttt{Extended-Euclidean}$ is $O(n \cdot d^2+\Omega(m) \cdot d^3)$.

It is easy to see that the total calls to $\texttt{Extended-Euclidean}$ will take $O(d \log (m))$, so the total complexity is $O(n \cdot d^2 + \Omega(m) \cdot d^3 + d \cdot \log(m))$.

The complexity of $\texttt{Inside-Basis}$ is trivially $O(d^2)$.

# Extension to Handling Elements of Arbitrary Finite Abelian Group

First, let us consider maintaining the basis for vectors in $G=\mathbb{Z}/m_1\mathbb{Z} \times \mathbb{Z}/m_2\mathbb{Z} \times \ldots \times \mathbb{Z}/m_d\mathbb{Z}$.

Let $L=lcm(m_1,m_2,\ldots,m_d)$. It is clear that the above group embeds into $(\mathbb{Z}/L\mathbb{Z})^d$, so we can maintain the basis of $G$ using this algorithm.

Upon analysis of the time complexity, it is easy to see that the time complexity is actually $O((n + \sum\limits_{k=1}^d \Omega(m_k)) \cdot d^2 + \sum\limits_{k=1}^d \log(m_k))$ assuming addition and multiplication under $\mathbb{Z}/L\mathbb{Z}$ can be done in $O(1)$.

By the fundamental theorem of finite abelian groups, every finite abelian group $G$ is an internal group direct product of cyclic groups whose orders are prime powers. So, $G=\mathbb{Z}/p_1^{k_1}\mathbb{Z} \times \mathbb{Z}/p_2^{k_2}\mathbb{Z} \times \ldots \times \mathbb{Z}/p_d^{k_d}\mathbb{Z}$ which can be handled easily.

# Generalization to Euclidean Domains

Big thanks to adamant for discussing this with me.

It turns out that we can generalize this algorithm to work for any euclidean domain given that we are able to:

• Do division and the extended euclidean algorithm quickly
• Determine the size of the ideal $(r)$ generated by some element $r \in R$

We obviously have to do division quickly or else we will have a horrible time with things like $\mathbb{Z}[X]$.

As adamant suspected, this algorithm also works on $\mathbb{Z}[i]/p\mathbb{Z}[i]$ since, similar to the case of integers, the extended euclidean algorithm runs in $O(\log \max(N(a),N(b)))$, where $N(a+bi)=a^2+b^2$; this does not hold for a general euclidean domain.

Now, we just have to show that we can compute the size of the ideal generated by $a+bi$ quickly. In the normal linear basis algorithm, it is intuitive to see that the size of the ideal generated by $a$ is $\frac{m}{\gcd(a,m)}$. It seems that this intuition actually carries over to $\mathbb{Z}[i]$ i.e. the size of the ideal is $\frac{N(p)}{N(\gcd(p,a+bi))}$.

I don't really have a good proof of this but I think I have some intuition on why this should work. A proof sketch would probably be looking at these things as groups and considering the size of the coset.

The size of the ideal generated by $a+bi$ is intuitively the same as that of the ideal generated by $\gcd(p,a+bi)$: we have $\gcd(p,a+bi) \cdot k_1 = a+bi$ for some $k_1$, and also $(a+bi) \cdot k_2 \equiv \gcd(p,a+bi) \pmod p$ because the extended euclidean algorithm finds some $(s,t)$ such that $s \cdot (a+bi)+ t \cdot p=\gcd(p,a+bi)$.

Now, I will just ask you to appeal to geometric intuition for this part. Suppose we tile the plane with $p\mathbb{Z}[i]$. Let's use $p=5+5i$ here.

Now, I claim that if $\gcd(g,p)=g$, then each "tile" will look the same.

Here are some examples:

$g=1+2i$

$g=3+i$

An example of a $g$ that is not a divisor of $p$ is $g=2+3i$.

Here is what it looks like.

Anyways, we can see that each blue tile takes up $N(g)$ squares. And since the blue tiles are a repeated pattern in the red tiles, we can only look at a single tile. Therefore, we can see that the size of the ideal is $\frac{N(p)}{N(g)}$, which was what we wanted.

It is probably better if you just try these things yourself, so here are the Desmos graphs that I used to get the intuition.

# Open Problems

1. (SOLVED) In line 22 of Algorithm 1, $\texttt{Add-Vector}(\frac{m}{A_{i,i}}A_i)$ is called. This fact is used only in case 3 of theorem 2. Is the algorithm correct without this line? We have not found a set of vectors that causes the resulting matrix $A$ to be different when that line is removed.
2. Is there a fast (offline) algorithm for linear basis that allows for range query similar to 1100F - Ivan and Burgers and ABC223H when the vectors are in $(\mathbb{Z}/m\mathbb{Z})^d$?
3. What is the most general ring on which this algorithm works?

# Implementation

Of course, this is a competitive programming site, so I will also provide a sample implementation in C++. I have also created an example problem in a mashup if you would like to submit (although testcases may not be super strong, if you want to provide stronger testcases please message me and I don't mind adding you on polygon).

Code

• +381

By errorgorn, 14 months ago,

Hello again Codeforces 👋,

We are excited to invite you to TOKI Regular Open Contest #24!

Key details:

Many thanks to:

• prabowo 👨‍❤️‍💋‍👨 for 🕒 coordinating the 👦🌷 contest and translating our 💰🅱 stupid 😕💩 jokes into ➡ Indonesian 🇮🇩.
• icypiggy for helping 💻 us 🅱👍 to prepare testdata.
• Our 💩🏽 army of 😫👩 testers dantoh, tqbfjotld, -rs-, jamessngg, iLoveIOI, pavement for testing the problems and making us delete a problem 😡🤬 giving useful feedback 🥰😇.
• hocky for drawing beautiful diagrams 👌😍 for our problems.
• fushar for the wonderful 🌈 TLX platform and fixing TLX after 🔜💯 we 😀👦 broke it multiple times.

Please 😩 register for 👍 the 🚪🅱 contest, and we hope you will 😘 enjoy 😋💋 the contest! Good 👌 luck 😄 have 😏😝 fun!

P.S. Please note ✏️📔 the unusual 🤪 duration 🕐⏱️🕝 I recommend 👍 not reading 🙈📖 all the problems 💀💀 to lose rating❗❗

UPD: contest is over! (no more emoji spam sadly)

Congratulations to our top 5:

Congratulations to our first solvers:

Rating has been updated.

You can upsolve the problems here.

Editorial is available in the upsolve link.

Thank you for participating and see you in the next contest!

• +227

By errorgorn, 14 months ago,

Here is a simple problem. Count the number of arrays $A$ of non-negative integers of size $N$ such that $\sum\limits_{i=1}^{N} A_i \cdot i=K$, where $N,K \leq 5000$. Find the answer modulo $1000000007$.
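Since $A_i$ just counts how many parts equal $i$, this is the number of partitions of $K$ into parts of size at most $N$; a minimal sketch of that dp (function name is my own):

```python
MOD = 10**9 + 7

def count_arrays(N, K):
    # A_i counts the parts equal to i, so the answer is the number of
    # partitions of K into parts of size at most N, via the standard dp
    dp = [0] * (K + 1)
    dp[0] = 1
    for part in range(1, N + 1):
        for s in range(part, K + 1):
            dp[s] = (dp[s] + dp[s - part]) % MOD
    return dp[K]
```

For $N=K$ this is just the partition number of $K$ modulo $10^9+7$.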

I unfortunately cannot share the link to a submissions page as it is on a private local judge.

Update: I have asked the devs of the local judge; the compile command is g++ -O2 -o {compiledName} {sourceName} -m64 -static -std=gnu++14 -lm -s -w -Wall -Wshadow -fmax-errors=512

But anyways, here is the problem.

AC code
WA code

For the testcase $N=K=2500$ (where the answer is the partition number of $2500$), the answer should be $729287017$, but the WA code gives $735199643$ (testing on AtCoder custom invocation with C++ (GCC 9.2.1)). This makes zero sense because the only difference between these two codes is just the commented-out assert.

After further investigating, I managed to figure out that the problem had something to do with #pragma GCC optimize("O3").

Below is another set of AC and WA codes where the only difference is #pragma GCC optimize("O3").

AC code
WA code

I was super confused when I found out about this so I told kostia244. He told me that perhaps it is overflow. And indeed, after changing int memo[5005] to long long memo[5005], it is AC.

AC code with long long

So now we are both scratching our heads at this weird behaviour. Does anyone have an idea why this happens? Should we include macros in our codes by default now?

Sorry for tagging but ToxicPie9 nor

• +122

By errorgorn, 16 months ago,

I would like to preface this blog by saying this is not a tutorial on FWHT but some (maybe obvious) observations on FWHT that I wanted to share.

Thanks to rama_pang and oolimry for proofreading. Love you guys <3

Of course, I (and the proofreaders) am not perfect, so if you find anything unclear, just tell me in the comments.

We will use the symbols $\&,|,\oplus$ to represent bitwise and, bitwise or, bitwise xor respectively.

## FWHT

Given $2$ arrays $A$ and $B$, we want to find an array $C$ such that $C_i=\sum\limits_{j \star k=i} A_j \cdot B_k$. For our normal FFT, the operation $\star$ is just addition.

While looking for a fast FFT, I found library code that does FWHT $^{[1]}$, i.e. it performs the above operation where $\star$ can also be any of the $3$ bitwise operations.

We focus on the case for $\oplus$, and even more specifically on the very simple case of $A$ and $B$ being of size $2$. We want to convolute $\{p_0,p_1\}$ and $\{q_0,q_1\}$ quickly. The way we do this is to find some magic matrix $T$ such that we can just perform the pointwise product to "convolute" these transformed vectors, then apply $T^{-1}$ to get the convoluted vector. We want to convolute these arrays into $\{p_0q_0 + p_1q_1 , p_0q_1 + p_1q_0\}$. So we want to find some invertible matrix $T$ that satisfies $T \Big( \matrix{p_0 \\ p_1} \Big) \cdot T \Big( \matrix{q_0 \\ q_1} \Big)=T \Big( \matrix{p_0q_0+p_1q_1 \\ p_0q_1+p_1q_0} \Big)$.

Let $T=\Big( \matrix{a & b \\ c& d} \Big)$. Expanding these, we obtain $\Big( \matrix{ a^2p_0q_0+abp_0q_1+abp_1q_0+b^2p_1q_1 \\ c^2p_0q_0+cdp_0q_1+cdp_1q_0+d^2p_1q_1} \Big)=\Big( \matrix{ ap_0q_0+bp_0q_1+bp_1q_0+ap_1q_1 \\ cp_0q_0+dp_0q_1+dp_1q_0+cp_1q_1} \Big)$.

We obtain $a^2=a,ab=b,b^2=a$ and a similar set of equations for $c,d$. We find that $(a,b) \in \{(0,0),(1,1),(1,-1)\}$. But the only way to make an invertible matrix is $T=\Big( \matrix{1 & 1 \\ 1& -1} \Big)$. We also find that $T^{-1}=\frac 12 \Big( \matrix{1 & 1 \\ 1& -1} \Big)$.

We can do a similar thing for $\&$ and $|$ and find that $T=\Big( \matrix{0 & 1 \\ 1 & 1} \Big)$ and $T=\Big( \matrix{1 & 0 \\ 1 & 1} \Big)$ works.

Then we just apply the matrix $T \otimes T \otimes \ldots \otimes T$ to our vector to get its point-coordinate form $^{[4]}$, where $\otimes$ is the Kronecker product $^{[5],[6]}$. Since we are applying the same operation $T$ to each "layer" of the vector, it makes sense why the Kronecker product is used.

I initially just whacked the math and didn't really pay much attention to it. So here is my attempt at going deeper I suppose.

## Kronecker Product

It turns out that $(A \otimes B)(C \otimes D)=AC \otimes BD$ $^{[5]}$. This gives us a fast way to compute $T \otimes T \otimes \ldots \otimes T$.

We assume that $A,B,C$ are $n \times n$ square matrices and $I_n$ is the $n \times n$ identity matrix.

Claim: $A \otimes B \otimes C = (A \otimes I_n \otimes I_n)(I_n \otimes B \otimes I_n)(I_n \otimes I_n \otimes C)=(I_n \otimes I_n \otimes C)(I_n \otimes B \otimes I_n)(A \otimes I_n \otimes I_n)$.

Proof: Trivial

Corollary: we can switch the order we multiply the matrices $(A \otimes I_n \otimes I_n),(I_n \otimes B \otimes I_n),(I_n \otimes I_n \otimes C)$, (i.e. they are all pairwise commutative).

Corollary: $((A \otimes I_n \otimes I_n)(I_n \otimes B \otimes I_n)(I_n \otimes I_n \otimes C))^{-1}=(A \otimes I_n \otimes I_n)^{-1}(I_n \otimes B \otimes I_n)^{-1}(I_n \otimes I_n \otimes C)^{-1}$.

Notice that these results work for an arbitrary finite-length product $T \otimes T \otimes \ldots \otimes T$, but I will not write that down or it will become notation hell. Also, this works if $A,B,C$ are not of the same size (but are still square matrices).

This gives us a fast way to multiply a vector by $\underbrace{T \otimes T \otimes \ldots \otimes T}_{k \text{ times}}$, as each matrix $(I_n \otimes \ldots \otimes I_n \otimes T \otimes I_n \otimes \ldots \otimes I_n)$ has at most $n^{k+1}$ non-zero elements. So multiplying a vector by such a matrix only takes $O(n^{k+1})$ time, and we are multiplying by $O(k)$ matrices. This is exactly the speedup given by FWHT.
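As an illustration, here is a minimal $\oplus$-convolution in Python that applies $T=\big(\begin{smallmatrix}1&1\\1&-1\end{smallmatrix}\big)$ layer by layer, exactly the layered multiplication described above (function names are my own):

```python
def fwht_xor(a, invert=False):
    # apply T = [[1, 1], [1, -1]] to every bit-layer;
    # the inverse transform is (1/2) T per layer, i.e. a global division by n
    a = a[:]
    n = len(a)  # must be a power of 2
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                x, y = a[j], a[j + h]
                a[j], a[j + h] = x + y, x - y
        h *= 2
    if invert:
        a = [x // n for x in a]  # exact for integer convolutions
    return a

def xor_convolution(a, b):
    # transform both arrays, multiply pointwise, transform back
    fa, fb = fwht_xor(a), fwht_xor(b)
    return fwht_xor([x * y for x, y in zip(fa, fb)], invert=True)
```

Each `while`-iteration applies one $(I \otimes \ldots \otimes T \otimes \ldots \otimes I)$ factor, giving the $O(N \log N)$ total.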

## Generalization of Xor

What I did not realize was that $\oplus$ was literally addition under modulo $2$. That is, we were literally doing a "small" FFT in each layer of our FWHT.

Recall that in FFT, when we convert a polynomial to the point-coordinate form, we are multiplying the coefficients of the polynomial by the matrix $T=\begin{pmatrix} \omega_n^0 & \omega_n^0 & \omega_n^0 & \ldots & \omega_n^{0} \\ \omega_n^0 & \omega_n^1 & \omega_n^2 & \ldots & \omega_n^{n-1} \\ \vdots & \vdots & \vdots & \ddots & \\ \omega_n^0 & \omega_n^{n-1} & \omega_n^{2n-2} & & \omega_n^{n^2-2n+1} \end{pmatrix}$, where $\omega_a^b=e^{i\frac{b\tau}{a}}$.

Well, when we are doing convolution with $\oplus$, our matrix is $T=\Big( \matrix{1 & 1 \\ 1& -1} \Big)=\Big( \matrix{\omega_2^0 & \omega_2^0 \\ \omega_2^0 & \omega_2^1} \Big)$. That means we can do our "FWHT" with other things such as addition modulo $3$ with the matrix $T=\begin{pmatrix} \omega_3^0 & \omega_3^0 & \omega_3^0 \\ \omega_3^0 & \omega_3^1 & \omega_3^2 \\ \omega_3^0 & \omega_3^2 & \omega_3^4\end{pmatrix}$. This idea is used in GP of Tokyo 2021 G with addition modulo $7$ used. Note that we naively apply the matrix in $O(b^2)$ if we are dealing with $b$-bit numbers.
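To make this concrete, here is a hedged sketch of such a "base-$3$ xor" convolution using complex roots of unity, with rounding at the end to recover integers (names and structure are my own):

```python
import cmath

def digit_dft(a, base, invert=False):
    # apply the base-by-base DFT matrix to every base-ary digit layer;
    # len(a) must be a power of `base`
    n = len(a)
    a = [complex(x) for x in a]
    w = cmath.exp((-2j if invert else 2j) * cmath.pi / base)
    block = 1
    while block < n:
        step = block * base
        for start in range(0, n, step):
            for off in range(start, start + block):
                vals = [a[off + d * block] for d in range(base)]
                for d in range(base):
                    a[off + d * block] = sum(vals[e] * w ** (d * e) for e in range(base))
        block = step
    if invert:
        a = [x / n for x in a]  # inverse matrix is conj(T)/base per layer
    return a

def base3_convolution(a, b):
    # convolution under digit-wise addition modulo 3, for integer inputs
    fa, fb = digit_dft(a, 3), digit_dft(b, 3)
    c = digit_dft([x * y for x, y in zip(fa, fb)], 3, invert=True)
    return [round(x.real) for x in c]
```

The same code with `base=7` is the transform used in the GP of Tokyo problem mentioned above.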

## Generalization of And/Or

A way to think about $\&$ and $|$ when we are dealing with $0 \leq a,b < 2$ (i.e. single-bit numbers) is $a \& b=\min(a,b)$ and $a | b = \max (a,b)$. Since these are symmetric we can focus on the case of $|$.

Let's try to generalize this to each bit having digits from $0$ to $2$.

Our matrix will look like $T=\begin{pmatrix} a_0 & b_0 & c_0 \\ a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \end{pmatrix}$. If we apply the math we have used above, we find that we have to satisfy the equations $a_i^2=a_i, a_ib_i=b_i^2=b_i,a_ic_i=b_ic_i=c_i^2=c_i$.

So we have $(a_i,b_i,c_i)=\{(1,1,1),(1,1,0),(1,0,0),(0,0,0)\}$. Since we want $T$ to be invertible we can just have $T=\begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \end{pmatrix}$.

Notice an observation about this $T$, it is just a "staircase" matrix, where the bottom right triangle is filled with $1$ and the rest of the elements are $0$.

Now what happens if we apply $T$? $T \cdot \begin{pmatrix}a \\ b \\ c\end{pmatrix} = \begin{pmatrix}a \\ a+b \\ a+b+c \end{pmatrix}$. That is, $T$ is literally a "take prefix sums" matrix.

So what happens if we consider numbers with $n$ bits? What would multiplying a vector by $\underbrace{T \otimes T \otimes \ldots \otimes T}_{n \text{ times}}$ do? (Yes, we are considering the case for the operator $|$.)

Let us try to expand out $T \otimes T \otimes T$.

$T \otimes T \otimes T= \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix} \otimes \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix} \otimes \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}= \begin{pmatrix} 1 & 0 & 0 & 0 \\ 1 & 1& 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 1& 1 & 1 \end{pmatrix} \otimes \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}=\begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ 1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ 1 & 1 & 1 & 1 & 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 & 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 & 1 & 0 & 1 & 0 \\ 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \end{pmatrix}$.

This may look like garbage but we will try to compute $(T \otimes T \otimes T) \cdot \begin{pmatrix} 000 \\ 001 \\ 010 \\ 011 \\ 100 \\ 101 \\ 110 \\ 111 \end{pmatrix}=\begin{pmatrix} 000 \\ 000+001 \\ 000+010 \\ 000+001+010+011 \\ 000+100 \\ 000+001+100+101 \\ 000+010+100+110 \\ 000+001+010+011+100+101+110+111 \end{pmatrix}$.

Notice that the effect of $T \otimes T \otimes T$ is exactly SOS. That means, FWHT for $\&$ and $|$ is just applying the zeta transform and the mobius transform in $O(n \cdot 2^n)$ using SOS dp $^{[2],[3]}$, which is the same complexity as the $O(N \cdot \log N)$ we have using FWHT taking $N=2^n$.

So we can generalize FWHT for the $b$-bit version of $\&$ and $|$ using (generalized) SOS dp in $O(n \cdot b^n)$ where the array sizes of the convolution are at most size $b^n$.

The reason why FWHT for $\&$ and $|$ can be done using subset sums is actually quite intuitive in hindsight, so I am quite confused how I never made the connection before.

We can think of this as some PIE, by focusing on the case of $|$ on binary numbers (for simplicity). Define $\bar A_{mask}=\sum\limits_{sub \subseteq mask} A_{sub}$ and $\bar B_{mask}=\sum\limits_{sub \subseteq mask} B_{sub}$.

Now, by PIE, we know that $C_{mask}=\sum\limits_{sub \subseteq mask} (-1)^{|mask \setminus sub|} \bar A_{sub} \bar B_{sub}$. It is clear that $C=\mu(\bar A \cdot \bar B)=\mu(\zeta(A)\cdot \zeta(B))$.
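Concretely, $|$-convolution via the zeta and mobius (SOS) transforms looks like this (a minimal sketch, function names are my own):

```python
def zeta(a):
    # SOS / zeta transform: out[mask] = sum of a[sub] over all sub ⊆ mask
    a = a[:]
    n = len(a)  # must be a power of 2
    bit = 1
    while bit < n:
        for mask in range(n):
            if mask & bit:
                a[mask] += a[mask ^ bit]
        bit <<= 1
    return a

def mobius(a):
    # inverse of zeta: undo the prefix sums layer by layer (inclusion-exclusion)
    a = a[:]
    n = len(a)
    bit = 1
    while bit < n:
        for mask in range(n):
            if mask & bit:
                a[mask] -= a[mask ^ bit]
        bit <<= 1
    return a

def or_convolution(a, b):
    # C = mobius(zeta(A) * zeta(B)), with * taken pointwise
    return mobius([x * y for x, y in zip(zeta(a), zeta(b))])
```

Swapping the subset sums for superset sums gives the analogous $\&$-convolution.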

## Subset Sum Convolution

Now, we can actually use ideas from FWHT to do subset sum convolution, where we want to find $C(mask)=\sum\limits_{sub \subseteq mask} A_{sub} \cdot B_{mask \setminus sub}$ for all $mask$.

Define:

• $\hat A_{i,mask}=\begin{cases}A_{mask} &, \text{if } |mask|=i\\ 0 &, \text{otherwise}\end{cases}$
• $\hat B_{i,mask}=\begin{cases}B_{mask} &, \text{if } |mask|=i\\ 0 &, \text{otherwise}\end{cases}$

It can be shown that $C(mask)=\mu \Bigg(\sum\limits_{i=0}^{|mask|} \zeta(\hat A_{i}) \cdot \zeta(\hat B_{|mask|-i}) \Bigg)$ and it can be calculated in $O(n^2 \cdot 2^n)$ $^{[3]}$.

Well, we can reason about it very easily now. For convolution with $|$, FWHT is the same as $\zeta$ and the inverse FWHT is the same as $\mu$. So it is obvious why the above equation holds: the only way that some non-zero elements in $\hat{A}_{i}$ and $\hat B_{|mask|-i}$ can convolute to an element with $|mask|$ bits is for those elements' bits to be mutually exclusive.

Also note that we can apply the inverse FWHT on the outside of the summation as $\mu(A)+\mu(B)=\mu(A+B)$ since $\mu$ can be thought of as some arbitrary matrix.

Actually, we don't have to restrict ourselves to convolution with $|$ for this task. Convolution with $\oplus$ and $\&$ works fine too.
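Here is a minimal Python sketch of this ranked construction (names are my own; the $\zeta$/$\mu$ helpers are the usual SOS transforms, done in place):

```python
def zeta_ip(a):
    # in-place subset-sum (zeta) transform
    n = len(a)
    bit = 1
    while bit < n:
        for m in range(n):
            if m & bit:
                a[m] += a[m ^ bit]
        bit <<= 1

def mobius_ip(a):
    # in-place inverse (mobius) transform
    n = len(a)
    bit = 1
    while bit < n:
        for m in range(n):
            if m & bit:
                a[m] -= a[m ^ bit]
        bit <<= 1

def subset_sum_convolution(A, B):
    # C[mask] = sum over sub ⊆ mask of A[sub] * B[mask \ sub], in O(n^2 2^n)
    N = len(A)
    n = N.bit_length() - 1
    pc = [bin(m).count("1") for m in range(N)]
    # split by popcount (the "ranked" arrays), then zeta-transform each slice
    fA = [[A[m] if pc[m] == i else 0 for m in range(N)] for i in range(n + 1)]
    fB = [[B[m] if pc[m] == i else 0 for m in range(N)] for i in range(n + 1)]
    for row in fA:
        zeta_ip(row)
    for row in fB:
        zeta_ip(row)
    C = [0] * N
    for i in range(n + 1):
        # pointwise-combine all rank pairs summing to i, then invert
        g = [0] * N
        for j in range(i + 1):
            for m in range(N):
                g[m] += fA[j][m] * fB[i - j][m]
        mobius_ip(g)
        for m in range(N):
            if pc[m] == i:
                C[m] = g[m]
    return C
```

Only entries whose popcount matches the combined rank survive, which is exactly the "mutually exclusive bits" argument above.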

## More Trash

We can do more weird stuff like performing the operation $\oplus$ on the $0$-th bit, $\&$ on the $1$-st bit and $|$ on the $2$-nd bit.

Recalling that the matrices for these are $T_\oplus=\Big( \matrix{1 & 1 \\ 1 & -1} \Big)$, $T_\&=\Big( \matrix{0 & 1 \\ 1 & 1} \Big)$ and $T_|=\Big( \matrix{1 & 0 \\ 1 & 1} \Big)$ respectively. We just multiply our vector by $T_| \otimes T_\& \otimes T_\oplus$ for the forward transform and do the inverse for the backward transform.

We can also "mix and match" transformations of various sizes as long as we modify the result given in the Kronecker Product section.

As an example, suppose we have a number system of $(a,b)$ where $0 \leq a < 2$ and $0 \leq b < 3$ and we define $(a,b)+(c,d)=(a+c \mod 2, b+d \mod 3)$.

We can do convolution on this by noting that we have $T_2=\Big( \matrix{\omega_2^0 & \omega_2^0 \\ \omega_2^0 & \omega_2^1} \Big)$ and $T_3=\begin{pmatrix} \omega_3^0 & \omega_3^0 & \omega_3^0 \\ \omega_3^0 & \omega_3^1 & \omega_3^2 \\ \omega_3^0 & \omega_3^2 & \omega_3^4\end{pmatrix}$. So the transformation is given by $T_2 \otimes T_3=(T_2 \otimes I_3) (I_2 \otimes T_3)$.

So as long as you can define the transformation matrix, you can combine them and do weird stuff.
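A hedged sketch of this mix-and-match convolution (the helper names are my own; the inverse transforms are computed with exact rationals to avoid the $\frac12$ factor issues):

```python
from fractions import Fraction

T_XOR = [[1, 1], [1, -1]]
T_AND = [[0, 1], [1, 1]]
T_OR = [[1, 0], [1, 1]]

def inv2(M):
    # exact inverse of a 2x2 matrix
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

def apply_kron(vec, mats):
    # multiply vec by the Kronecker product of mats, where mats[i] acts on bit i
    v = list(vec)
    n = len(v)
    for i, M in enumerate(mats):
        bit = 1 << i
        for m in range(n):
            if m & bit:
                continue
            x, y = v[m], v[m | bit]
            v[m] = M[0][0] * x + M[0][1] * y
            v[m | bit] = M[1][0] * x + M[1][1] * y
    return v

def mixed_convolution(a, b, mats):
    # forward transform both arrays, multiply pointwise, invert per layer
    fa, fb = apply_kron(a, mats), apply_kron(b, mats)
    res = apply_kron([x * y for x, y in zip(fa, fb)], [inv2(M) for M in mats])
    return [int(x) for x in res]
```

Calling `mixed_convolution(a, b, [T_XOR, T_AND, T_OR])` performs $\oplus$ on bit $0$, $\&$ on bit $1$ and $|$ on bit $2$, as in the example above; the per-layer matrices commute, so their order does not matter.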

## References

• +293

By errorgorn, history, 19 months ago,

Spoiler

• +112

By errorgorn, history, 20 months ago,

## 1526A - Mean Inequality

Setter: antontrygubO_o
Preparer: errorgorn

Hint 1
Hint 2
Solution
Code (C++)
Code (Python)

## 1526B - I Hate 1111

Setter: errorgorn
Preparer: errorgorn

Hint 0
Hint 1
Hint 2
Solution
Code (C++)
Code (Python)

## 1526C1 - Potions (Easy Version) and 1526C2 - Potions (Hard Version)

Setter: errorgorn and oolimry
Preparer: oolimry

Hint 1
Hint 2
Hint 3
Solution
Code (C++)
Code 2 (C++)
Code (Python)

## 1526D - Kill Anton

Setter: errorgorn
Preparer: errorgorn

Hint 1
Solution
Code (C++)
Code (Python)

## 1526E - Oolimry and Suffix Array

Setter: iLoveIOI
Preparer: iLoveIOI

Hint 1
Solution
Code (C++)
Code (Python)

## 1526F - Median Queries

Setter: errorgorn and oolimry
Preparer: errorgorn

Hint 1
Hint 2
Solution
Code (C++)
Code (Python)

## Code Golf

You may have realized that python codes in the editorial are quite short. We actually had a code golf challenge among testers. You are invited to try code golf our problems and put them in the comments. Maybe I will put the shortest codes in the editorial ;)

Rules for code golf:

• any language allowed
• codes will be judged by number of characters
• must get AC in the respective problems

• +199

By errorgorn, history, 20 months ago,

Hello again Codeforces!

errorgorn, oolimry and iLoveIOI are glad to invite you to participate in Codeforces Round #723 (Div. 2) which will be held at 28.05.2021 17:05 (Moscow time). The round will be rated for participants with a rating lower than 2100. Participants from the first division are also welcome to take part in the competition, but it will be unrated for them.

You will be given 2 hours and 30 minutes to solve 6 problems. One of them is interactive; please read the guide on interactive problems before the contest.

We would like to thank:

We hope you find the bugaboos interesting and fun! Wish you high rating!

Here is the scoring distribution because you guys deserve it <3

• 500 — 1000 — (750+1000) — 2250 — 2500 — 3500

Btw, remember to downvote all testers who write stupid comments to beg for likes.

• +923

By errorgorn, 23 months ago,

## 1491A - K-th Largest Value

Setter: syksykCCC
Prepared by: syksykCCC

Hint 1
Hint 2
Solution
Code (C++)
Code (Python) (vim1729)

## 1491B - Minimal Cost

Setter: syksykCCC
Prepared by: syksykCCC

Hint 1
Hint 2
Solution
Code (C++)

## 1491C - Pekora and Trampoline

Setter: oolimry
Prepared by: errorgorn

Hint 1
Hint 2
Solution
Code (C++)
Code (Python)

## 1491D - Zookeeper and The Infinite Zoo

Setter: errorgorn
Prepared by: errorgorn

Hint 1
Hint 2
Hint 3
Hint 4
Solution
Code (C++)
Code (C++) (gamegame)
Code (Python)

## 1491E - Fib-tree

Setter: Widowmaker
Prepared by: Widowmaker and syksykCCC

Hint
Solution
Code (C++)

## 1491F - Magnets

Setter: 3.141592653 and star_xingchen_c
Prepared by: 3.141592653 and star_xingchen_c

Hint 1
Hint 2
Hint 3
Solution
Code (C++)

## 1491G - Switch and Flip

Setter: errorgorn and oolimry
Prepared by: errorgorn

Hint 1
Hint 2
Hint 3
Solution
Code (C++)
Code (Python)

## 1491H - Yuezheng Ling and Dynamic Tree

Setter: Ynoi
Prepared by: Widowmaker and Ynoi

Hint 1
Solution
Code (C++)

## 1491I - Ruler Of The Zoo

Setter: oolimry
Prepared by: oolimry

As the solution to this problem is very long, the full editorial is split into $4$ parts. If you want to challenge yourself, you can try reading one part at a time and see if you get any inspiration. You can also try to read the specific hints for each part.

Part 1 Hint 1
Part 1
Part 2 Hint 1
Part 2 Hint 2
Part 2 Hint 3
Part 2
Part 3 Hint 1
Part 3 Hint 2
Part 3 Hint 3
Part 3 Hint 4
Part 3 Hint 5
Part 3 Hint 6
Part 3
Part 4
Code (C++)
Code (C++) (errorgorn)

• +255

By errorgorn, 2 years ago,

Hello, Codeforces!

Welcome to the Codeforces Raif Round 1 (Div. 1 + Div. 2) supported by Raiffeisenbank, which will start on Oct/17/2020 16:05 (Moscow time). It will be a combined rated round for both divisions. Note that the start time is unusual.

All problems were authored and prepared by bensonlzl, oolimry, errorgorn, dvdg6566, shenxy13.

Ari gato to:

You will be given 8 problems, one of which is divided into easy and hard versions, and 150 minutes to solve them.

We hope that statements are short and pretests are strong and that you find the problems interesting! Good luck, have fun and we wish everyone high ratings!

The scoring distribution will be announced closer to the beginning of the round.

Thanks to Raiffeisenbank, winners will get awesome swag:

• 1st-3rd place = Bluetooth speaker

• 4th-10th place = Bumbag

• 11th-20th place = Power Bank

Random 50 participants outside of top-20, who solved at least one problem will receive:

• Thermos Mugs

• Raiffeisenbank t-shirt

We develop a high-frequency trading (HFT) system for equity, currency and derivative markets. Our business edge is in technology. The main goal is to create a top-notch platform based on fundamental and statistical models and machine learning, with low latency and high throughput. The efficiency and scalability of our code give us a competitive advantage. We are passionate about code quality and strive for highest standards in product development.

If you are interested in internship and employment opportunities in the Raiffeisenbank algo-trading team Capital Markets, or want to get in touch with the bank's recruitment team, fill out the form below.

FILL OUT FORM →

UPD: Scoring distribution: 500 — 1000 — 1000 — 1500 — 1750 — 1750 — (2250+750) — 4000

UPD2: Editorial out!

UPD 3: First ACs and winners

First ACs

A: 300iq

B: icecuber

E: Errichto

F: fmota

Top 5

Congratulations to everyone!

• +1254

By errorgorn, 3 years ago,

My O level Chinese exam is in 2 days, so I decided to learn a data structure that I can only find resources for in Chinese. I thought I might as well write a tutorial in English.

This data structure is called 析合树, which directly translates to "cut join tree", but I think "permutation tree" is a better name. Honestly, after learning about it, it seems like a very niche data structure with very limited uses, but anyway, here is the tutorial on it.

Thanks to dantoh and oolimry for helping me proofread.

### Motivation

Consider this problem. We are given a permutation $P$ of length $n$. A good range is a contiguous subsequence $[l,r]$ such that $\max\limits_{l \leq i \leq r} P_i - \min\limits_{l \leq i \leq r} P_i = r-l$. This can be thought of as a contiguous subsequence such that when we sort the numbers in it, we get contiguous values. Count the number of good ranges.

Example: $P=\{5,3,4,1,2\}$.

All good ranges are $[1,1], [2,2], [3,3], [4,4], [5,5], [2,3], [4,5], [1,3], [2,5], [1,5]$.

The $O(n^2)$ solution for this is to use a sparse table and check every range to see if it fits the given condition. But it turns out we can speed this up to $O(n\log{n})$ using the permutation tree.
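As a baseline, the quadratic solution can be sketched even without a sparse table, by simply maintaining a running max and min while extending the right endpoint (my own simplification, not part of the original blog):

```python
def count_good_ranges(P):
    # O(n^2): for each l, extend r while maintaining the running max/min.
    # [l, r] is good iff max - min == r - l (0-indexed here).
    n = len(P)
    count = 0
    for l in range(n):
        mx = mn = P[l]
        for r in range(l, n):
            mx = max(mx, P[r])
            mn = min(mn, P[r])
            if mx - mn == r - l:
                count += 1
    return count

print(count_good_ranges([5, 3, 4, 1, 2]))  # → 10, the example above
```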

### Definitions

A permutation $P$ of length $n$ is defined as:

• $|P|=n$
• $\forall i, P_i \in [1,n]$
• $\forall i \ne j, P_i \ne P_j$

A good range is defined as a range, $[l,r]$ such that $\max\limits_{l \leq i \leq r} P_i - \min\limits_{l \leq i \leq r} P_i = r-l$ or equivalently $\nexists x,z \in [l,r], y \notin [l,r], P_x<P_y<P_z$.

We denote a good range $[l,r]$ of $P$ as $(P, [l,r])$, and also denote the set of all good ranges as $I_g$.

### Permutation Tree

So we want a structure that can store all good ranges efficiently.

Firstly, we can notice something about these good ranges: they are composed of concatenations of other good ranges.

So the structure of the tree is that a node can have some children and the range of the parent is made up of the concatenation of the children's ranges.

Here is an example permutation. $P=\{9,1,10,3,2,5,7,6,8,4\}$.

As we can see from the above image, every node represents a certain good range, where the values in the node represent the minimum and maximum values contained in this range.

Notice in this data structure, for any 2 nodes $[l_1,r_1]$ and $[l_2,r_2]$, WLOG $l_1 \leq l_2$, either $r_1<l_2$ or $r_2 \leq r_1$.

### Definition of Cut Nodes and Join Nodes

We shall define some terms used in this data structure:

• Node range: For some node $u$, $[u_l,u_r]$ will describe the minimum and maximum value contained in the range the node represents
• Ranges of children: For some non-leaf node $u$, let the array $S_u$ denote the array of the ranges of its children. For example, for the root node in the above picture, $S_u=\{[9,9],[1,1],[10,10],[2,8]\}$.
• Order of children: For some non-leaf node $u$, we can discretize the ranges in $S_u$. Again using the example of the root node, the order of its children is $\{3,1,4,2\}$, we will call this $D_u$.
• Join node: For some non-leaf node $u$, we call it a join node if $D_u=\{1,2,3,\cdots\}$ or $D_u=\{\cdots,3,2,1\}$. For simplicity we also consider all leaf nodes to be join nodes.
• Cut node: Any node that is not a join node.

### Properties of Cut Nodes and Join Nodes

Firstly, we have this very trivial property. The union of all ranges of children is the node's range. Or in fancy math notation, $\bigcup_{i=1}^{|S_u|} S_u[i]=[u_l,u_r]$.

For a join node $u$, the union of any contiguous subsequence of the ranges of its children is a good range. Or, $\forall l,r,\ 1 \leq l \leq r \leq |S_u|,\ \bigcup_{i=l}^{r} S_u[i]\in I_g$.

For a cut node $u$, the opposite is true. The union of any contiguous subsequence of more than one of its children's ranges (other than all of them) is not a good range. Or, $\forall l,r,\ 1 \leq l < r \leq |S_u|,\ (l,r) \neq (1,|S_u|),\ \bigcup_{i=l}^{r} S_u[i]\notin I_g$.

The property of join nodes is not too hard to show by looking at the definition of what a join node is.

But the property of cut nodes is slightly harder to prove. A way to think about it: if for some cut node there were a contiguous subsequence of more than one child's ranges forming a good range, then those children would have been merged into a single node instead of being separate children. This is a contradiction.

### Construction of Permutation Tree

Now we will discuss a method to build the permutation tree in $O(n\log{n})$. According to a comment by CommonAnts, the creator of this data structure, an $O(n)$ algorithm exists, but I could not find any resources on it.

#### Brief overview of algorithm

We process the permutation from left to right. We will also keep a stack of cut and join nodes that we have processed previously. Now let us consider adding $P_i$ to this stack. We firstly make a new node $[P_i,P_i]$ and call it the node we are currently processing.

1. Check if we can add the currently processed node as a child of the node on top of the stack.
2. If we cannot, check if we can make a new parent node (this can either be a cut or join node) such that it contains some suffix of the stack and the current processed node as children.
3. Repeat this process until we cannot do any more operations of type 1 or 2.
4. Finally, push the currently processed node to the stack.

Notice that after processing all nodes, we will only have 1 node left on the stack, which is the root node.

#### Details of the algorithm

For operation 1, note that we can only do this if the node on top of the stack is a join node, because if we could add the current node as a child of a cut node, it would contradict the fact that no contiguous subsequence of more than one of a cut node's children's ranges can be a good range.

For operation 2, we need a fast way to check whether there exists a good range from which we can make a new node. There are 3 cases:

• We cannot make a new node.
• We can make a new join node. This new node has 2 children.
• We can make a new cut node.

#### Checking if there exists a good range

We have established for a good range $(P,[l,r])$ that $\max\limits_{l \leq i \leq r} P_i - \min\limits_{l \leq i \leq r} P_i = r-l$.

Since $P$ is a permutation, we also have $\max\limits_{l \leq i \leq r} P_i - \min\limits_{l \leq i \leq r} P_i \geq r-l$ for all ranges $[l,r]$.

Equivalently, we have $\max\limits_{l \leq i \leq r} P_i - \min\limits_{l \leq i \leq r} P_i - (r-l) \geq 0$, where we have equality only for good ranges.

Say that we are currently processing $P_i$. We define a value $Q_j$ for each range $[j,i]$: $Q_j=\max\limits_{j \leq k \leq i} P_k - \min\limits_{j \leq k \leq i} P_k - (i-j)$ for $0< j \leq i$. Now we just need to check if there is some $Q_j=0$, where $j$ is not in the current node being processed.
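To see the invariant concretely, here is a brute-force demonstration (my own, not the efficient algorithm; the real construction maintains $Q$ with a lazy segment tree instead of recomputing it): for each $i$, every $Q_j$ is computed directly, and the good ranges ending at $i$ are exactly the positions with $Q_j = 0$.

```python
def good_ranges_ending_at(P, i):
    # Q_j = max(P[j..i]) - min(P[j..i]) - (i - j), 0-indexed here.
    # Q_j >= 0 always, and Q_j == 0 exactly when [j, i] is a good range.
    good = []
    for j in range(i + 1):
        window = P[j:i + 1]
        q = max(window) - min(window) - (i - j)
        assert q >= 0          # holds for any range of a permutation
        if q == 0:
            good.append(j)
    return good

P = [5, 3, 4, 1, 2]  # the example permutation from the motivation section
total = sum(len(good_ranges_ending_at(P, i)) for i in range(len(P)))
print(total)  # → 10: every good range is counted once, by its right endpoint
```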

Now we only need to know how to maintain these values of $Q_j$ quickly when we transition from $P_i$ to $P_{i+1}$. We can do this by updating the max and min values every time they change. How can we do this?

Let's focus on updating the max values, since updating the min values is similar. The max value changes whenever $P_{i+1} > \max$. Let us maintain a stack of the values of $\max\limits_{j \leq k \leq i}P_k$, storing distinct values only. It can be seen that this stack is monotonically decreasing. When we add a new element, we pop all elements in the stack which are smaller than it and update their maximum values using a segment tree range add update. This amortizes to $O(n)$, as each value is pushed onto the stack once.

Note that we must also decrement all $Q_j$ by 1, since we are incrementing $i$ by 1.

Now that we can maintain all values of $Q_j$, we can simply check the minimum value of the range we are interested in using segment tree range minimum queries.

If we can make a new cut node, then we greedily try to extend it: we keep adding nodes from our stack until the new cut node is valid.

Since the above may be confusing, here is an illustration of how the construction looks.

### Problems using Permutation Tree

Codeforces 526F – Pudding Monsters

Idea
Code

CERC 17 Problem I – Intrinsic Interval

Idea
Code

Codeforces 997E – Good Subsegments

Codeforces 1205F – Beauty of a Permutation

CodeChef – Army of Me

CodeChef – Good Subsequences

• +448

By errorgorn, history, 3 years ago,

Sumitomo Mitsui Trust Bank Programming Contest 2019 has just finished, and this is an unofficial editorial.

Thanks to my friends oolimry and shenxy13 for helping write some of the editorial.

# A — November 30

You can solve this by checking the given date against each month-end date of the Gregorian calendar. However, note that as the second date directly follows the first date (a fact which I think is not clear in the English translation), we can instead just check whether the two dates are in different months, or whether the second date is the first day of a month. This can be done in constant time.

Code
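
Since the whole check collapses to "did the month change", a minimal sketch looks like this (the function name and I/O shape are my own, not the linked code):

```python
def is_month_end(m1, d1, m2, d2):
    # The second date directly follows the first, so the first date is
    # the last day of its month iff the month changes (equivalently d2 == 1).
    return 1 if m1 != m2 else 0

print(is_month_end(11, 30, 12, 1))  # → 1
print(is_month_end(11, 6, 11, 7))   # → 0
```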

# B — Tax Rate

We note that there is monotonicity in this question: $\lfloor 1.08(X+1) \rfloor \geqslant \lfloor 1.08X \rfloor$. Hence, we can binary search for the answer. When we binary search on the value of $X$ (the answer), if $\lfloor 1.08X \rfloor = N$, then we have our answer. Otherwise, we search lower if $\lfloor 1.08X \rfloor > N$ and higher otherwise. If no value of $X$ gives us the desired $N$, then it is impossible.

Code
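
The binary search can be sketched as follows (a hedged sketch of my own, the function name is made up; it uses integer arithmetic $\lfloor 108X/100 \rfloor$ to avoid floating-point issues with 1.08):

```python
def untax(n):
    # floor(X * 1.08) is non-decreasing in X, so binary search is valid:
    # find the smallest X with floor(X * 108 / 100) >= n, then verify equality.
    lo, hi = 1, n  # X <= N is always a safe upper bound for the search
    while lo < hi:
        mid = (lo + hi) // 2
        if mid * 108 // 100 >= n:
            hi = mid
        else:
            lo = mid + 1
    return lo if lo * 108 // 100 == n else ':('

print(untax(432))   # → 400
print(untax(1079))  # → :(
```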

# C — 100 to 105

We can simply do an unbounded knapsack with weights $100, 101, \ldots, 105$ and check if the value $N$ is reachable. We get a time complexity of $O(N)$.

Code
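
The reachability knapsack can be sketched like this (my own illustration; the linked code may differ):

```python
def payable(n):
    # Unbounded knapsack over weights 100..105:
    # reach[v] is True if some multiset of items costs exactly v.
    reach = [False] * (n + 1)
    reach[0] = True
    for v in range(1, n + 1):
        reach[v] = any(v >= w and reach[v - w] for w in range(100, 106))
    return 1 if reach[n] else 0

print(payable(615))  # → 1  (e.g. 100 + 100 + 100 + 105 + 105 + 105)
print(payable(217))  # → 0
```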

# D — Lucky PIN

First, we note that there are $O(N^{3})$ subsequences of the string, so generating all of them and using a set to count the distinct ones will TLE. However, there are at most 1000 distinct possible PINs, from 000 to 999. For each of these candidates, we can linearly scan through the string in $O(N)$ to check if it is actually a subsequence. Thus, this can be solved in $O(1000N)$, which is AC.

Code
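
A sketch of the $O(1000N)$ scan (my own, not the linked code): for each candidate PIN, a greedy pointer walk decides subsequence membership.

```python
def count_pins(s):
    # Try each of the 1000 possible PINs; the greedy subsequence
    # check advances a pointer through the target and is O(N) per PIN.
    count = 0
    for pin in range(1000):
        target = f"{pin:03d}"
        pos = 0
        for ch in s:
            if ch == target[pos]:
                pos += 1
                if pos == 3:
                    count += 1
                    break
    return count

print(count_pins("0224"))  # → 3  (022, 024, 224)
```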

# E — Colorful Hats 2

Firstly, we can imagine there are 3 imaginary people standing at the very front, each with a different colour hat. For each person, we consider how many possible people could be the first person in front of him with the same hat colour. If the current person has number X, then the number of ways is:

(no. of ppl with X-1 in front) — (no. of ppl with X in front)

This is because those with X in front of them must already be paired with one of the people reporting X-1, so this reduces our options.

Implementation-wise, we can store an array which keeps track of the number of people with number X who are not paired yet. Initially, all values are 0, except at index -1, which has the value 3. Then when processing the current person X, we multiply the answer by arr[X-1], decrease arr[X-1] by 1, and increase arr[X] by 1.

Code
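
A sketch of this counting (my own; I use a dictionary in place of the shifted array so that index -1 is convenient, and the three imaginary people give the initial value at -1):

```python
MOD = 10 ** 9 + 7

def count_orders(a):
    # unpaired[x] = number of people reporting x who are not yet "paired"
    # with a successor; the three imaginary people in front report -1.
    unpaired = {-1: 3}
    ans = 1
    for x in a:
        choices = unpaired.get(x - 1, 0)
        ans = ans * choices % MOD
        if choices:
            unpaired[x - 1] = choices - 1
        unpaired[x] = unpaired.get(x, 0) + 1
    return ans

print(count_orders([0, 1, 2, 3, 4, 5]))  # → 3
print(count_orders([0, 0, 0]))           # → 6
```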

# F — Interval Running

Firstly, if $T_{1}A_{1}+T_{2}A_{2}=T_{1}B_{1}+T_{2}B_{2}$, the answer is infinity.

Else, WLOG, we let $T_{1}A_{1}+T_{2}A_{2} > T_{1}B_{1}+T_{2}B_{2}$. If $A_{1} > B_{1}$, then Takahashi and Aoki will never meet each other. The answer is 0. Now, we have solved all the corner cases. We shall move on to the general case. We shall call the period $T_{1}+T_{2}$. Now, we shall find the number of periods where Takahashi and Aoki meet each other. If we do some math, we get the number of periods to be $\lceil \frac{T_{1}(B_{1}-A_{1})}{(T_{1}A_{1}+T_{2}A_{2})-(T_{1}B_{1}+T_{2}B_{2})} \rceil$.

The number of times that Takahashi and Aoki meet each other is $2 \cdot \text{periods} - 1$, since in every period they meet each other twice: once when Aoki overtakes Takahashi and once when Takahashi overtakes Aoki. We subtract 1 since we do not count the first time they meet each other at the very start. We submit this and... WA.

Yes, we need to think about the case where $\frac{T_{1}(B_{1}-A_{1})}{(T_{1}A_{1}+T_{2}A_{2})-(T_{1}B_{1}+T_{2}B_{2})}$ is a whole number. In this case, we need to add one, as there will be one more time where Aoki will meet up with Takahashi but never overtake him. And now we get AC. (Thanks to Atill83 for pointing out the mistake)

Code
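
A direct translation of the above case analysis into code (my own hedged sketch, with names of my choosing; the linked code may differ):

```python
def meetings(t1, t2, a1, a2, b1, b2):
    # Net distance covered per period T1 + T2 by each runner.
    sum_a = t1 * a1 + t2 * a2
    sum_b = t1 * b1 + t2 * b2
    if sum_a == sum_b:
        return "infinity"
    # WLOG let runner a be the one who gains more per period.
    if sum_a < sum_b:
        a1, a2, b1, b2 = b1, b2, a1, a2
        sum_a, sum_b = sum_b, sum_a
    if a1 > b1:
        return 0  # the faster-overall runner is never behind, so no meetings
    num = t1 * (b1 - a1)
    den = sum_a - sum_b
    periods = (num + den - 1) // den  # ceiling division
    ans = 2 * periods - 1
    if num % den == 0:
        ans += 1  # they touch exactly once more without an overtake
    return ans

print(meetings(1, 2, 10, 10, 12, 4))       # → 1
print(meetings(100, 1, 101, 101, 102, 1))  # → infinity
```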

• +39

By errorgorn, history, 3 years ago,

Example problem: (I dont have the link as this problem only exists in my school's private judge)

We have an array of n positive integers and we need to group them into k segments such that each element of the array is in exactly one segment and all elements in a segment are contiguous. The cost of a segment is the square of the sum of the numbers in it.

We need to minimize the total cost of all the segments.

Example 1: we have n=5 and k=3, and the array is [1,2,3,4,5]. The optimal answer is grouping (1,2,3), (4) and (5) into different segments. The total cost is $6^2 + 4^2 + 5^2 = 77$.

Example 2: we have n=4 and k=2, and the array is [1,1000000,3,4]. The optimal answer is grouping (1,1000000) and (3,4) into different segments. The total cost is $1000001^2+7^2=1000002000050$.

Now I asked some people and they said this can be solved using the Lagrange (or Aliens) speedup trick. We define a cost C. Then we do the DP (sped up with the convex hull trick) where we minimize the total cost of all our segments, but additionally add C to our cost whenever we start a new segment. Now, if C is INF, we will only have 1 segment, while if C is 0 we will have n segments. So people claim that we can binary search on C to find some C for which there will be exactly k segments.

This allowed me to AC the problem, but my solution was not always correct as the test cases were weak. I thought of this test case: n=4, k=3, where the array is [1,1,1,1]. Now, when C is 2, we get 4 segments, but when C is 3, we get only 2 segments. Therefore, when I ran my code against this case, it printed 8, even though the answer is (1+1)^2+1^2+1^2=6.

So I think I need some way to interpolate in the Lagrange speedup trick. Can anyone help? My code is here.
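
For reference, here is one standard way the ties are usually handled (my own sketch, with a naive $O(n^2)$ DP in place of the convex hull trick, and assuming the cost is convex in the number of segments): even when no C yields exactly k segments, the value $g(C) - C \cdot k$ at the critical C still recovers the true optimum, provided ties are broken consistently (here, towards fewer segments).

```python
def min_cost_k_segments(a, k):
    # Aliens / Lagrange trick sketch: penalize each segment by C, then
    # binary search the smallest C whose optimum uses <= k segments.
    n = len(a)
    pre = [0] * (n + 1)
    for i, x in enumerate(a):
        pre[i + 1] = pre[i] + x

    def solve(c):
        # dp[i] = (min penalized cost, min segments among optima) for a[:i]
        INF = float('inf')
        dp = [(INF, INF)] * (n + 1)
        dp[0] = (0, 0)
        for i in range(1, n + 1):
            best = (INF, INF)
            for j in range(i):
                cost = dp[j][0] + (pre[i] - pre[j]) ** 2 + c
                cand = (cost, dp[j][1] + 1)
                if cand < best:   # ties broken towards fewer segments
                    best = cand
            dp[i] = best
        return dp[n]

    lo, hi = 0, 4 * max(a) * max(a) * n * n  # C large enough forces 1 segment
    while lo < hi:
        mid = (lo + hi) // 2
        if solve(mid)[1] <= k:
            hi = mid
        else:
            lo = mid + 1
    cost, _ = solve(lo)
    return cost - lo * k  # exact by convexity, even when ties hide k itself

print(min_cost_k_segments([1, 2, 3, 4, 5], 3))  # → 77
print(min_cost_k_segments([1, 1, 1, 1], 3))     # → 6
```

On the [1,1,1,1], k=3 case: at C=2 the penalized optima with 2, 3 and 4 segments all tie at cost 12, so 12 - 2·3 = 6 is recovered correctly even though the DP itself never reports 3 segments.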

For my code, the input format is:

n k

a1 a2 a3 ... an

where ai is the i-th element of the array.

• +31