### errorgorn's blog

By errorgorn, 47 hours ago, Hi everyone! Today nor sir and I would like to talk about generating uniform bracket sequences. Over the past few years when preparing test cases, I have had to generate a uniform bracket sequence a few times. Unfortunately I could not figure out how, so I would like to write a blog about this now. Hopefully this blog would be useful to future problem setters :) Scroll down to the end of the blog to copy our generators.

First, let's define some things. A bracket sequence is a string with $n$ $\texttt{(}$ s and $n$ $\texttt{)}$ s. A balanced brackets sequence is a bracket sequence with the additional constraint that it becomes a valid arithmetic expression when we add some $\texttt{1}$ s and $\texttt{+}$ s. Let $p$ be the prefix sum array of a bracket sequence (we assign $1$ to $\texttt{(}$ and $-1$ to $\texttt{)}$ and take the prefix sum). For example, if our bracket sequence is $\texttt{())()((())}$, then $p=[1,0,-1,0,-1,0,1,2,1,0]$.

An alternate way to think about the prefix sum is to imagine the bracket sequence as a sequence of increasing and decreasing lines. Let's define the badness of a bracket sequence as the number of lines in the above diagram that is below the $0$-line. So, the badness of our bracket sequence is $4$.

Let's call a bracket sequence irreducible if the line touches the $0$-line exactly twice (once at the start and at the end). So, $\texttt{(())}$ and $\texttt{))()((}$ are both irreducible but not $\texttt{()()}$. It is natural to also define a irreducible decomposition of a bracket sequence. $\texttt{())()((())}$ gets decomposed into $\texttt{()}$,$\texttt{)(}$,$\texttt{)(}$,$\texttt{(())}$.

Now, we want to note that all irreducible bracket sequences are either balanced or are some sort of "inverse" of a balanced bracket sequences. By inverse, we can either reverse the entire string or flip every bracket in the string. Please note that these are different. For example, $\texttt{)))(()((}$ can be turned into $\texttt{((())())}$ by flipping every bracket or $\texttt{(()(()))}$ by reversing the substring.

Firstly, let's start with finding a recursive formula for the Catalan numbers. One of the basic recurrences when dealing with bracket sequences is to decompose them into the form of $\texttt{(}X\texttt{)}Y$, where $X$ and $Y$ are both valid bracket sequences (they might be empty). This decomposition is actually unique since $\texttt{(}X\texttt{)}$ can only be our first term in our irreducible decomposition (for us competitive programmers, think of it as dp where our transition is removing the first irreducible substring). We immediately have the recurrence $C_{n+1} = \sum\limits_{i=0}^n C_i C_{n-i}$ for $n > 0$ and a base case of $C_0=1$ for $n=0$. After some manipulation with formal power series $[1,2 \text{ example } 2.3.3]$, you can find that we get the familiar $C_n = \frac{1}{n+1} \binom{2n}{n} = \frac{1}{2n+1} \binom{2n+1}{n}$.

## Mapping 1 Let's take a look at $C_n = \frac{1}{2n+1} \binom{2n+1}{n}$ first. What other sequence can we get the $\binom{2n+1}{n}$ term from? Well, the number of binary strings with $n$ $\texttt{(}$ s and $n+1$ $\texttt{)}$ s. And nicely enough, $2n+1$ is the length of said binary string. Let's define the equivalence class $s=_Rt$ when $s$ is a cyclic shift of $t$. Generally for binary strings, it is not so easy to count the number of elements in someone's equivalence class when we define it this way. But since $\gcd(2n+1,n)=1$ here, all cyclic shifts of a string are distinct and there are exactly $2n+1$ strings in any equivalence class. We are almost at a bijection to Catalan numbers already! We just need to show that there is a bijection between equivalence classes and balanced bracket sequences.

#### Lemma 1: Modified Cycle Lemma

Claim: given a string of $n$ $\texttt{(}$s and $n+1$ $\texttt{)}$s, there exists exactly $1$ cyclic shift where only the last element is $-1$ in its prefix sum.

As an example, consider the bracket sequence $\texttt{)())(()}$. It has $p=[-1,0,-1,-2,-1,0,-1]$. The only cyclic shift that gives us a prefix sum with our condition is $\texttt{(())())}$, which has $p=[1,2,1,0,1,0,-1]$.

The proof of this is not too hard and you should prove it yourself, but for completeness, I will put it here.

proof

In fact, cycle lemma is actually the name of a generalized version of a modification of this lemma (see, for instance, $$ for two nice proofs -- one by arranging the sequence on a circle and removing certain segments, and the other using an idea similar to the prefix sum idea presented above).

There is a interesting problem which is related to this $$ (in Chinese).

translation
solution

## Mapping 2

Thank you nor sir for writing this part and independently discovering it.

The mapping mentioned here was, unfortunately, already present in literature $$ after we checked, and my old proof was quite similar to the one mentioned in the paper. It used the Chung-Feller theorem, which says that the badness of bracket sequences is uniformly distributed (i.e., there are an equal number of strings for each value of badness), and then showed that the mapping was in fact a bijection from the set of strings of a fixed badness to the set of balanced bracket sequences.

However, I came up with a much simpler proof which we will present below along with the intuition behind the construction (we could not find this proof anywhere, so if anyone finds a reference, please let us know in the comments).

The main idea behind what we will do below is the identity $C_n = \frac{1}{n + 1} \binom{2n}{n}$. We try to construct a balanced bracket sequence from a bracket sequence with $n$ opening brackets and $n$ closing brackets, and then try to show that this construction is in fact an $n + 1$ to $1$ mapping from $S_n$ (the set of bracket sequences with $n$ opening brackets and $n$ closing brackets) to $S^*_n$ (the set of balanced bracket sequences of length $2n$).

The first idea is to note that, as mentioned earlier, if a string $S = AB$, where $A$ is the irreducible prefix of $S$, then we can make $A$ an irreducible balanced bracket sequence by either flipping all brackets of $A$ (which corresponds to flipping the corresponding diagram in the $x$-axis) or by reversing the string $A$ (which corresponds to rotating the diagram $180^\circ$).

The first attempt at finding a mapping was to do this: set $f(S) = A'f(B)$, where $A'$ is the string obtained by flipping all brackets in $A$ (or just reversing $A$). However, this will not give us a uniformly random way of constructing balanced bracket sequences, since $\texttt{((()))}$ is the image of $2$ strings under this mapping, and $\texttt{()()()}$ is the image of $8$ strings under this mapping.

So here was the crucial idea: we try to offset this asymmetry by modifying how we handle the case when $A \ne A'$. In this case, $A =\; \texttt{)}C\texttt{(}$ for some $C$, i.e., $S = \texttt{)}C\texttt{(}B$. So in order to try to compensate for the imbalance in distribution, we use $\texttt{(}f(B)\texttt{)}C'$ instead (where $A' = (C')$). As it turns out, this works out perfectly.

Here's the formal construction:

Definition: $f: \cup_{n = 0}^\infty S_n \to \cup_{n = 0}^\infty S^*_n$ is defined as

1. $f(\varepsilon) = \varepsilon$, where $\varepsilon$ is the empty string.
2. $f(AB) = Af(B)$, if $A$ is irreducible and balanced.
3. $f(AB) = (f(B))C'$, if $A$ is irreducible and not balanced, and as a corollary, $A =\; )C($ for some $C$.

Claim: For any string $S^* \in S^*_n$, there are exactly $n + 1$ solutions to $f(S) = S^*$ for $S \in S_n$.

Proof: Let $S^* = A^*B^*$, where $A^*$ is the irreducible prefix of $S^*$. Let $S \in S_n$ be a string satisfying the equation $f(S) = S^*$. We prove this by induction. For $n = 0$, there is nothing to prove. Let's suppose $n > 0$. Then $S$ can be mapped to $S^*$ by using one of the two rules, depending on whether the irreducible prefix of $S$ is a balanced bracket sequence or not:

1. Let $S = AB$, where $A$ is a balanced irreducible prefix of $S$. Then we must have $S^* = f(S) = A f(B)$. Since $A, A^*$ are both balanced irreducible prefixes of $S^*$, they must be equal, and $B^* = f(B)$. The number of solutions in this case is hence equal to $\frac{|B^*|}{2} + 1$ by the induction hypothesis.
2. Let $S = AB$, where $A$ is not balanced, but an irreducible prefix of $S$. Then $A =\; \texttt{)}C\texttt{(}$, and $A' = \texttt{(}C'\texttt{)}$ (as mentioned above). By definition, $S^* = f(S) = \texttt{(}f(B)\texttt{)} C'$. By an analogous argument as above, since $f(B)$ is balanced, $A^* = \texttt{(}f(B)\texttt{)}$ and $B^* = C'$. There are exactly $\frac{|A|^*}{2}$ solutions as earlier.

Overall, there are $\frac{|A^*| + |B^*|}{2} + 1 = n + 1$ solutions to $f(S) = S^*$, as needed.

In fact, if, in the proof, we also keep track of the badness of the strings that generate a given balanced bracket sequence, we can see that there is one input string for each possible value of badness (and this gives an alternative proof of the Chung-Feller theorem).

# Generating Balanced Bracket Sequences

It has been asked before on CodeForces $$ how to generate a random bracket sequence. Judging from the comments, no one has been able to figure out how to generate it quickly (i.e. in linear time). So I will provide the code to generate a uniform balanced bracket sequence. Additionally, I will provide code that generates a random binary tree of $n$ vertices to balanced bracket sequences of size $2n$.

I know that there is also a bijection between bracket sequences and ordered trees, but there are better ways to generate a tree $$. Also, there is also a bijection to full binary tree of $2n+1$ vertices, but there is a easy bijection to binary trees of $n$ vertices noting that the full binary tree of $2n+1$ vertices has $n+1$ leaf nodes. We simply remove all leaves, which leaves (pun not intended) us with a binary tree of $n$ vertices. So the binary tree is the "internal nodes" of the full binary tree.

Also, there is a (rather inelegant) method of generating balanced bracket sequences without using any of the techniques discussed earlier.

One of the first ideas that anyone attempting this task might have is to randomly place $\texttt{(}$ or $\texttt{)}$ in a string unless it is impossible to do so. However, we do not get a uniform distribution. Instead, let us vary the probability that we place $($ or $)$ in accordance to the actual probability that we will see in our required distribution.

Let $C_n^k$ denote the number of bracket sequences of size $2n$ with the first $k$ elements being $\texttt{(}$. From $$, we know that $C_n^k = \frac{k+1}{n+1} \binom{2n-k}{n-k} = \frac{k+1}{2n-k+1} \binom{2n-k+1}{n-k}$. Well, if it isn't the damned modified cycle lemma again (we need to be careful in our proof because not all equivalence classes are the same size now, but the ratio of valid rotations is still same).

Ok, so if we want to make a balanced bracket sequence of length $2n$ and we currently placed $a$ $\texttt{(}$s and $b$ $\texttt{)}$s, then there are $C_{n-b}^{a-b}$ possible balanced bracket sequence that have our current string as a prefix. Out of them, $C_{n-b}^{a+1-b}$ have the next character being $\texttt{(}$. So we simply calculate the probability that we should put $\texttt{(}$ as $\frac{C_{n-b}^{a+1-b}}{C_{n-b}^{a-b}} = \frac{n-a}{2n-a-b} \cdot \frac{a-b+2}{a-b+1}$.

As for the bijection from balanced bracket sequences to binary trees, remember earlier when we showed the recurrence $C_{n+1} = \sum\limits_{i=0}^n C_i C_{n-i}$ by bijecting into pairs $(X,Y)$ such that string is of the form $\texttt{(}X\texttt{)}Y$. This bijection works for a recursive bijection into binary trees too $[\text{folklore}]$. That is, $\texttt{()}$ bijects to the binary tree with $1$ nodes, then for any other bracket sequence, $X$ describes the left child and $Y$ describes the right child.

Ok enough talking, here are the codes (I used testlib $$). They are written by both nor and I.

generator

# A Surprisingly Fast Generation

Thanks to pajenegod for pointing out that this method might be faster than $O(n^2)$.

Let us return to the original reccurence of $\texttt{(}X\texttt{)}Y$. This gives us a natural DnC-esque recurence if we are able to determine how to uniformly choose the correct size for $X$ and $Y$. Recall that $C_{n+1} = \sum\limits_{i=0}^n C_i C_{n-i}$ and that we have the approximation $C_n \approx \frac{4^n}{\sqrt{\pi n^3}}$ $$.

For some bracket sequence of size $2(n+1)$, the probability we recurse into $X$ of size $s$ is $P_s = \frac{C_{s}C_{n-s}}{C_{n+1}}$. Of course, we are not able to store $C_n$ directly, but using the earlier approximation earlier, it makes sense to store $C'_n = \frac{C_n}{4^n}$, which should be at a reasonable size to use doubles. We can calculate $C'_{n+1} = \frac{n+\frac{1}{2}}{n+2} \cdot C'_n$ in linear time. Nicely, $P_s = \frac{C'_{s}C'_{n-s}}{4 C'_{n+1}}$.

Here, we can already bound these types of reccurences by $O(n \log n)$ if we don't care about the distribution of $s$ that we recurse to. We can do this by mapping $P_s$ in the order of $P_0,P_n,P_1,P_{n-1},\ldots$. That is, we generate a random number in $Z=[0,1)$. If $Z < P_0$, we recurse with $s=0$, if $Z < P_0+P_n$, we recurse with $s=n$, etc.

With this method, our time complexity is actually $T(n)=O(\min(a,b))+T(a)+T(b)$. So, we have $T(n)=O(n \log n)$.

But we can do better. Let's look at the distribution of $P_s$ for $n=8$.

$P_0$ $P_1$ $P_2$ $P_3$ $P_4$ $P_5$ $P_6$ $P_7$
$0.3$ $0.092$ $0.059$ $0.049$ $0.049$ $0.059$ $0.092$ $0.3$

We see that the distribution is symmetric (which isn't that important) and that larger values appear on the extreme ends (which is important). This suggests that the $O(\min(a,b))$ terms in our complexity might be small.

Let us try to find an approximation for expected value of $\min(a,b)$. Note that $P_s = \frac{C'_{s}C'_{n-s}}{4 C'_{n+1}} \approx \frac{\sqrt{n^3}}{4 \sqrt{\pi s^3(n-s)^3}}$.

When $s \leq \frac{1}{2} n$, we get $P_s \lessapprox \frac{\sqrt{8}}{4 \sqrt{\pi s^3}} = \frac{1}{\sqrt{2 \pi}} \cdot \frac{1}{\sqrt{s^3}}$.

\begin{align}E(\min(a,b)) &= 0 \cdot P_0 + 0 \cdot P_n + 1 \cdot P_1 + 1 \cdot P_{n-1} + \ldots \\&= 2 \cdot (\sum_{s=0}^{\frac{n}{2}} s \cdot P_s)\\ &\lessapprox \sqrt{\frac{2}{\pi}} \cdot (\sum_{s=0}^{\frac{n}{2}} \frac{s}{\sqrt{s^3}}) \\ &< \sqrt{\frac{2}{\pi}} \cdot (\int_{s=0}^{\frac{n}{2}} \frac{1}{\sqrt{s}} \, ds) \\ &= \sqrt{\frac{2}{\pi}} \cdot \sqrt{2n} \\ &= 2 \sqrt{\frac{n}{\pi}} \end{align}

So this is going to be extremely handwavy, but $E(\min(a,b))$ about $\sqrt{n}$. Our reccurence is going to look like $T(n)=O(\sqrt n) + T(\sqrt n) + T (n - \sqrt n)$. Which is smaller than $T(n)=O(n) + \sqrt n \cdot T(\sqrt n)$. It is not hard to see that $T(n) = O(n \log \log n)$.

I ended up testing this empirically. Here are my results.

$n$ Time Used ($s$) Time Used ($\mu s$) / $n$
$100000$ $0.332$ $3.320$
$200000$ $0.683$ $3.415$
$500000$ $1.718$ $3.436$
$1000000$ $3.571$ $3.571$
$2000000$ $7.166$ $3.583$
$5000000$ $18.229$ $3.660$
$10000000$ $37.082$ $3.708$

I'm using a Ryzen 7 3700X CPU with the compile command g++ xxx.cpp -O2. The code generates a bracket sequence of length $n$ 100 times.

code

Seem like $O(n \log \log n)$ to me. What do you think?

# References By errorgorn, 6 weeks ago, On Apr/23/2022 17:05 (Moscow time), we will host Codeforces Global Round 20.

Note the unusual timing, it is 30 minutes earlier. This is the second round of the 2022 series of Codeforces Global Rounds. The rounds are open and rated for everybody.

The prizes for this round:

• 30 best participants get a t-shirt.
• 20 t-shirts are randomly distributed among those with ranks between 31 and 500, inclusive.

The prizes for the 6-round series in 2022:

• In each round top-100 participants get points according to the table.
• The final result for each participant is equal to the sum of points he gets in the four rounds he placed the highest.
• The best 20 participants over all series get sweatshirts and place certificates.

Thanks to XTX, which in 2022 supported the global rounds initiaitive!

All problems were written and prepared by errorgorn, maomao90 and oolimry.

We would like to thank the following people who made this round possible:

You will have 3 hours to solve 9 problems, one of which is divided into two subtasks.

The scoring distribution is 250-500-750-1000-1500-(1250+1250)-2750-3000-4000. GLHF!

Also, please upvote the blog so that I can get more contribution than antontrygubO_o.

PS: We would like to take this opportunity to congratulate rotavirus or antontrygubX_y on getting married. Wishing you lots of love and happiness.

Edit: Editorial is released. Even if you have solved a problem, we encourage you to read the editorial as you might learn something new from it. Announcement of Codeforces Global Round 20 By errorgorn, 2 months ago, Disclaimer: This blog is entirely my own opinion, please do not get mad at the authors from round 779. If you did not enjoy that round, please do not blame the authors. Personally, I felt that the authors overall did a wonderful job (SPyofgame's div 2F was honestly one of my favourite problems in 2022 so far).

Last week round 779 was held, a common feedback that people seemed to be quite vocal about was that the round was "PermutationForces".    If we look at the actual contest, we do see that problems B, C, D, E all contain the word permutation inside, so it is natural to think that problems are all repetitive. But saying a round is "PermutationForces" is like saying a round is "ArrayForces". Like arrays, permutations are very basic objects that are one of the basic building blocks of competitive programming, claiming that a round has too many problems using permutations is pretty baseless to me.

According to the definition of permutations, a permutation is an array consisting of $n$ distinct integers from $1$ to $n$ in arbitrary order. For example, $[2,3,1,5,4]$ is a permutation, but $[1,2,2]$ is not a permutation ($2$ appears twice in the array) and $[1,3,4]$ is also not a permutation ($n=3$ but there is $4$ in the array).

Sometimes when encountering permutation problems, we only use the definitions of permutation at "face value".

• 1658B - Марин и не взаимно простая перестановка — a permutation of length $n$ consists of distinct integers from $1$ to $n$ in arbitrary order, so there are at most $\frac{n}{k}$ elements which are a multiple of $k$.
• IZhO 2015 Day 1 A — a permutation of length $n$ consists of distinct integers from $1$ to $n$ in arbitrary order, so a straightforward approach is just to do multiset comparison using hashing since the multiset of any permutation of length $n$ is unique.

However, in most permutation problems, we do not have to explicitly care that these values are from $1$ to $n$, if we instead say that they are $n$ distinct real numbers the problem would not change. In fact, in most problems, we do not have to do any arithmetic operations on elements on the array. For example, in the context of the problem given the permutation $[1,2]$, I can completely don't care that $1+1=2$. I only need to know that $1=1$ and $1<2$. Using this, we can replace the array with something like $[e,\tau]$ and although it is false that $e+e=\tau$ (well, unless you are an engineer), it still holds true that $e < \tau$. In fact, in many problems where we only care about the relative position of elements, there is a trick known as "discretization" where we compress an array of large values into an array only containing values in $[1,n]$. I feel that for problems that clearly only care about the relative ordering of elements, problem setters should default to using values in $[1,n]$ except for special cases.

Some examples of problems to illustrate what I am talking about:

• 1658C - Синдзю и потерянная перестановкаpower is defined using the $max$ function, which only cares about the relative ordering of elements. We could possibly have used some array $a$ of length $n$ where all elements are in $[1,10^9]$ instead of a permutation here to prevent the corner case of having only a single $1$ in the array $c$ from happening. If you think about this problem more, it is actually about min stacks (wow! data structure in div 2 C? real?)
• 1658E - Годзё и игра на матрице — not so obvious example that we only care about the relative orderings of the elements here. I suggested that we change the problem to $1 \leq V_{i,j} \leq 10^9$ but I think it suggested it too late and so we left $V$ as a permutation in the end. I suggested that we make it in the range $[1,n^2]$ because of my philosophy regarding not requiring people to discretize and also allowing for $O(n^2)$ solutions, but I think setting larger limits would have been better here because the fact that we only care about relative orderings is completely non-trivial. Again, permutations play a very small role here and should be treated as "an array with distinct elements".
• 1641D - Два массива — notice that the condition for a good pair $(i,j)$ is that $a_{i,1},a_{i,2},\ldots,a_{i,m},a_{j,1},a_{j,2},\ldots,a_{j,m}$ is distinct. That is, we only need the inequality operator — we can discretize without caring about the relative orders of elements as long as our discretization function is a bijection. If you looked at the solution of anyone with $O(\frac{n^2\cdot m}{w})$ complexity, most likely they would have discretized all values in $a$.

Another aspect of permutation is studying cycles. Since the permutation is some arbitrary arrangement of integers from $1$ to $n$, a common way to view that is to think about the graph where the edges are of the form $(i,p_i)$. This graph very nicely has the properties of each edge having in-degree and out-degree of $1$. Also, this gives us a very intuitive sense of what the inverse permutation is — we just flip the edges of the graph we are looking at. It is very common to think about such permutation when the problem asks for something like "sort this sequence and your operation is swapping $2$ elements". The general approach for such problems is that you draw the graph of the permutation and completely forget that the permutation ever existed then proceed to solve the problem on the graph--- the permutation is just a compact way to represent our graph.

Finally, permutations may be used when we need to define how to "shuffle" a sequence. These problems usually have nothing related to permutations other than the fact that it is a useful way to define "shuffling" (see 1658D2 - 388535 (усложненная версия) and 1526D - Убить Антона).

I hope this blog can convince people that just because many problems in a contest references the notion of permutations, that does not mean that the round is all about a single concept. Although it is valid criticism to say a round has too much of some idea, it simply does not work when you only say "permutation". Just take a look at how many different ways we can view permutations under!

## Random Sidetrack

Many times in problems, we have to use a well-known object like a permutation, bitwise operator or a subsequence, we have to paste some definition into the statement. Of course, I feel it is kinda funny that the definition of a permutation has to appear in statements $3$ times in round 779. Furthermore, it is kind of weird to put the definition of a permutation on something like a div 1C. Do you really think the people who are attempting a div 1C really need to know what a permutation is?

I suggest that we instead have a glossary on CodeForces that defines all common terms that are needed in CodeForces, then like the guide to interactive problems, we can just paste that glossary page link into all announcement blogs so beginners can read up on those page for those "technical terms" that frequently appear. We don't need to paste lengthy definitions in statements, less work for everyone! By errorgorn, 2 months ago, 2 years ago, antontrygubO_o wrote a blog about div2ABs where he expressed his opinions that d2ABs should not be about "here is a statement, please implement what is written there". Thanks to him, the quality of d2ABs (and problem quality in general) have certainly improved. However, I still believe that there still quite large differences between how coordinators/problemsetters view d2ABs and how the intended participants view them.

From the survey made by kpw29, we can see that most people agree that most people agree that we should primarily consider the target audience when proposing a task. I too think if a task is boring to div 1 contestants, we should not think of that as a reason to immediately disqualify a problem from being a d2A. I think when people judge the interesting-ness of d2As, they try too hard to make it pleasing to div 1 contestants by having short implementation, short statement (no reading comprehension) and at some cute ideas.

While this mindset works very well for combined rounds where div 1 contestants do need to and forced to solve d2ABs and quickly get ABCD over with, I think we should have a different mindset when considering div 2 problems.

We should not make setting interesting d2ABs our top priority but instead make it setting friendly d2ABs. By friendly d2ABs, it means that the solution is easy to justify (there is no difficult constructive proof of why this random condition is sufficient) and that the statement is without much mathematical constructs, it would be best if a 5 year old can read the statement and be able to understand the problem.

Elaborating on my point about having d2ABs contain some non-trivial ideas, sometimes this ideas take the form of something that is downright unapproachable to newer contestants where for them, it feels like a "guess and check" game where they cannot prove anything they submit and just hope that their guess is good.

I believe that this makes beginners in competitive programming think that solving competitive programming is just having good guessing skills and such observation comes from nothing other than divine sources. I recognize that having a good intuition is a valid strategy to solve competitive programming problems and I do guess observations all the time when solving problems. But we should not colude not having a proof with the complete inability to justify why something should be true (I am looking at you 1672F1 - Перемешивание массива).

We should still strive observations on d2ABs more managable such that when one is able to correctly guess the observation, it would not be too hard for them to figure out why the code they submitted is correct if they wanted to. For some constructive d2ABs where the output is yes/no, it might make sense to ask for participants to print their contsruction too. Although this would make coding much more involved that it has to be, I believe this would be fine for most codeforces rounds.

Furthermore, when people set d2ABs, they sometimes use mathematical constructs that not all beginners would know of. It is not uncommon to see bitwise operations in the statement of a d2A followed by an explanation of what said bitwise operator is. If people see the need to provide a link to an explanation of what a bitwise operator is, then why not just not set a d2A that requires you to understand what a bitwise operator is to even comprehend the statement?

Imagine you are a beginner in competitive programming and you only know some basic language syntax, would you really be interested in caring about random problem where someone calculates some stupid value using some random function? There are billions of d2ABs with $\text{gcd}$, $\text{mex}$ or some bitwise functions. But how many times do you see such functions outside of d2ABs? With $\text{gcd}$ or $\text{mex}$, I think it is very rare. Because most of such problems involving these just use the "fun facts" of these functions.

Let's look at 1325A - ЕхАб И нОд as an example (which antontrygubO_o used as an example of a good d2A). The problem just asks if you know the fun fact that $\gcd(x,1)=1$ and $\text{lcm}(x,1)=x$. Despite myself belonging to the camp of formal statements, I believe that d2ABs should have as natural a statement as possible, in the sense that as much as possible, we should not have random mathematical constructs being the basis of d2ABs. I don't want to solve d2ABs where I have to recall the fun fact that $\gcd(x,1)=1$.

At the end of the day, there are many "factors" that affect the suitability of d2ABs for codeforces contests. I don't think we should sacrifice the enjoyability of d2ABs to the intended participants contestants for the sake of making them "interesting".

Because of the goal of setting d2ABs with the intended audience in mind, I decided to ask a few coordinators and users who are specialist and below to answer questions about their thoughts of d2ABs. Since I was already asking people their opinions on d2ABs, I decided to ask some additional questions that felt interesting to me somewhat related to this topic.

Thank you to Monogon, Um_nik, dario2994, antontrygubO_o, isaf27, 74TrAkToR, darkkcyan, -is-this-fft-, sus, tdpencil, m0fum0fu,SPyofgame and ak2006 for spending their time answering these questions and giving me permission to post it here.

Note: These responses were from a month ago but I procrastinated writing this blog.

#### What is your philosophy on d2ABs?

1. In general I believe the easiest problem should be completely trivial for the intended audience of the contest. A gift, so to speak. I have heard this principle from several seasoned contest organizers. This has several reasons.

1. Someone is much more likely to do another contest if they solved at least one problem, even if it is completely trivial to them.
2. A psychological thing: it builds engagement and commits you to the contest. If you are in a contest and can't solve the first problem relatively quickly, then you will kinda get bored and drift away. Once you solved the easiest problem you are more committed to the contest and are ready to spend more time to solve the rest.
3. On Codeforces, it warps ratings if people don't submit to the easiest problem because they don't have ideas for it.
2. I'm not a fan of "troll" div2ABs, that is, problems that have some weird restriction but some very simple construction. All it does is teach beginners that cp is about guessing.

1. Also, people are often misled by thinking that if a problem solution is simple, it must also be easy to solve. Solution length, unless very excessive, is no argument as to the difficulty of a problem. I'm reminded of a certain local contest where the easiest problem turned out to be pretty hard for the participants, for exactly this reason.
3. Easiest problems should not be annoying or tedious for the same reason as 1b.

4. Making the problem interesting should not come at the expense of any of the above.

Monogon: A div2AB problem should be possible for a programmer with minimal knowledge of algorithms to solve, but still presents some kind of challenge. Maybe some simple observation or a slightly nontrivial implementation will be necessary, but it shouldn't feel like you're mindlessly doing exactly what the statement says if you're a beginner.

dario2994: Good very easy problems are not so different from good hard problems, they are just easier. They should be interesting for everyone, but this is often an impossible requirement to satisfy. If one cannot find an easy problem interesting for everyone, then the priority should be to make it interesting for the target audience, i.e., newcomers in competitive programming. A good very easy problem does not require any knowledge, is not "implement what is described in the statement", is solvable in any language (python I am looking at you). I prefer to give very easy problems involving some computer science instead of something which is solved with a sequence of if-checks or in $O(1)$. I don't have any strong opinion about whether a very easy problem should be trivial to implement or not.

antontrygubO_o: They shouldn't be painful (in the sense that they shouldn't have long statements, be caseworky, or have long implementation), and shouldn't be "implement what you read". The natural statement is preferred. I prefer when there is some cute idea.

isaf27: My philosophy is:

• Perfect d2a should be the problem, that will be solved by all participants of the contest. On the other hand this problem should contain some idea, maybe very very simple and obvious, but idea. I don't like the problems "just write what is asked in the statement" even for d2a.
• d2b should be simple, but not obvious.

74TrAkToR: These are tasks for beginners. I think they should be on the idea. There should be a minimum implementation.

darkkcyan: I think that a round on Codeforces should be fun, and maybe educational for newcomers.

The word "difficulty" is relative. A problem can be very hard for one person, but can be trivial for the others. But when judging a problem by myself, I often measure it by the time I spent to think, as well as implementation time. I wanted that for A the idea should be found in a very short amount of time, if not instantly, at least with my level, and for B might be a bit longer, but not toooo long. But a coordinator is only a barrier, the next barrier is the testing phase, which should be more accurate if we invite a good spectrum of testers.

About the idea. First of all, I don't think div2A should be like Atcoder ABC A, because in Atcoder ABC A, you can do what is asked in the statement. Again, I wanted to round to be fun. The joy of solving puzzle is when you find the right idea, and thinking how smart you were. So a broad answer is just, not to make it too trivial and standard, and the rest is determined by the difficulty. That is the same for B, but yes, we must make B somehow harder. This is still an open question, but I think I should leave it there, because creating a problem is truly a hard task.

#### Why should we care about the quality of d2AB if most div 1 competitors forget them after 5 mins anyways?

antontrygubO_o: Question is weird: most Div1 users forget all problems after 5 minutes. But we should just aim for a better quality of problems.

Still, for me personally, boring/stupid D2A-D2C problems may ruin the round, as I am not expecting anything good in the harder problems. I understand that many Div1 users don't care, but many Div1 users would be happy with standard problems on the harder spots as well, so what?

darkkcyan: I think this is an odd question when asking a question about the easiest problems of a div2 round, made for div2 competitors. Well, it should be fun for div 1 competitors too. And for me, they are warm up problems. And for div 2 competitors, they should be fun, again. For newbies, there are educational values as well. In all cases, these problems are the introduction to the way of thinking and observing when doing contests, and I still think that they are doing a good job on that. I do see that nowadays, div2AB requires more thinking than the previous years, but that is also true for div2C and above. Thus, because they are the very firsts problems of a round, acting like guarding problems for the other problems, their qualities should be in the care as well.

because of the bulk of the participants only solve d2ABs and div1 participants are not the target audience

#### What things do you like/dislike about d2ABs on codeforces?

sus: I like the simplicity and typicallity of div2 A and Bs. Even though they only cover simple topics like math, brute force, and simulation, they allow any beginner cper to practice their problem solving skills. Some of the drawbacks of ABs is that they only cover a small amount of topics, so they might start to get repetetive/boring if a problem setter is feeling lazy and that you can't learn much from them after you get good at solving them. People get carried away trying to get 800 problems solved and only solve ABs trying to reach this goal.

Whenever I need to get ready for a contest, I get warmed up by quickly solving an AB. ABs serve as a stepping stool for beginners and thats what I like about them.

ak2006: I like the fact that div 2 ABs dont require much DSA knowledge and can be solved by someone who hasnt done much CP. I dislike that they might use too much math (number theory, combinatorics) which only math major or minors/math olympiad people may be acquainted with well enough to be able to solve them.

m0fum0fu: I do not like ABs at all, for contestants who cant really tackle harder problems/ spend much more time on them, AB is purely a speed contest. In addition, it's largely affected by internet speed, submission speed, etc, and their weightage is very high. Especially affected when trying to open up problems. (eg, solving AB slower by 1/2mins, can lose to another person even though i solve C faster by 15mins)

SPyofgame: Simple to read, simple to understand statement and testcase, interesting idea, not too focus on a specific type of problem, easy to solve but still require thinking a bit instead of doing without brain

#### Do you think d2ABs must necessarily be short implementation?

tdpencil: I believe d2A should. d2B should probably be more implementation based or should permit longer implementation. Just because newbies and beginners are going to try div 2 A and if they don't know how to code it up fast or they don't know how to write such long code then its going to be challenging for them.

sus: I think i dont care. There are 1 liners, there are 30 liners and there are those in between. Just another problem doesnt make a difference to me.

#### Some high rated users have difficulty differentiating d2As-d2Cs what would be a good rule of thumb to determine the correct position of a easy problem?

dario2994: Gauging the difficulty of problems is hard. It is hard to distinguish d2As from d2Bs as it is hard to distinguish d2Ds from d2Es. The difficulty is on a spectrum and there is not a cheap trick to distinguish. Experience helps and I don't think that high rated users find it harder to distinguish d2As from d2Bs than low rated users. But it might be that high rated users are more aware of their inability to make the distinction compared to low rated users.

antontrygubO_o: It's hard for me to imagine that someone may actually have difficulty with this, to be honest. Look at D2A-Bs of recent contests, you will most likely agree that B is harder than A in most of them.

tl;dr testers exist for a reason

#### Do you prefer div 2 rounds or educational rounds more? Why?

tdpencil: I prefer educational rounds. This is because i can pretty much depend on the difficulty of educational rounds and expect to learn something. Most edus are made by the same people so the quality of problems are about the same. For div2s however, a variety of people make the problems and it can be argued that some are much easier than others, meaning not all div2s are at the same difficulty (on average). I would argue that having div2s be varying difficulties is an issue but thats just my opinion.

ak2006: I prefer regular rounds since they use much less math for some reason — edu rounds arent very educational and arent much DSA based as they should be.

sus: I prefer div 2 rounds because educational rounds are the same as any other round for beginners: every round is an educational oppurtunity when you are new. The only reason I prefer div2s is becuase it is easier to gain rating in them. Sometimes I solve C in div2s which is worth a lot more than AB but in edu rounds C is worth the same.

# Ratings of d2ABs

I decided to ask people to give rate some d2ABs from 1 to 10. These div 2ABs were chosen to get a hopefully diverse set of current meta of d2ABs. But take the numerical rating with a giant pinch of salt.

In the words of dario2994, "it seems to me as "rank from 1 to 10 these apples and these cars according to their color, their speed, their consistency, and your overall feeling". It would be a random number."

Anyways, here are the assigned scores in a table form. I have bolded the scores which were assigned the coordinator of the respective contest (you can see that I'm pretty biased).

1266A 1281A 1339A 1391A 1581A 1583B 1598A 1610A 1634B 1642A
-is-this-fft- 10 8 9 3 5 3 8 2 5 6
antontrygubO_o 9 2 7 5 7 10 7 6 10 7
Um_nik 3 0 7 4 4 6 4 5 2 3
Monogon 5 7 10 6 3 5 9 7 4 6
darkkcyan 7 8 9 5 7 7.5 8 9 9 7
isaf27 10 2 6 5 8 6 9 6 10 4
errorgorn 6 1 10 4 9 7 10 5 3 4
m0fum0fu 1 8 10 2 7 3 8 4 2 9
sus 3 3 8 4 2 7 6 4 1 9
mean 6 4.33 8.44 4.22 5.78 6.06 7.67 5.33 5.11 6.11
stddev 3.28 3.35 1.51 1.20 2.39 2.21 1.80 2 3.62 2.15

From this, we can see what is pretty universally agreed to be good d2ABs (the comments are my opinion):

We can also see what the "controversial" problems are:

Here are the full text comments for each problem if you want to read them.

1266A
1281A
1399A
1391A
1581A
1583B
1598A
1610A
1634B
1642A

# Poll

I think it would be interesting to see how users from various ratings rate each of the 10 d2ABs stated earlier in the blog. So I have created a google form where you can rate these d2ABs. I will release the results after a couple of days.

Maybe I will do some data analysis on the results. I tried to do some data analysis on the scores assigned by coordinators (after normalization) and got this: This chart looks very dumb by all standards but hopefully we can see better trends after I get more data.

Anyways, I encourage members of the community to share their thoughts in the comments about the current direction that d2ABs are heading towards, especially for people who are specialist or below since d2ABs are targeted towards them.

Also, to future problem setters, maybe this will allow you to get more d2ABs accepted by seeing the preferences of your coordinator lol

# Results of Poll

You can access the results of the poll here. I have manually added the scores of those people who I personally asked to send scores and the names of people who have posted their scores in the comments.

I have done some basic data analytics and I believe that the results is quite interesting. For the data analytics part, I used mostly the same procedures as those described by my comment. The only different was that I only did normalization for everyone to have mean=0, because it does not make sense that "1 5 9" would be normalized to the same thing as "8 9 10". Also, I removed entry 5 when I did the analytics because it had an extremely low mean score. When I checked it, it turns out that the person had given $7$ problems a score of $1$, so I just removed it. You can check my jupyter notebook file here.

Anyways, here is the plot on the average scores given to each problem. We can see that 1281A actually has a bimodal distribution — either you love it or hate it. From manually looking at the scores assigned to it, I don't think there is any clear correlation between rating and whether you would like this problem. Suggesting that the idea of d2ABs having "do what author tells you = bad problem" is not agreed upon even on most div 1 users.

Here is the plot of using PCA to squish down $10$ dimensions ($1$ dimension for each score) into $3$ dimension so that we can visualise it better. The third dimension is represented by the size of the point (I hope that this will allow you to imagine the point as being nearer or further away). The points are colored by the rank (or rating) of the user. One of the striking thing is the fact that there is a vague correlation between rating and our correlation on this chart, going from the bottom right corner to the top right corner. I know that I do not have sufficient data from div2 users but I think we can kind of conclude that the opinions of most coordinators on d2ABs differs quite abit from the cluster of div 2 users on the middle right. It seems that -is-this-fft-'s and Monogon's philosophies on div2ABs are more suited for having d2ABs that are targetted towards their intended audience.

Thanks to everyone who submitted to the poll!

By errorgorn, 4 months ago, Recently, there was a local contest which I set a problem for (task 4 in this pdf).

The abridged problem statement is that you choose an array $A$ of length $N$ where the elements are $<2^X$. The grader will pick $[L,R]$ and return $A_L \oplus A_{L+1} \oplus \ldots \oplus A_R$, where $\oplus$ is the bitwise xor operation. You need to find any integer $z$ such that $L \leq z \leq R$.

I set this in contest with the constraints of $N=2^{19}$ and $X=192 \approx \frac{(\log N)^2}{2}$ where you also had to answer queries quickly.

Solution

During testing, oolimry and icypiggy decided it would be funny to write solutions that had $X \approx c \cdot \log N$ (of course they also had to ensure they could answer queries quickly so these are more of describing speedup techniques rather than techniques for reducing number of bits).

oolimry's solution
icypiggy's solution

Anyways, this raises some interesting questions about the minimal $X$ we need if we completely do not care about answering queries quickly. A trivial lower bound is $X \approx \log N$ since we could query $[pos,pos]$ for all values of $pos$. An upper bound is randomly choosing bits. I would think that it is reasonable to assume that the xor-sum of subarrays should be independent and we should require somewhere around $X \approx 4 \cdot \log N$ bits?

Is there a better lower and upper bound $X$ in this problem?

By errorgorn, 5 months ago, I would like to thank:

• tfg for his helpful discussions, digging through papers and implementing SMAWK.

## Prerequisites

Let us first define the classical knapsack, unbounded knapsack and subset sum problems.

#### Knapsack

There are $N$ items. The $i$-th item has weight $w_i$ and value $v_i$. Find a set $S$ such that $\sum\limits_{i \in S} w_i \leq C$ and $\sum\limits_{i \in S} v_i$ is maximized.

#### Unbounded Knapsack

There are $N$ items. The $i$-th item has weight $w_i$ and value $v_i$. Find a multiset $S$ such that $\sum\limits_{i \in S} w_i \leq C$ and $\sum\limits_{i \in S} v_i$ is maximized.

#### Subset Sum

There are $N$ items. The $i$-th item has weight $w_i$. Find a set $S$ such that $\sum\limits_{i \in S} w_i = C$.

You should know how to do both versions of knapsack in $O(NC)$ and subset sum in $O(\frac{NC}{32})$ before reading this blog.

In this blog post, I will just show some results that speed up the above problems for special cases.

## Subset Sum Speedup 1

There are $N$ items. The $i$-th item has weight $w_i$. Let $\sum\limits_{i=1}^N w_i=C$. For each $k \in [1,C]$, can we find a set $S$ such that $\sum\limits_{i \in S} w_i = k$?

It turns out we can do this in $O(\frac{C \sqrt C}{32})$.

Let us first consider an algorithm that seems like it is $O(\frac{C \sqrt C \log C}{32})$. We will group items of equal weights together, so we will get $M$ tuples of the form $(w_i,occ_i)$ where there are $occ_i$ occurrences of $w_i$ in the original items. For each tuple, we will decompose it into powers of $2$. For example, for $(w_i,15)$, we will decompose it into $\{w_i,2w_i,4w_i,8w_i\}$, for $(w_i,12)$, we will decompose it into $\{w_i,2w_i,4w_i,5w_i\}$. If you think about these things as binary numbers, it is not too difficult to see that we will get the same answers when we use these new weights instead of the original weights.

Furthermore, it is well known that if you have some weights that sum to $C$, then there are only $\sqrt{C}$ distinct weights. So we can already determine that $M \leq \sqrt{C}$. Since $occ_i \leq C$, we can figure out that each tuple will contribute $\log C$ elements. Giving up a simple upper bound of $O(\frac{C \sqrt C \log C}{32})$.

However, this is actually bounded by $O(\frac{C \sqrt C}{32})$ $^{}$. Consider the set of tuples that contribute a weight $k \cdot w_i$, we will call this set $S_k$. We want to show that $\sum\limits_{k \geq 1} |S_k| = O( \sqrt C)$. Firstly, we note that most of the new weights will be a multiple of $2^k$ of the original weight, for each tuple, it will only contribute at most $1$ new weight that is not a power of $2$. Therefore, if we can show that $\sum\limits_{k=2^z} |S_k| = O(f(C))$, then $\sum\limits_{k \geq 1} |S_k| = O(f(C)+\sqrt C)$.

It is obvious that $\sum\limits_{i \in S_k} w_i \leq \frac{C}{k}$ and we can conclude that $|S_k| \leq \sqrt\frac{C}{k}$.

Therefore, $\sum\limits_{k=2^z} |S_k| \leq \sum\limits_{z \geq 1} \sqrt \frac{C}{2^z} = \sqrt C \left (\sum\limits_{z \geq 1} \frac{1}{\sqrt {2^z}} \right) = O(\sqrt C)$.

However, there is a simpler way to do this $^{}$. Consider processing the weights from smallest to largest, with an initially empty list $W'$ as the list of our new weights. Suppose there are $occ$ occurrences of the smallest weight $w$. If $occ$ is odd, we add $\frac{occ-1}{2}$ occurrences of $2w$ to the original weights and $1$ occurrence of $w$ to $W'$. The case if $occ$ is even is similar, we add $\frac{occ-2}{2}$ occurrences of $2w$ to the original weights and $2$ occurrence of $w$ to $W'$.

It is not hard to see that $|W'| \leq 2 \sqrt C = O(\sqrt C)$, since we only add at most $2$ occurences of some weight to $W'$.

Problems:

## Subset Sum Speedup 2

There are $N$ items. The $i$-th item has weight $w_i \leq D$. Can we find a set $S$ such that $\sum\limits_{i \in S} w_i = C$?

We will solve this in $O(ND)$ $^{}$. Firstly, if $\sum\limits w_i<C$, the answer is obviously no, so we will ignore that case.

Let us find the maximal $k$ such that $\sum\limits_{i=1}^k w_i < C$. The basic idea is that we initially have an answer of $w_1+w_2+\ldots+w_k$, then we can either subtract $w_i$ for $i \leq k$ or add $w_i$ for $i>k$ in some order such that the cost of our items is in the range $[C-D,C+D]$, which is not too hard to show constructively. The proof sketch is just: if the current weight is more than $C$, remove something, otherwise add something.

Let us define $can(total,l,r)$ as a dp function that returns true iff there exists $\lambda_l,\lambda_{l+1}, \ldots, \lambda_r \in [0,1]$ such that $\sum\limits_{i=1}^{l-1}w_i + \sum\limits_{i=l}^{r} \lambda_i w_i = total$, where $total \in [C-D,C+D]$.

Notice that $can(total,l,r) = true$ implies that $can(total,l-1,r)=true$, this means that $can$ is monotone on the dimension $l$. Therefore, let us define a new dp function $dp(total,r)$ that stores the maximal $l$ such that $can(total,l,r)=true$.

Furthermore, $can$ is monotone on the dimension $r$, so $dp(total,r) \leq dp(total,r+1)$.

Let us consider the transitions.

From $dp(total,r)=l$, we can extend $r$, transitioning to $dp(total+w_{r+1},r+1)$ or $dp(total,r+1)$. We can also extend $l$ and transition to $dp(total-w_{l'},r)=l'$ for $l'<l$. However, this transition would be $O(N)$ per state, which is quite bad.

However, it would only make sense to transition to $dp(total-w_{l'},r)=l'$ for $dp(total,r-1) \leq l' < dp(total,r)=l$, otherwise this case would have been covered by another state and there would be no need for this transition. Since $dp(total,r) \leq dp(total,r+1)$, the total number of transitions by extending $l$ is actually bounded by $O(ND)$ using amortized analysis.

Therefore, our dp will have $O(ND)$ states with a total of $O(ND)$ transitions.

Actually there is a scammy way to do this. Similarly, we find the maximal $k$ such that $\sum\limits_{i=1}^k w_i < C$, then we want to solve the subset sum problem with the weights $\{-w_1,-w_2,\ldots,-w_k,w_{k+1},\ldots,w_N\}$ and sum $C- \sum\limits_{i=1}^k w_i$.

Let us randomly shuffle the list of weights and pretend $C- \sum\limits_{i=1}^k w_i$ is very small. Then this can be viewed as a random walk with $0$ sum. The length of each step is bounded by $D$, so we can expect $O(D \sqrt N)$ deviation from the origin (my analysis is very crude). Using bitsets, we can obtain a complexity of $O(\frac{ND \sqrt N}{32})$ which is probably good enough.

The paper also mentions a way to generalize this technique to knapsack where weights are bounded by $D$ and values are bounded by $V$ in $O(NDV)$ but I think it is not a significant speedup compared to the usual knapsack to be used in competitive programming.

Problems:

## Knapsack Speedup 1

Consider SG NOI 2018 Prelim knapsack.

The editorial solution is to only consider $O(S \log S)$ items, since only $\frac{S}{w}$ items for a certain weight $w$ can be used, to get a complexity of $O(S^2 \log S)$. But I will discuss a slower $O(NS)$ solution.

Let $dp(i,j)$ be the maximum value of items if we only consider the first $i$ types using a weight of $j$.

Then our transition would be $dp(i,j)=\max\limits_{0 \leq k \leq K_i}(dp(i-1,j-k\cdot W_i)+k\cdot V_i)$. If we split each $dp(i,j)$ into the residue classes of $j \% W_i$, it is easy to see how to speed this up using the sliding deque trick or using an RMQ.

Now, let us talk about a more general speedup. But first, we will have to introduce a convolution that appears frequently in dp.

## (max,+) convolution

The problem statement is this: Given $2$ arrays $A$ and $B$ of length $N$, find an array $C$ of length $2N-1$ such that $C_i = \max\limits_{j+k=i}(A_j+B_k)$.

Doing this in $O(N^2)$ is easy. Can we do better? It seems that doing this faster is a really hard problem and so far the fastest known algorithm is a very scary $O(\frac{n^2}{\log n})$ or $O(\frac{n^2 (\log \log n)^3}{(\log n)^2})$, which does not seem practical for competitive programming.

Note that $(\max,+)$ and $(\min,+)$ convolution are not too different, we can just flip the sign. In the rest of this section, I will use $(\min,+)$ convolution to explain things.

First, we note that if both arrays $A$ and $B$ are concave, then we can do this convolution in $O(N)$ time $^{,}$. The basic idea is we can consider the union of the slopes of the lines $\{A_i - A_{i-1} \} \cup \{B_i - B_{i-1} \}$ and sort them to get the slopes of $C$. Another way to reason about this is using epigraphs (fancy word for colour everything above the line), which I find more intuitive. Because if we take the epigraph of $A$ and $B$, we get $2$ convex polygons, taking their Minkowski sum gets us the epigraph for $C$ and finding Minkowski sum on convex polygons is well known as taking the union of their edges, which is why this is also known as the Minkowski sum trick.

This speedup can be used in several dp problems such as ABC218H and JOISC18 candies. Furthermore, this can be abused in several dp problems where we can think of slope trick as $(min,+)$ convolution so we can store a priority queue of slopes such as in SG NOI 2021 Finals password. Another more direct application is CF1609G.

Anyways, we can actually solve $(\min,+)$ convolution quickly with a weaker constraint that only $B$ is convex.

First, let us discuss a $O(N \log N)$ method.

Define a $i$-chain as the line segment with points $\{(i+1,A_i+B_1),(i+2,A_i+B_2),(i+3,A_i+B_3),\ldots \}$. Then I claim that $2$ chains only intersect at a point, which is the sufficient condition for using Li Chao tree to store these lines $^{}$. Let us prove that this is actually true. Let the equation for the $i$-chain be $f_i(x)$, so $f_i(x)$ is a convex function. We want to show that for $g(x)=f_i(x)-f_j(x)$, there exists $k$ such that $g(x)<0$ for $x<k$ and $0 \leq g(x)$ for $k \leq x$ whenever $i<j$.

$g(x)=(A_i+B_{x-i})-(A_j+B_{x-j})=(B_{x-i}-B_{x-j})+(A_i-A_j)$.

Consider $g(x+1)-g(x)=(B_{x+1-i}-B_{x-i})-(B_{x+1-j}-B_{x-j})$. Recall that $B$ is convex (i.e. $B_{x+1}-B_x \geq B_{x}-B_{x-1}$). Since we have assumed that $i<j$, we can conclude that $(B_{x+1-i}-B_{x-i}) \geq (B_{x+1-j}-B_{x-j})$. Therefore, $g(x+1)-g(x) \geq 0$, i.e. $g$ is a (non-strict) increasing function.

Therefore, we can insert all chains into a Li Chao Tree and find the convolution in $O(N \log N)$.

Now, we will consider an $O(N)$ method. We will have to use the SMAWK algorithm $^{}$ which I will not explain because the reference does a better job at this than me. Here I will actually use $(\max,+)$ convolution to explain.

Anyways, the result we will be using from SMAWK is that given an $N \times M$ matrix $X$ that is totally monotone, we can find the minimum value in each row.

A matrix is totally monotone if for each $2 \times 2$ sub-matrix is monotone i.e. for $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$:

• If $c<d$, then $a<b$
• If $c = d$ then $a \leq b$

The way I think about this is consider the function $\sigma(x)=\begin{cases} 1, & \text{if } x > 0 \\ 0, & \text{if } x = 0 \\ -1 , & \text{if } x < 0 \end{cases}$. We pick $2$ columns $1 \leq i < j \leq M$ and consider writing down $[\sigma(H_{1,i}-H_{1,j}), \sigma(H_{2,i}-H_{2,j}), \ldots, \sigma(H_{N,i}-H_{N,j})]$. Then this sequence should be increasing when $i<j$.

Now given the arrays $A$ and $B$, we define a $2N-1 \times N$ matrix $X$ such that $X_{i,j}=A_{i}+B_{i-j}$. It is clear that the row minimas of $X$ are the answers we want. It can be shown that if $B$ is convex, $X$ is totally monotone $^{}$. I would just remark the proof is basically the same as the proof writen for the case on Li Chao Trees.

Here is a template for SMAWK written by tfg for $(\max,+)$ convolution with a single concave array.

code

## Knapsack Speedup 2

Now, we can talk about which special case of knapsack we can speed up.

There are $N$ items. The $i$-th item has weight $w_i$ and value $v_i$. Find a set $S$ such that $\sum\limits_{i \in S} w_i \leq C$ and $\sum\limits_{i \in S} v_i$ is maximized.

Furthermore, the number of distinct weights is $D$. Then, we have a $O(DC)$ solution $^{}$. Usually $D=O(N)$, but maybe there are some special conditions such as $w_i$ being small, such that we can bound $D$.

We will start with $D=1$. This has an obvious greedy solution of putting the largest value first. Let's suppose the values are $v_1,v_2,\ldots$ with $v_1 \geq v_2 \geq \ldots$ and they all have weight $w$. To make the sum of items become $kw$, the answer is $\sum\limits_{i=1}^k v_i$.

Therefore, it is easy to extend this to $O(DC)$ by performing $(\max,+)$ convolution with $B=[0,v_1,v_1+v_2,\ldots]$ on each residue class modulo $w_i$. We will perform $w_i$ convolutions and each convolution will take $O(\frac{C}{w_i})$ time since $B$ is concave and we are doing $(\max,+)$ convolutions. So each distinct weight we only need $O(C)$ time to process it.

Consider there to be $4$ items. $w=\{2,2,3,3,3\}$ and $v=\{5,1,7,10,2\}$. Initially $A=[0,-\infty,-\infty,\ldots]$.

We process $w_i=2$, $B=[0,5,6]$. Then $A=[0,-\infty,5,-\infty,6,-\infty,\ldots]$.

We process $w_i=3$, $B=[0,10,17,19]$. Then $A=[0,-\infty,5,10,6,15,17,16,22,19,\ldots]$.

Please note that we cannot assume that the sequence $[A_i,A_{i+w_i},A_{i+2w_i},\ldots]$ is convex which is shown by the above example.

Problems:

## Knapsack Speedup 3

There are $N$ items. The $i$-th item has weight $w_i \leq D$. Find a multiset $S$ such that $\sum\limits_{i \in S} w_i \leq C$ and $\sum\limits_{i \in S} v_i$ is maximized.

We can solve this in $O(D^2 \log C)$ $^{}$. Note that when we talk about convolution, we are refering to $(\max,+)$ convolution in $O(D^2)$ complexity. (I believe Algorithm 2 in the reference has a slight error and I have emailed the paper authors.)

Let us define $ans_i$ as the optimal answer with sum of weights being $i$. The main observation is that $ans_i=\max\limits_{j+k=i}(ans_j,ans_k)$, but we can actually impose a constraint that $|j-k| \leq D$ without loss of correctness. Suppose that $j-k >D$, then we can "move" an item from $ans_j$ to $ans_k$. This process can be repeated until $|j-k| \leq D$. Actually this can be generalized to $|j-k - \alpha| \leq D$ for some arbitrary $\alpha$ with the exact same constructive proof (of course we will assume reasonable values for $\alpha$).

Therefore, we can convolute $[ans_{i-\frac{D}{2}}, ans_{i-\frac{D}{2}+1}, \ldots, ans_{i+\frac{D}{2}}]$ with $[ans_{j-\frac{D}{2}}, ans_{j-\frac{D}{2}+1}, \ldots, ans_{j+\frac{D}{2}}]$ to get the value of $ans_{i+j}$.

From this, we have several extensions. We can convolute $[ans_{T-D}, ans_{T-D+1}, \ldots, ans_{T}]$ with $[ans_0,ans_1,\ldots,ans_{2D}]$ to get the values for $[ans_{T+1}, ans_{T+2}, \ldots, ans_{T+D}]$ (there are probably many off by one errors so just pad them with $\pm 5$ or something). We can also convolute $[ans_{T-D}, ans_{T-D+1}, \ldots, ans_{T+D}]$ with itself to get the values for $[ans_{2T-D}, ans_{2T-D+1}, \ldots, ans_{2T}]$.

We can get the array $[ans_0,ans_1,\ldots,ans_{2D}]$ in $O(D^2)$ by using the normal knapsack dp.

Since we have a method of "doubling", which allows us to obtain a complexity of $O(D^2 \log C)$ to compute the array $[ans_{C-D},ans_{C-D+1},\ldots,ans_{C}]$, which is sufficient to find our answer.

Problems:

# References dp,
By errorgorn, 5 months ago, As part of the graduation requirements for my school, I have to complete a simple research project, so I decided to do something related to data structure and algorithms. I believe I have come out with a data structure that maintains the basis of vectors in $(\mathbb{Z}/m\mathbb{Z})^d$, where $m$ may not be prime. Since this was related to competitive programming, I think it is a good idea to share it here. Hopefully, this algorithm is actually novel :P

I would like to thank:

• icypiggy for being my research mentor and tolerating my dumb questions

Please comment under the blog or message me on codeforces if any parts are unclear or wrong. Also, I hope that some LGMs can help solve the open problems in this blog.

# Introduction

Maintaining the basis of vectors in $(\mathbb{Z}/2 \mathbb{Z})^d$, also known as the xor basis algorithm is a well-studied algorithm in competitive programming. The standard xor basis algorithm supports $2$ operations. Given a set $S$ that is initially empty, you can add a vector to $S$ or query whether a vector is representable as a linear combination of vectors in $S$. Using the word RAM model, the basis can be maintained in $O(nd)$ as the addition of vectors in $(\mathbb{Z}/2 \mathbb{Z})^d$ can be done in $O(1)$ time, where $n$ is the number of vectors we add. This algorithm can also be easily modified to handle $2$ additional queries: the lexicographically largest representable vector and the number of representable vectors. This data structure is used in various problems such as AGC45A, UER3C, 1299D - Around the World, 1336E1 - Chiori and Doll Picking (easy version),1427E - Xum and 1466F - Euclid's nightmare.

An extended version of the xor basis algorithm allows for querying the basis set of vectors for a given subarray of an array of vectors. More formally, given an array $A$ where $A_i \in (\mathbb{Z}/2 \mathbb{Z})^d$, we can check if a vector is representable as a linear combination of vectors in $A_l,A_{l+1},\ldots,A_r$. It has appeared in 1100F - Ivan and Burgers and ABC223H which can be solved in $O(nd)$ offline or $O(nd \log n)$ online.

Another known extension of the xor basis algorithm allows one to maintain the basis of vectors in $(\mathbb{Z}/p\mathbb{Z})^d$ where $p$ is prime. It is used in XXII Opencup GP of Siberia F with $p=7$.

In this blog, we will propose an algorithm that can maintain the basis for a set of vectors in $(\mathbb{Z}/m\mathbb{Z})^d$.

GIANT DISCLAIMER: In this blog, the words "vector'' and "basis'' are used very loosely. The word vector is used to refer to an element of a module as $(\mathbb{Z}/m\mathbb{Z})^d$ is not necessarily a vector space. Furthermore, the notion of linear independence is not very well-defined when we are considering a ring. Consider the ring $\mathbb{Z}/4\mathbb{Z}$, then the set $\{2\}$ is not linearly independent because $\lambda=2$ satisfies $\lambda \cdot 2 = 0$ but $\lambda \neq 0$. But the term basis is used because after some restrictions to the set of scalars that are applied to the vectors we store, each set of scalars correspond to a unique "vector'', a fact that will be proven in theorem 5.

More specifically, given an initially empty set $S$ of vectors in $(\mathbb{Z}/m\mathbb{Z})^d$, we can add $n$ vectors with $O(n \cdot d^2 + \Omega(m) \cdot d^3 + d \cdot \log(m))$ pre-processing, where $\Omega(m)$ is the prime omega function that counts the total prime factors of $m$ and check if some vector in $(\mathbb{Z}/m\mathbb{Z})^d$ is can be represented by some linear combination of vectors in $S$ in $O(d^2)$ time online. Additionally, we can also query other things such as the number of distinct vectors we that can be represented as the sum of vectors in $S$ and the lexicographical largest vector that can be represented.

Later, we will extend this idea to handle elements of an arbitrary finite abelian group $G$.

# The Algorithm

The main idea of constructing the linear basis is an extension of the xor basis algorithm where we maintain a set of $d$ vectors whose span is equal to the span of the basis. Similar to the xor basis algorithm we impose an additional constraint over our vectors such that the $i$-th vector we store will have the first $i-1$ dimensions to be $0$.

Although this problem may seem like a trivial extension to the xor basis algorithm, it is not. Consider the ring $(\mathbb{Z}/6\mathbb{Z})^2$ and the vector $(3~1)$. In the xor basis algorithm, checking if a vector is representable is done greedily by doing Gaussian elimination and greedily scalars to multiply to the vectors. So we would expect our basis to store $\begin{pmatrix} 3 & 1 \\ 0 & 0 \end{pmatrix}$. However, if we were to check if the vector $(0~2)$ is in the set, we would fail to produce it.

The way that the linear basis algorithm handles this is to allow a single vector to be propagated down to multiple rows. In the example mentioned above, we would realize that $(3~1) \cdot 2 = (0~2)$. So we would store $\begin{pmatrix} 3 & 1 \\ 0 & 2 \end{pmatrix}$ as our matrix. It should be noted that the row space of the matrix we store may not be linearly independent.

Another notable case to consider is on $\mathbb{Z}/6\mathbb{Z}$. With vectors $(2)$ and $(3)$, it actually spans the entire ring since $\gcd(2,3)=1$.

It should be noted that in the case where $m$ is prime, the algorithm behaves exactly like the generalized version of the xor basis algorithm that works on prime numbers.

Define $V,W$ as $1 \times d$ vectors of elements in $\mathbb{Z}/m\mathbb{Z}$ and $A$ as a $d \times d$ matrix of elements in $\mathbb{Z}/m\mathbb{Z}$ where initially all elements are set to $0$. With such a construction of the basis, we can easily check if a vector is representable using a simple greedy algorithm. Note that in the above $2$ algorithms, we are doing all operations modulo $m$ with the exception of $\frac{a}{b}$ and $a \% b$. In the above $2$ pseudo-codes and our proofs below, $\frac{a}{b}$ is used to denote integer division, it is always guaranteed that $b|a$ when $\frac{a}{b}$ is written. $a \% b$ is also used to denote the remainder after integer division of $a$ by $b$. We will also use the notation $a<b$ for elements in $\mathbb{Z}/m\mathbb{Z}$, this should be understood as comparing the integer values of the elements.

Let us first list out some properties of $A$ and $V$.

Lemma 1: Consider the vector $V$ after the $i$-th iteration of the loop at line 5 of Algorithm 1, then for all $1 \leq j \leq i$, $V_j=0$.

proof

Lemma 2: $A_{i,j}=0$ if $i>j$

Lemma 3: If $A_{i,i}=0$, then $A_{i,j}=0$.

Lemma 4: If $A_{i,i} \neq 0$, then $\gcd(A_{i,i},m)=A_{i,i}$.

Let $S$ be a set of vectors in $(\mathbb{Z}/m\mathbb{Z})^d$. Then $span(S)$ is the set of vectors that can be formed by a linear combination of vectors in $S$. More formally, let $S={s_1,s_2, \ldots, s_w}$. Then $V \in span(S)$ iff there exists $\lambda_1, \lambda_2, \ldots,\lambda_w \in (\mathbb{Z}/m\mathbb{Z})$ such that $V=\sum\limits_{k=1}^w \lambda_i s_i$.

Theorem 1: Suppose there exists $\mu_1, \mu_2, \ldots,\mu_d \in (\mathbb{Z}/m\mathbb{Z})$ such that $V=\sum\limits_{k=1}^w \mu_ks_k$ then there exists $\lambda_1, \lambda_2, \ldots, \lambda_d \in (\mathbb{Z}/m\mathbb{Z})$ such that $V=\sum\limits_{k=1}^d \lambda_k A_k$ i.e. the rows of $A$ spans $S$.

proof

Corollary 1: Suppose for some vector $W$, $V=W$ at the $i$-th iteration of the for loop. Then there exists $\lambda_i,\lambda_{i+1}, \ldots,\lambda_d \in (\mathbb{Z}/m\mathbb{Z})$ such that $W=\sum\limits_{k=i}^d \lambda_k A_k$ after $\texttt{Add-Vector}$ has returned.

Theorem 2: After each call to $\texttt{Add-Vector}$, the following holds: for all $i$ such that $A_{i,i} \neq 0$, there exists $\mu_{i+1}',\mu_{i+2}',\ldots,\mu_d' \in (\mathbb{Z}/m\mathbb{Z})$ such that $\frac{m}{A_{i,i}} A_i = \sum\limits_{k=i+1}^d \mu_k' A_k$.

proof

Theorem 3: Suppose for all $V \in span(S)$ there exists $V=\sum\limits_{k=1}^d \mu_kd_k$ where $\mu_1, \mu_2, \ldots,\mu_d \in (\mathbb{Z}/m\mathbb{Z})$ then for all $V' \in span(S)$ there exists $\lambda_1, \lambda_2, \ldots, \lambda_d \in (\mathbb{Z}/m\mathbb{Z})$ that satisfies $V=\sum\limits_{k=1}^d \lambda_k A_k$ and $\lambda_i=0$ if $A_{i,i}=0$ and $0 \leq \lambda_i < \frac{m}{A_{i,i}}$ otherwise.

proof

Theorem 4: $\texttt{Inside-Basis}(V)$ is returns true iff $V \in S$.

proof

It is also straightforward to adapt the algorithm $\texttt{Inside-Basis}$ to find the lexicographically largest vector. We will now show that the matrix $A$ also maintains the number of representable vectors.

Define a function $c(z)=\begin{cases} \frac{m}{z}, & \text{if } z \neq 0 \\ 1, & \text{otherwise} \end{cases}$

Theorem 5: The number of representable vectors is $\prod\limits_{k=1}^d c(A_{k,k})$.

proof

# Complexity

Suppose there are $n$ vectors being added into the basis. We will consider the number of times $\texttt{Add-Vector}$ is called. In 15 to 23 $\texttt{Add-Vector}$ is called twice, let us show that the number of times lines 15 to 23 is executed by bounded by $O(\Omega(m))$.

Let $A_i$ and $A_i'$ be the values of $A_i$ before and after the execution of lines 15 to 23. $A_{i,i}'=\gcd(V_i,A_{i,i})$ so it is easy to see that $A_{i,i}'|A_{i,i}$. Yet, $A_{i,i}' \neq A_{i,i}$ since that will only happen if $V_i = kA_{i,i}$ for some $k$ and lines 12 to 14 will be executed instead. Therefore, we can conclude that $A_{i,i}'|A_{i,i}$ and $A_{i,i}' \neq A_{i,i}$ implying that $\Omega(A_{i,i}')<\Omega(A_{i,i})$. Since $\Omega(A_{i,i})$ can only decrease at most $O(\Omega(m))$ times, the number of times that lines 15 to 23 is executed by bounded by $O(\Omega(m))$.

Therefore, there will only be $O(n+\Omega(m) \cdot d)$ calls to $\texttt{Add-Vector}$. It is easy to see that the time complexity excluding the calls to $\texttt{Extended-Euclidean}$ is $O(n \cdot d^2+\Omega(m) \cdot d^3)$.

It is easy to see that the total calls to $\texttt{Extended-Euclidean}$ will take $O(d \log (m))$, so the total complexity is $O(n \cdot d^2 + \Omega(m) \cdot d^3 + d \cdot \log(m))$.

The complexity of $\texttt{Inside-Basis}$ is trivially $O(d^2)$.

# Extension to Handling Elements of Arbitrary Finite Abelian Group

First, let us consider maintaining the basis for vectors in $G=\mathbb{Z}/m_1\mathbb{Z} \times \mathbb{Z}/m_2\mathbb{Z} \times \ldots \times \mathbb{Z}/m_d\mathbb{Z}$.

Let $L=lcm(m_1,m_2,\ldots,m_d)$. It is clear that the above group is a subset of $(\mathbb{Z}/L\mathbb{Z})^d$, so we can maintain the basis of $G$ using this algorithm.

Upon analysis of the time complexity, it is easy to see that the time complexity is actually $O((n + \sum\limits_{k=1}^d \Omega(m_k)) \cdot d^2 + \sum\limits_{k=1}^d \log(m_k))$ assuming addition and multiplication under $\mathbb{Z}/L\mathbb{Z}$ can be done in $O(1)$.

By the fundamental theorem of finite abelian groups, every finite abelian group $G$ is an internal group direct product of cyclic groups whose orders are prime powers. So, $G=\mathbb{Z}/p_1^{k_1}\mathbb{Z} \times \mathbb{Z}/p_2^{k_2}\mathbb{Z} \times \ldots \times \mathbb{Z}/p_d^{k_d}\mathbb{Z}$ which can be handled easily.

# Generalization to Euclidean Domains

Big thanks to adamant for discussing this with me.

It turns out that we can generalize this algorithm to work for any euclidean domain given that we are able to:

• Do division and the extended euclidean algorithm quickly
• Determine the size of the ideal $(r)$ generated by some element $r \in R$

We obviously have to do division quickly or else we will have a horrible time with things like $\mathbb{Z}[X]$.

As adamant suspected, this algorithm works also on $\mathbb{Z}[i]/p\mathbb{Z}[i]$ since similar to the case of integers, the extended euclidean algorithm works in $O(\log \max(N(a),N(b))$, which does not hold for the general euclidean domain, where $N(a+bi)=a^2+b^2$.

Now, we just have to show that we can compute the size of the ideal generated by $a+bi$ quickly. In the normal linear basis algorithm, it is intuitive to see that the size of the ideal generated by $a$ is $\frac{m}{\gcd(a,m)}$. It seems that this intuition actually carries over to $\mathbb{Z}[i]$ i.e. the size of the ideal is $\frac{N(p)}{N(\gcd(p,a+bi))}$.

I don't really have a good proof of this but I think I have some intuition on why this should work. A proof sketch would probably be looking at these things as groups and considering the size of the coset.

The size of the ideal generated by $a+bi$ is intuitively the same as the ideal generated by $\gcd(p,a+bi)$. Since $\gcd(p,a+bi) \cdot k_1 = a+bi$ for some $k_1$ and $(a+bi) \cdot k_2 = \gcd(p,a+bi)$ since the extended euclidean had found some $(s,t)$ such that $s \cdot (a+bi)+ p \cdot t=\gcd(p,a+bi)$.

Now, I will just ask you to appeal to geometric intuition for this part. Suppose we tile the plane with $pZ[i]$. Let's use $p=5+5i$ here. Now, I claim that if $\gcd(g,p)=g$, then each "tile" will look the same.

Here are some examples:

$g=1+2i$ $g=3+i$ Some examples of $g$ that are not divisors of $p$ are $g=2+3i$.

Here is what it looks like. Anyways, we can see that each blue tile takes up $N(g)$ squares. And since the blue tiles are a repeated pattern in the red tiles, we can only look at a single tile. Therefore, we can see that the size of the ideal is $\frac{N(p)}{N(g)}$, which was what we wanted.

It is probably better if you just tried these things yourself so here is the desmos graphs that I used to get the intuitions.

# Open Problems

1. (SOLVED) In line 22 of Algorithm 1, $\texttt{Add-Vector}(\frac{m}{A_{i,i}}A_i)$ is called. This fact is used only in case 3 of theorem 2. Is the algorithm correct without this line? We have not found a set of vectors that causes the resulting matrix $A$ to be different when that line is removed.
2. Is there a fast (offline) algorithm for linear basis that allows for range query similar to 1100F - Ivan and Burgers and ABC223H when the vectors are in $(\mathbb{Z}/m\mathbb{Z})^d$?
3. What is the most general ring on which this algorithm works?

# Implementation

Of course, this is a competitive programming site, so I will also provide a sample implementation in C++. I have also created an example problem in a mashup if you would like to submit (although testcases may not be super strong, if you want to provide stronger testcases please message me and I don't mind adding you on polygon).

Code By errorgorn, 5 months ago, Hello again Codeforces 👋,

We are excited to invite you to TOKI Regular Open Contest #24!

Key details:

Many thanks to:

• prabowo 👨‍❤️‍💋‍👨 for 🕒 coordinating the 👦🌷 contest and translating our 💰🅱 stupid 😕💩 jokes into ➡ Indonesian 🇮🇩.
• icypiggy for helping 💻 us 🅱👍 to prepare testdata.
• Our 💩🏽 army of 😫👩 testers dantoh, tqbfjotld, -rs-, jamessngg, iLoveIOI, pavement for testing the problems and making us delete a problem 😡🤬 giving useful feedback 🥰😇.
• hocky for drawing beautiful diagrams 👌😍 for our problems.
• fushar for the wonderful 🌈 TLX platform and fixing TLX after 🔜💯 we 😀👦 broke it multiple times.

Please 😩 register for 👍 the 🚪🅱 contest, and we hope you will 😘 enjoy 😋💋 the contest! Good 👌 luck 😄 have 😏😝 fun!

P.S. Please note ✏️📔 the unusual 🤪 duration 🕐⏱️🕝 I recommend 👍 not reading 🙈📖 all the problems 💀💀 to lose rating❗❗

UPD: contest is over! (no more emoji spam sadly)

Congratulations to our top 5:

Congratulations to our first solvers:

Rating has been updated.

You can upsolve the problems here.

Editorial is available in the upsolve link.

Thank you for participating and see you in the next contest! tlx, troc, toki,
By errorgorn, 6 months ago, Here is a simple problem. Count the number of array $A$ of non-negative integers size $N$ such that $\sum\limits_{i=1}^{N} A_i \cdot i=K$. Where $N,K \leq 5000$. Find the answer modulo $1000000007$.

I unfortunately cannot share the link to a submissions page as it is on a private local judge.

Update: I have asked judge devs of the local judge, the compile command is g++ -O2 -o {compiledName} {sourceName} -m64 -static -std=gnu++14 -lm -s -w -Wall -Wshadow -fmax-errors=512

But anyways, here is the problem.

AC code
WA code

For the testcase $N=K=2500$ (which is the partition number for $2500$) the answer should be $729287017$, but the WA code gives $735199643$ (testing on atcoder custom invocation with C++ (GCC 9.2.1) ). This makes zero sense because the only thing different between these two codes is just the commented assert.

After further investigating, I managed to figure out that the problem was something to do with #pragma GCC optimize("O3").

Below is another set of AC and WA codes where the only difference is #pragma GCC optimize("O3").

AC code
WA code

I was super confused when I found out about this so I told kostia244. He told me that perhaps it is overflow. And indeed, after changing int memo to long long memo, it is AC.

AC code with long long

So now we are both scratching our heads at this weird behaviour. Does anyone have an idea why this happens? Should we include macros in our codes by default now?

Sorry for tagging but ToxicPie9 nor By errorgorn, 7 months ago, I would like to preface this blog by saying this is not a tutorial on FWHT but some (maybe obvious) observations on FWHT that I wanted to share.

Thanks to rama_pang and oolimry for proof reading. Love you guys <3

Of course I (and the proof-readers) are not perfect so if you find anything unclear just tell me in the comments. We will use the symbols $\&,|,\oplus$ to represent bitwise and, bitwise or, bitwise xor respectively.

## FWHT

Given $2$ arrays $A$ and $B$. We will want to find array $C$ such that $C_i=\sum\limits_{j \star k=i} A_j \cdot B_k$. For our normal FFT, the operation $\star$ is just addition.

When finding fast for FFT, I found library code that does FWHT $^{}$, that is it does the above operation where $\star$ can also be any of the $3$ bitwise operations.

We focus on the case for $\oplus$. We focus even more specifically on the very simple case of $A$ and $B$ being size $2$. We want to convolute $\{p_0,p_1\}$ and $\{q_0,q_1\}$ quickly. The way we do this is find some magic matrix $T$ such that we can just perform the dot product to "convolute" these transformed vectors then apply $T^{-1}$ to get the convoluted vector. We want to convolute these arrays into $\{p_0q_0 + p_1q_1 , p_0q_1 + p_1q_0\}$. So we want to find some invertible matrix $T$ that satisfies $T \Big( \matrix{p_0 \\ p_1} \Big) \cdot T \Big( \matrix{q_0 \\ q_1} \Big)=T \Big( \matrix{p_0q_0+p_1q_1 \\ p_0q_1+p_1q_0} \Big)$.

Let $T=\Big( \matrix{a & b \\ c& d} \Big)$. Expanding these, we obtain $\Big( \matrix{ a^2p_0q_0+abp_0q_1+abp_1q_0+b^2p_1q_1 \\ c^2p_0q_0+cdp_0q_1+cdp_1q_0+d^2p_1q_1} \Big)=\Big( \matrix{ ap_0q_0+bp_0q_1+bp_1q_0+ap_1q_1 \\ cp_0q_0+dp_0q_1+dp_1q_0+cp_1q_1} \Big)$.

We obtain $a^2=a,ab=b,b^2=a$ and similar set of equations for $c,d$. We find that $(a,b)={(0,0),(1,1),(1,-1)}$. But the only way to make a invertible matrix is $T=\Big( \matrix{1 & 1 \\ 1& -1} \Big)$. We also find that $T^{-1}=\frac 12 \Big( \matrix{1 & 1 \\ 1& -1} \Big)$.

We can do a similar thing for $\&$ and $|$ and find that $T=\Big( \matrix{0 & 1 \\ 1 & 1} \Big)$ and $T=\Big( \matrix{1 & 0 \\ 1 & 1} \Big)$ works.

Then we just apply the matrix $T \otimes T \otimes \ldots \otimes T$ to our vector to get it's point-coordinate form $^{}$, where $\otimes$ is the Kronecker product $^{,}$. Since we are applying the same operation $T$ to each "layer" of the vector, it makes sense why the Kronecker product is used.

I initially just whacked the math and didn't really pay much attention to it. So here is my attempt at going deeper I suppose.

## Kronecker Product

It turns out that $(A \otimes B)(C \otimes D)=AC \otimes BD$ $^{}$. This gives us a fast way to compute $T \otimes T \otimes \ldots \otimes T$.

We assume that $A,B,C$ are $n \times n$ square matrices and $I_n$ is the $n \times n$ identity matrix.

Claim: $A \otimes B \otimes C = (A \otimes I_n \otimes I_n)(I_n \otimes B \otimes I_n)(I_n \otimes I_n \otimes C)=(I_n \otimes I_n \otimes C)(I_n \otimes B \otimes I_n)(A \otimes I_n \otimes I_n)$.

Proof: Trivial

Corollary: we can switch the order we multiply the matrices $(A \otimes I_n \otimes I_n),(I_n \otimes B \otimes I_n),(I_n \otimes I_n \otimes C)$, (i.e. they are all pairwise commutative).

Corollary: $((A \otimes I_n \otimes I_n)(I_n \otimes B \otimes I_n)(I_n \otimes I_n \otimes C))^{-1}=(A \otimes I_n \otimes I_n)^{-1}(I_n \otimes B \otimes I_n)^{-1}(I_n \otimes I_n \otimes C)^{-1}$.

Notice that these things works for arbitrary finite length $T \otimes T \otimes \ldots \otimes T$ but I will not write it down or it will become notation hell. Also, this works if $A,B,C$ are not of the same size (but are still square matrices).

This gives us a fast way to multiply a matrix by $\underbrace{T \otimes T \otimes \ldots \otimes T}_{k \text{ times}}$ as each element of $(I_n \otimes \ldots \otimes I_n \otimes T \otimes I_n \otimes \ldots \otimes I_n)$ has at most $n^{k+1}$ non-zero element. So multiplying a vector by that matrix only takes $O(n^{k+1})$ time and we are multiplying by $O(k)$ matrices. This is exactly the speedup given by FWHT.

## Generalization of Xor

What I did not realize was that $\oplus$ was literally addition under modulo $2$. That is, we were literally doing a "small" FFT in each layer of our FWHT.

Recall that in FFT, we when we convert a polynomial to the point-coordinate form, we are multiplying the coefficients of the polynomial by the matrix $T=\begin{pmatrix} \omega_n^0 & \omega_n^0 & \omega_n^0 & \ldots & \omega_n^{0} \\ \omega_n^0 & \omega_n^1 & \omega_n^2 & \ldots & \omega_n^{n-1} \\ \vdots & \vdots & \vdots & \ddots & \\ \omega_n^0 & \omega_n^{n-1} & \omega_n^{2n-2} & & \omega_n^{n^2-2n+1} \end{pmatrix}$, where $w_a^b=e^{i\frac{b\tau}{a}}$.

Well, when we are doing convolution with $\oplus$, our matrix is $T=\Big( \matrix{1 & 1 \\ 1& -1} \Big)=\Big( \matrix{\omega_2^0 & \omega_2^0 \\ \omega_2^0 & \omega_2^1} \Big)$. That means we can do our "FWHT" with other things such as addition modulo $3$ with the matrix $T=\begin{pmatrix} \omega_3^0 & \omega_3^0 & \omega_3^0 \\ \omega_3^0 & \omega_3^1 & \omega_3^2 \\ \omega_3^0 & \omega_3^2 & \omega_3^4\end{pmatrix}$. This idea is used in GP of Tokyo 2021 G with addition modulo $7$ used. Note that we naively apply the matrix in $O(b^2)$ if we are dealing with $b$-bit numbers.

## Generalization of And/Or

A way to think about $\&$ and $|$ when we are dealing with $0 \leq a,b < 2$ (i.e. single-bit numbers) is $a \& b=\min(a,b)$ and $a | b = \max (a,b)$. Since these are symmetric we can focus on the case of $|$.

Let's try to generalize this to each bit having digits from $0$ to $2$.

Our matrix will look like $T=\begin{pmatrix} a_0 & b_0 & c_0 \\ a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \end{pmatrix}$. If we apply the math we have used above, we find that we have to satisfy the equations $a_i^2=a_i, a_ib_i=b_i^2=b_i,a_ic_i=b_ic_i=c_i^2=c_i$.

So we have $(a_i,b_i,c_i)=\{(1,1,1),(1,1,0),(1,0,0),(0,0,0)\}$. Since we want $T$ to be invertible we can just have $T=\begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \end{pmatrix}$.

Notice an observation about this $T$, it is just a "staircase" matrix, where the bottom right triangle is filled with $1$ and the rest of the elements are $0$.

Now what happens if we do $T \cdot \begin{pmatrix}a \\ b \\ c\end{pmatrix} = \begin{pmatrix}a \\ a+b \\ a+b+c \end{pmatrix}$. That is, $T$ is literally a "do prefix sum matrix".

So what happens if we consider numbers with $n$ bits, what would multiplying the matrix by $\underbrace{T \otimes T \otimes \ldots \otimes T}_{n \text{ times}}$ do? (Yes, we are considering the case for operator $|$).

Let us try to expand out $T \otimes T \otimes T$.

$T \otimes T \otimes T= \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix} \otimes \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix} \otimes \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}= \begin{pmatrix} 1 & 0 & 0 & 0 \\ 1 & 1& 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 1& 1 & 1 \end{pmatrix} \otimes \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}=\begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ 1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ 1 & 1 & 1 & 1 & 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 & 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 & 1 & 0 & 1 & 0 \\ 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \end{pmatrix}$.

This may look like garbage but we will try to compute $(T \otimes T \otimes T) \cdot \begin{pmatrix} 000 \\ 001 \\ 010 \\ 011 \\ 100 \\ 101 \\ 110 \\ 111 \end{pmatrix}=\begin{pmatrix} 000 \\ 000+001 \\ 000+010 \\ 000+001+010+011 \\ 000+100 \\ 000+001+100+101 \\ 000+010+100+110 \\ 000+001+010+011+100+101+110+111 \end{pmatrix}$.

Notice that the effect of $T \otimes T \otimes T$ is exactly SOS. That means, FWHT for $\&$ and $|$ is just applying the zeta transform and the mobius transform in $O(n \cdot 2^n)$ using SOS dp $^{,}$, which is the same complexity as the $O(N \cdot \log N)$ we have using FWHT taking $N=2^n$.

So we can generalize FWHT for the $b$-bit version of $\&$ and $|$ using (generalized) SOS dp in $O(n \cdot b^n)$ where the array sizes of the convolution are at most size $b^n$.

The reason for why FWHT for $\&$ and $|$ can be done using subset sum is actually quite intuitive in hindset, so I am quite confused how I have never made the connection before.

We can think of this as some PIE, by focusing on the case of $|$ on binary numbers (for simplicity). Define $\bar A_{mask}=\sum\limits_{sub \subseteq mask} A_{sub}$ and $\bar B_{mask}=\sum\limits_{sub \subseteq mask} B_{sub}$.

Now, by PIE, we know that $C_{mask}=\sum\limits_{sub \subseteq mask} (-1)^{|mask \setminus sub|} \bar A_{sub} \bar B_{sub}$. It is clear that $C=\mu(\bar A \cdot \bar B)=\mu(\zeta(A)\cdot \zeta(B))$.

## Subset Sum Convolution

Now, we can actually use ideas from FWHT to do subset sum convolution, where we want to find $C(mask)=\sum\limits_{sub \subseteq mask} A_{sub} \cdot B_{mask \setminus sub}$ for all $mask$.

Define:

• $\hat A_{i,mask}\begin{cases}A_{mask} &, \text{if } |mask|=i\\ 0 &, \text{otherwise}\end{cases}$
• $\hat B_{i,mask}\begin{cases}B_{mask} &, \text{if } |mask|=i\\ 0 &, \text{otherwise}\end{cases}$

It can be shown that $C(mask)=\mu \Bigg(\sum\limits_{i=0}^{|mask|} \zeta(\hat A_{i}) \cdot \zeta(\hat B_{|mask|-i}) \Bigg)$ and it can be calculated in $O(n^2 \cdot 2^n)$ $^{}$.

Well, we can reason about it very easily now. Since for convolution with $|$, FWHT is same as $\zeta$ and the inverse FWHT is same as $\mu$. So it is obvious why the above equation holds, the only way that some non-zero element in $\hat{A}_{i}$ and $\hat B_{|mask|-i}$ can convolute to an element with $|mask|$ bits is for those element 's bits to be mutually exclusive.

Also note that we can apply the inverse FWHT on the outside of the summation as $\mu(A)+\mu(B)=\mu(A+B)$ since $\mu$ can be thought of as some arbitrary matrix.

Actually, we don't have to restrict ourselves to convolution with $|$ for this task. Convolution with $\oplus$ and $\&$ works fine too.

## More Trash

We can do more weird stuff like performing the operation $\oplus$ on the $0$-th bit, performing $\&$ on the $1$-th bit and performing $|$ on the $2$-th bit.

Recalling that the matrices for these are $T_\oplus=\Big( \matrix{1 & 1 \\ 1 & -1} \Big)$, $T_\&=\Big( \matrix{0 & 1 \\ 1 & 1} \Big)$ and $T_|=\Big( \matrix{1 & 0 \\ 1 & 1} \Big)$ respectively. We just multiply our vector by $T_| \otimes T_\& \otimes T_\oplus$ for the forward transform and do the inverse for the backward transform.

We can also "mix and match" transformations of various sizes as long as we modify the result given in the Kronecker Product section.

As an example, suppose we have a number system of $(a,b)$ where $0 \leq a < 2$ and $0 \leq b < 3$ and we define $(a,b)+(c,d)=(a+c \mod 2, b+d \mod 3)$.

We can do convolution on this by noting that we have $T_2=\Big( \matrix{\omega_2^0 & \omega_2^0 \\ \omega_2^0 & \omega_2^1} \Big)$ and $T_3=\begin{pmatrix} \omega_3^0 & \omega_3^0 & \omega_3^0 \\ \omega_3^0 & \omega_3^1 & \omega_3^2 \\ \omega_3^0 & \omega_3^2 & \omega_3^4\end{pmatrix}$. So the transformation is given by $T_2 \otimes T_3=(T_2 \otimes I_3) (I_2 \otimes T_3)$.

So as long as you can define the transformation matrix, you can combine them and do weird stuff.

## References fft, fwht,
By errorgorn, history, 11 months ago,  Spoiler

By errorgorn, history, 12 months ago, ## 1526A - Mean Inequality

Setter: antontrygubO_o
Preparer: errorgorn

Hint 1
Hint 2
Solution
Code (C++)
Code (Python)

## 1526B - I Hate 1111

Setter: errorgorn
Preparer: errorgorn

Hint 0
Hint 1
Hint 2
Solution
Code (C++)
Code (Python)

## 1526C1 - Potions (Easy Version) and 1526C2 - Potions (Hard Version)

Setter: errorgorn and oolimry
Preparer: oolimry

Hint 1
Hint 2
Hint 3
Solution
Code (C++)
Code 2 (C++)
Code (Python)

## 1526D - Kill Anton

Setter: errorgorn
Preparer: errorgorn

Hint 1
Solution
Code (C++)
Code (Python)

## 1526E - Oolimry and Suffix Array

Setter: iLoveIOI
Preparer: iLoveIOI

Hint 1
Solution
Code (C++)
Code (Python)

## 1526F - Median Queries

Setter: errorgorn and oolimry
Preparer: errorgorn

Hint 1
Hint 2
Solution
Code (C++)
Code (Python)

## Code Golf

You may have realized that python codes in the editorial are quite short. We actually had a code golf challenge among testers. You are invited to try code golf our problems and put them in the comments. Maybe I will put the shortest codes in the editorial ;)

Rules for code golf:

• any language allowed
• codes will be judged by number of characters
• must get AC in the respective problems Tutorial of Codeforces Round #723 (Div. 2)
By errorgorn, history, 12 months ago, Hello again Codeforces!

errorgorn, oolimry and iLoveIOI are glad to invite you to participate in Codeforces Round #723 (Div. 2) which will be held at May/28/2021 17:05 (Moscow time). The round will be rated for the participants with a rating lower than 2100. Participants from the first division are also welcomed to take part in the competition but it will be unrated for them.

You will be given 2 hours and 30 minutes to solve 6 questions. One of the puzzles is interactive, please read the guide of interactive problems before the contest.

We would like to thank:

We hope you find the bugaboos interesting and fun! Wish you high rating!

Here is scoring distribution because you guys deserve it <3

• 500 — 1000 — (750+1000) — 2250 — 2500 — 3500

Btw, remember to downvote all testers who writes stupid comments to beg for likes. Announcement of Codeforces Round #723 (Div. 2)
By errorgorn, 15 months ago, ## 1491A - K-th Largest Value

Setter: syksykCCC
Prepared by: syksykCCC

Hint 1
Hint 2
Solution
Code (C++)
Code (Python) (vim1729)

## 1491B - Minimal Cost

Setter: syksykCCC
Prepared by: syksykCCC

Hint 1
Hint 2
Solution
Code (C++)

## 1491C - Pekora and Trampoline

Setter: oolimry
Prepared by: errorgorn

Hint 1
Hint 2
Solution
Code (C++)
Code (Python)

## 1491D - Zookeeper and The Infinite Zoo

Setter: errorgorn
Prepared by: errorgorn

Hint 1
Hint 2
Hint 3
Hint 4
Solution
Code (C++)
Code (C++) (gamegame)
Code (Python)

## 1491E - Fib-tree

Setter: Widowmaker
Prepared by: Widowmaker and syksykCCC

Hint
Solution
Code (C++)

## 1491F - Magnets

Setter: 3.141592653 and star_xingchen_c
Prepared by: 3.141592653 and star_xingchen_c

Hint 1
Hint 2
Hint 3
Solution
Code (C++)

## 1491G - Switch and Flip

Setter: errorgorn and oolimry
Prepared by: errorgorn

Hint 1
Hint 2
Hint 3
Solution
Code (C++)
Code (Python)

## 1491H - Yuezheng Ling and Dynamic Tree

Setter: Ynoi
Prepared by: Widowmaker and Ynoi

Hint 1
Solution
Code (C++)

## 1491I - Ruler Of The Zoo

Setter: oolimry
Prepared by: oolimry

As the solution to this problem is very long, the full editorial is split into $4$ parts. If you want to challenge yourself, you can try reading one part at a time and see if you get any inspiration. You can also try to read the specific hints for each part.

Part 1 Hint 1
Part 1
Part 2 Hint 1
Part 2 Hint 2
Part 2 Hint 3
Part 2
Part 3 Hint 1
Part 3 Hint 2
Part 3 Hint 3
Part 3 Hint 4
Part 3 Hint 5
Part 3 Hint 6
Part 3
Part 4
Code (C++)
Code (C++) (errorgorn) Tutorial of Codeforces Global Round 13 By errorgorn, 19 months ago, Hello, Codeforces!

Welcome to the Codeforces Raif Round 1 (Div. 1 + Div. 2) supported by Raiffeisenbank, that will start on Oct/17/2020 16:05 (Moscow time). It will be a combined rated round for both divisions. Note that the start time is unusual.

All problems were authored and prepared by bensonlzl, oolimry, errorgorn, dvdg6566, shenxy13.

Sexpert gato to:

You will be given 8 problems, one of which would be divided into easy and hard versions, and 150 minutes to solve them.

We hope that statements are short and pretests are strong and that you find the problems interesting! Good luck, have fun and we wish everyone high ratings!

The scoring distribution will be announced closer to the beginning of the round.

Thanks to Raiffeisenbank, winners will get awesome swag:

• 1st-3rd place = Bluetooth speaker

• 4th-10th place = Bumbag

• 11th-20th place = Power Bank Random 50 participants outside of top-20, who solved at least one problem will receive:

• Thermos Mugs

• Raiffeisenbank t-shirt

We develop a high-frequency trading (HFT) system for equity, currency and derivative markets. Our business edge is in technology. The main goal is to create a top-notch platform based on fundamental and statistical models and machine learning, with low latency and high throughput. The efficiency and scalability of our code give us a competitive advantage. We are passionate about code quality and strive for highest standards in product development.

If you are interested in internship and employment opportunities in the Raiffeisenbank algo-trading team Capital Markets, or want to get in touch with the bank's recruitment , fill out a form below.

FILL OUT FORM →

UPD: Scoring distribution: 500 — 1000 — 1000 — 1500 — 1750 — 1750 — (2250+750) — 4000

UPD2: Editorial out!

UPD 3: First ACs and winners

First ACs

A: 300iq

B: icecuber

E: Errichto

F: fmota

Top 5

Congratulations to everyone! Announcement of Codeforces Raif Round 1 (Div. 1 + Div. 2)
By errorgorn, 23 months ago, So my O level Chinese exam is in 2 days so I decided to learn a data structure that I can only find resources for in Chinese. I thought I might as well write a tutorial in English.

This data structure is called 析合树, directly translated is cut join tree, but I think permutation tree is a better name. Honestly, after learning about it, it seems like a very niche data structure with very limited uses, but anyways here is the tutorial on it.

Thanks to dantoh and oolimry for helping me proofread.

### Motivation

Consider this problem. We are given a permutation,$P$ of length $n$. A good range is a contiguous subsequence such that $\max\limits_{l \leq i \leq r} P_i - \min\limits_{l \leq i \leq r} P_i = r-l$. This can be thought of the number of contiguous subsequence such that when we sort the numbers in this subsequence, we get contiguous values. Count the number of good ranges.

Example: $P=\{5,3,4,1,2\}$.

All good ranges are $[1,1], [2,2], [3,3], [4,4], [5,5], [2,3], [4,5], [1,3], [2,5], [1,5]$.

The $O(n^2)$ solution for this is using sparse table and checking every subsequence if it fits the given conditions. But it turns out we can speed this up using permutation tree to $O(n\log{n})$.

### Definitions

A permutation $P$ of length $n$ is defined as:

• $|P|=n$
• $\forall i, P_i \in [1,n]$
• $\nexists i,j \in [1,n], P_i \ne P_j$

A good range is defined as a range, $[l,r]$ such that $\max\limits_{l \leq i \leq r} P_i - \min\limits_{l \leq i \leq r} P_i = r-l$ or equivalently $\nexists x,z \in [l,r], y \notin [l,r], P_x<P_y<P_z$.

We denote a good range $[l,r]$ of $P$ as $(P, [l,r])$, and also denote the set of all good ranges as $I_g$.

### Permutation Tree

So we want a structure that can store all good ranges efficiently.

Firstly, we can notice something about these good ranges. They are composed by the concatenation of other good ranges.

So the structure of the tree is that a node can have some children and the range of the parent is made up of the concatenation of the children's ranges.

Here is an example permutation. $P=\{9,1,10,3,2,5,7,6,8,4\}$. As we can see from the above image, every node represents a certain good range, where the values in the node represent the minimum and maximum values contains in this range.

Notice in this data structure, for any 2 nodes $[l_1,r_1]$ and $[l_2,r_2]$, WLOG $l_1 \leq l_2$, either $r_1<l_2$ or $r_2 \leq r_1$.

### Definition of Cut Nodes and Join Nodes

We shall define some terms used in this data structure:

• Node range: For some node $u$, $[u_l,u_r]$ will describe the minimum and maximum value contained in the range the node represents
• Ranges of children: For some non-leaf node $u$, let the array $S_u$ denote the array of the ranges of its children. For example, the root node the above picture, $S_u$ is $\{[9,9],[1,1],[10,10],[2,8]\}$.
• Order of children: For some non-leaf node $u$, we can discretize the ranges in $S_u$. Again using the example of the root node, the order of its children is $\{3,1,4,2\}$, we will call this $D_u$.
• Join node: For some non-leaf node $u$, we call it a join node if $D_u=\{1,2,3,\cdots\}$ or $D_u=\{\cdots,3,2,1\}$. For simplicity we also consider all leaf nodes to be join nodes.
• Cut node: Any node that is not a join node.

### Properties of Cut Nodes and Join Nodes

Firstly, we have this very trivial property. The union of all ranges of children is the node's range. Or in fancy math notation, $\bigcup_{i=1}^{|S_u|} S_u[i]=[u_l,u_r]$.

For a join node $u$, any contiguous subsequence of ranges of its children is a good range. Or, $\forall i,j,1 \leq i \leq j \leq |S_u|, \bigcup_{i=l}^{r} S_u[i]\in I_g$.

For a cut node $u$, the opposite is true. Any contiguous subsequence of ranges of its children larger than 1 is not a good range. Or, $\forall i,j,1 \leq i < j \leq |S_u|, \bigcup_{i=l}^{r} S_u[i]\notin I_g$.

The property of join nodes is not too hard to show by looking at the definition of what a join node is.

But the property of cut nodes is slightly harder to prove. A way to think about this is that for some cut node such that there is a subsequence of ranges bigger than 1 that form a good range, then that subsequence would have formed a range. This is a contradiction.

### Construction of Permutation Tree

Now we will discuss a method to create the Permutation Tree in $O(n\log{n})$. According to a comment by CommonAnts, the creator of this data structure, a $O(n)$ algorithm exists, but I could not find any resources on it.

#### Brief overview of algorithm

We process the permutation from left to right. We will also keep a stack of cut and join nodes that we have processed previously. Now let us consider adding $P_i$ to this stack. We firstly make a new node $[P_i,P_i]$ and call it the node we are currently processing.

1. Check if we can add the currently processed as a child of the node on top of the stack.
2. If we cannot, check if we can make a new parent node (this can either be a cut or join node) such that it contains some suffix of the stack and the current processed node as children.
3. Repeat this process until we cannot do any more operations of type 1 or 2.
4. Finally, push the currently processed node to the stack.

Notice that after processing all nodes, we will only have 1 node left on the stack, which is the root node.

#### Details of the algorithm

For operation 1, if we note that we can only do this if the node on top of the stack is a join node. Because if we can add this as a child to a cut node, then it contradicts the fact that no contiguous subsequence of ranges of children larger than 1 of a cut node can be a good range.

For operation 2, we need a fast way to find if there exists a good range such that we can make a new node from. There are 3 cases:

• We cannot make a new node.
• We can make a new join node. This new node has 2 children.
• We can make a new cut node.

#### Checking if there exists a good range

We have established for a good range $(P,[l,r])$ that $\max\limits_{l \leq i \leq r} P_i - \min\limits_{l \leq i \leq r} P_i = r-l$.

Since $P$ is a permutation, we also have $\max\limits_{l \leq i \leq r} P_i - \min\limits_{l \leq i \leq r} P_i \geq r-l$ for all ranges $[l,r]$.

Equivalently, we have $\max\limits_{l \leq i \leq r} P_i - \min\limits_{l \leq i \leq r} P_i - (r-l) \geq 0$, where we have equality only for good ranges.

Say that we are currently processing $P_i$. We define a value $Q$ for each range $[j,i], Q_j=\max\limits_{j \leq k \leq i} P_k - \min\limits_{j \leq k \leq i} P_k - (i-j),0< j \leq i$. Now we just need to check if there is some $Q_j=0$, where $j$ is not in the current node being processed.

Now we only need to know how to maintain this values of $Q_j$ quickly when we transition from $P_i$ to $P_{i+1}$. We can do this by updating the max and min values every time it changes. How can we do this?

Let's focus on updating the max values since updating the min values are similar. Let's consider when the max value will change. It changes every time $P_{i+1} > \max$. Let us maintain a stack of the values of $\max\limits_{j \leq k \leq i}P_k$, where we will store distinct values only. It can be seen that this stack is monotonically decreasing. When we add a new element to this stack, we will pop all elements in the stack which are smaller than it and update their maximum values using a segment tree range add update. This amortizes to $O(n)$ as each value is pushed into the stack once.

Do note to decrement all $Q_j$ by 1 since we are incrementing $i$ by 1.

Now that we can maintain all values of $Q_j$, we can simply check the minimum value of the range we are interested in using segment tree range minimum queries.

If we can make a new cut node, then we greedily try to make new cut node. We can do this by adding another node from our stack until our new cut node is valid.

Since the above may be confusing, here is a illustration of how the construction looks like. ### Problems using Permutation Tree

Codeforces 526F – Pudding Monsters

Idea
Code

CERC 17 Problem I – Instrinsic Interval

Idea
Code

Codeforces 997E – Good Subsegments

Codeforces 1205F – Beauty of a Permutation

CodeChef – Army of Me

CodeChef – Good Subsequences By errorgorn, history, 2 years ago, Sumitomo Mitsui Trust Bank Programming Contest 2019 has just finished, and this is an unofficial editorial.

Thanks to my friends oolimry and shenxy13 for helping write some of the editorial.

# A — November 30

You can solve it simply by checking for each end date of the Gregorian calendar. However, note that as the second date directly follows the first date (a fact which I think is not clear in the English translation), we can also check whether they're in different months, or whether the second date is the first day of a month. This can be done in constant time.

Code

# B — Tax Rate

We note that there is monotonicty in this question. $\lfloor 1.08(X+1) \rfloor \geqslant \lfloor 1.08X \rfloor$. Hence, we can binary search the answer. When we binary search the value of X(the answer), if $\lfloor 1.08X \rfloor = N$, then we have our answer. Otherwise, we can search higher if $\lfloor 1.08X \rfloor > N$ and search lower otherwise. If we find that no number gives us desired N, then it is impossible.

Code

# C — 100 to 105

We can simply do a 0-infinity knapsack with weights 100,101,...,105 and check if some value is reachable. We get a time complexity of $O(N)$.

Code

# D — Lucky PIN

First, we note that there are $O(N^{3})$ subsequences of the string, so generating all of them and using a set to check for number of distinct subsequences is TLE. However, there are only at most 1000 distinct such subsequences, from 000 to 999. We can linearly scan through the string for each of these possible subsequences to check if it is actually a subsequence of the string in $O(N)$. Thus, this can be solved in $O(1000N)$, which is AC.

Code

# E — Colorful Hats 2

Firstly, we can imagine there are 3 imaginary people standing at the very front, each with a different colour hat. For each person, we consider how many possible people could be the first person in front of him with the same hat colour. If the current person has number X, then the number of ways is:

(no. of ppl with X-1 in front) — (no. of ppl with X in front)

This is because those with X in front of him must be paired with one of the X-1 already, so this reduces our options.

Implementation wise, we can store an array which keeps track of the number of people with no. X who are not paired yet. Initially, all values are 0, except at index -1 with the value of 3. Then when processing the current p[user:AtillaAk]erson X, we multiply the answer by arr[X-1], decrease arr[X-1] by 1, and increase arr[X] by 1.

Code

# F — Interval Running

Firstly, if $T_{1}A_{1}+T_{2}A_{2}=T_{1}B_{1}+T_{2}B_{2}$, the answer is infinity.

Else, WLOG, we let $T_{1}A_{1}+T_{2}A_{2} > T_{1}B_{1}+T_{2}B_{2}$. If $A_{1} > B_{1}$, then Takahashi and Aoki will never meet each other. The answer is 0. Now, we have solved all the corner cases. We shall move on to the general case. We shall call the period $T_{1}+T_{2}$. Now, we shall find the number of periods where Takahashi and Aoki meet each other. If we do some math, we get the number of periods to be $\lceil \frac{T_{1}(B_{1}-A_{1})}{(T_{1}A_{1}+T_{2}A_{2})-(T_{1}B_{1}+T_{2}B_{2})} \rceil$.

The number of times that Takahashi and Aoki meet each other is $2periods-1$ since every period they meet each other twice when Aoki overtakes Takahashi and Takahashi overtakes Aoki. We need to minus 1 since we do not count the first time they meet each other at the very start. We submit this and... WA.

Yes, we need to think about the case where $\frac{T_{1}(B_{1}-A_{1})}{(T_{1}A_{1}+T_{2}A_{2})-(T_{1}B_{1}+T_{2}B_{2})}$ is a whole number. In this case, we need to add one, as there will be one more time where Aoki will meet up with Takahashi but never overtake him. And now we get AC. (Thanks to Atill83 for pointing out the mistake)

Code By errorgorn, history, 3 years ago, Example problem: (I dont have the link as this problem only exists in my school's private judge)

We have an array of n positive intergers and we need to group them into k segments such that each element of the array is in a segment and that all elements in a segment are contigous. The cost of a segment is the sum of the numbers in the segment squared.

We need to minimize the total cost of all the segments.

Example 1, we have n=5 and k=3. Where the array is [1,2,3,4,5]. The optimal answer is by grouping (1+2+3) , (4) and (5) into different segments. The total cost is 6^2 + 4^2 + 5^2 =77.

Example 2, we have n=4, k=2. Where the array is [1,1000000,3,4]. The optimal answer is by grouping (1,1000000) and (3,4) into different segments. The total cost is 1000001^2+7^2=10000200050

Now I asked some people and they said this can be solved by doing the lagrange (or aliens) speedup trick. We shall define a cost C. Then do a convex hull trick where we try to minimize the value of all our segments but we also add C to our cost whenever we make a new segment. Now, if C is INF, then we will only have 1 segment, while if C is 0 we will have n segments. So people claim that we can binary search on C to find some C where there will be K segments existing.

This allowed me to AC the problem, but my solution was not always correct as the test cases were weak. I thought of this test case: n=4, k=3. Where is array is [1,1,1,1] Now, when C is 2, we get 4 segments. But when C is 3, we get 2 segments only. Therefore when I ran my code against this case my code printed 8, even though the answer was (1+1)^2+1^2+1^2=6.

So I think I need someway to iterpolate the lagrange speedup trick. Anyone can help? My code is here.

For my code, the input format is:

n k

a1 a2 a3 a3... an

Where ai is the ith element in the array. 