### ko_osaga's blog

By ko_osaga, history, 2 weeks ago,

## Chapter 4. Fast algorithm for $\boxdot$ operator

We stopped at the point where we learned how to:

• Implement the $\boxdot$ operator in $O(N^3)$ time
• Use the $\boxdot$ operator $O(N^2)$ times

I actually didn't introduce the name to avoid unnecessary scare, but the original paper calls this operator the unit-Monge matrix-matrix distance multiplication. Throughout the article, we will call it the unit-Monge multiplication (of permutations) or just the $\boxdot$ operator, as we did before.

Let's see how to compute the $\boxdot$ operator in $O(N \log N)$ time. For the matrices $\Sigma(A), \Sigma(B)$, consider the partitioning $\Sigma(A) = [\Sigma(A)_{lo}, \Sigma(A)_{hi}], \Sigma(B) = \begin{bmatrix} \Sigma(B)_{lo} \newline \Sigma(B)_{hi} \end{bmatrix}$, where $lo$ denotes the first $N/2 + 1$ entries, and $hi$ denotes the last $N/2$ entries. We assume $N$ to be even for a simpler description.

$\Sigma(A) \odot \Sigma(B)$ is the element-wise minimum of $\Sigma(A)_{lo} \odot \Sigma(B)_{lo}$ and $\Sigma(A)_{hi} \odot \Sigma(B)_{hi}$. Also, $\Sigma(\{A, B\})_{lo, hi}$ will roughly correspond to $\Sigma(\{A, B\}_{lo, hi})$ where

• $A_{lo}$ is the subpermutation of $A$ composed of elements in value range $[1, N/2]$
• $A_{hi}$ is the subpermutation of $A$ composed of elements in value range $[N/2+1, N]$
• $B_{lo}$ is the subpermutation of $B$ composed of elements in index range $[1, N/2]$
• $B_{hi}$ is the subpermutation of $B$ composed of elements in index range $[N/2+1, N]$

We will compute $A_{lo} \boxdot B_{lo}$, $A_{hi} \boxdot B_{hi}$ recursively, and use the result to compute $C = A \boxdot B$.

Let

• $M_{lo}(i, k) = \min_{j = 1}^{N/2 + 1} (\Sigma(A)(i, j) + \Sigma(B)(j, k))$
• $M_{hi}(i, k) = \min_{j = N/2+2}^{N + 1} (\Sigma(A)(i, j) + \Sigma(B)(j, k))$

Then as we've just observed, $\Sigma(C)(i, k) = \min(M_{lo}(i, k), M_{hi}(i, k))$.

We want to express $M_{lo}$ and $M_{hi}$ as $A_{lo} \boxdot B_{lo}$ and $A_{hi} \boxdot B_{hi}$, but they are not the same — in the end $M_{lo}$ is an $(N+1) \times (N+1)$ matrix while $A_{lo} \boxdot B_{lo}$ is an $(N/2+1) \times (N/2+1)$ matrix.

### Representing $M_{lo}, M_{hi}$ as $C_{lo} = A_{lo} \boxdot B_{lo}$ and $C_{hi} = A_{hi} \boxdot B_{hi}$

We will assume

• $\Sigma(A_{lo})$ to be an $(N+1) \times (N/2+1)$ matrix defined in row/column index $[1, N+1] \times [1, N/2+1]$
• $\Sigma(A_{hi})$ to be an $(N+1) \times (N/2+1)$ matrix defined in row/column index $[1, N+1] \times [N/2+1, N+1]$
• $\Sigma(B_{lo})$ to be an $(N/2+1) \times (N+1)$ matrix defined in row/column index $[1, N/2+1] \times [1, N+1]$
• $\Sigma(B_{hi})$ to be an $(N/2+1) \times (N+1)$ matrix defined in row/column index $[N/2+1, N+1] \times [1, N+1]$

Note that $N/2+1$ rows are extended to $N + 1$ rows by copying values from the downward rows, and ditto for columns.

What we have is:

• $\Sigma(A)_{lo}(i, j) = \Sigma(A_{lo})(i, j)$
• $\Sigma(A)_{hi}(i, j) = \Sigma(A_{hi})(i, j) + \Sigma(A_{lo})(i, N/2 + 1)$
• $\Sigma(B)_{hi}(i, j) = \Sigma(B_{hi})(i, j)$
• $\Sigma(B)_{lo}(i, j) = \Sigma(B_{lo})(i, j) + \Sigma(B_{hi})(N/2+1, j)$

Good, let's write down:

• $M_{lo}(i, k) = \min_{j = 1}^{N/2 + 1} (\Sigma(A_{lo})(i, j) + \Sigma(B_{lo})(j, k) + \Sigma(B_{hi})(N/2+1, k))$
• $M_{hi}(i, k) = \min_{j = 1}^{N/2 + 1} (\Sigma(A_{hi})(i, j) + \Sigma(A_{lo})(i, N/2+1) + \Sigma(B_{hi})(j, k))$

We know $C_{lo} = A_{lo} \boxdot B_{lo}$ and $C_{hi} = A_{hi} \boxdot B_{hi}$, why don't we use it?

• $M_{lo}(i, k) = \Sigma(C_{lo})(i, k) + \Sigma(B_{hi})(N/2+1, k)$
• $M_{hi}(i, k) =\Sigma(C_{hi})(i, k) + \Sigma(A_{lo})(i, N/2+1)$

(Note that we consider $C_{lo}, C_{hi}$ as $(N+1) \times (N+1)$ matrices.)

We again, consider the derivatives, and simplify:

• $M_{lo}(i, k) = \Sigma(C_{lo})(i, k) + \Sigma(C_{hi})(1, k)$
• $M_{hi}(i, k) =\Sigma(C_{hi})(i, k) + \Sigma(C_{lo})(i, N + 1)$

Good, now we represented $M_{lo}$ and $M_{hi}$ in terms of $C_{lo}$ and $C_{hi}$.

### Recovering $C$ from $C_{lo}, C_{hi}$

To evaluate $C$ where $\Sigma(C)(i, k) = \min(M_{lo}(i, k), M_{hi}(i, k))$, it will be helpful to characterize the positions where $M_{lo}(i, k) - M_{hi}(i, k) \geq 0$. Let's denote this quantity as $\delta(i, k) = M_{lo}(i, k) - M_{hi}(i, k)$. We can see that

• $\delta(i, k)= \Sigma(C_{lo})(i, k) - \Sigma(C_{lo})(i, N + 1) + \Sigma(C_{hi})(1, k) - \Sigma(C_{hi})(i, k)$
• $\delta(i, k) = |\{x | 1 \le x \le i - 1, 1 \le C_{hi}[x] \le k - 1\}| - |\{x | i \le x \le N, k \le C_{lo}[x] \le N\}|$

Observe that the function is nondecreasing for both $i, k$. More specifically, since all values $C_{hi} \cup C_{lo}$ are distinct, we have

• $0 \leq \delta(i, k+1) - \delta(i, k) \leq 1$
• $0 \leq \delta(i+1, k) - \delta(i, k) \leq 1$

If we characterize the cells where $\delta(i, k) < 0$ and $\delta(i, k) \geq 0$, the demarcation line will start from lower-left corner $(N+1, 1)$ to upper-right corner $(1, N+1)$. The difference of $\delta(i, k+1) - \delta(i, k)$ and $\delta(i-1, k) - \delta(i, k)$ can be computed in $O(1)$ time, so the demarcation line can be actually computed in $O(N)$ time with two pointers.

We want to find all points $(i, j)$ where $\Sigma(C)(i, j+1) - \Sigma(C)(i, j) - \Sigma(C)(i+1, j+1) + \Sigma(C)(i+1, j) = 1$. If such points in $C_{lo}$ and $C_{hi}$ are not near the demarcation line, we can simply use them. But if they are adjacent to the demarcation line, we may need some adjustment. Let's write it down and see what cases we actually have:

• Case 1. $\delta(i + 1, j +1) \le 0$: In this case, all corners use the value from $M_{lo}$ and hence the point in $C_{lo}$ is preserved: If there is a point $(i, j) \in C_{lo}$, we can use it.
• Case 2. $\delta(i, j) \geq 0$: In this case, all corners use the value from $M_{hi}$ and hence the point in $C_{hi}$ is preserved: If there is a point $(i, j) \in C_{hi}$, we can use it.
• Case 3. None of the above: For this to hold, we need $\delta(i, j) = -1, \delta(i, j+1) = \delta(i+1, j) = 0, \delta(i+1, j+1) = 1$. I'm omitting the proof, but you can show that $(i, j)$ is always included in $C$.

Note that the points in $C_{lo}$ and $C_{hi}$ are distinct per their $x$-coordinate and $y$-coordinate, therefore you can set $C$ as the union of $C_{lo}$, $C_{hi}$ and overwrite at the positions where Case 3 holds. This can be done by simply moving along the demarcation line, and checking the Case 3 condition whenever it's necessary.

As a result, we obtain an $O(N)$ algorithm to obtain $A \boxdot B$ from $A_{lo} \boxdot B_{lo}$ and $A_{hi} \boxdot B_{hi}$, hence the total algorithm runs in $T(N) = O(N) + 2T(N/2)$ time.

I tried to implement the above algorithm and I think I got pretty short and nice code. However, when I tried to obtain an actual seaweed matrix, I found that my code was about 5x slower than the fastest one (by noshi91) on the internet. The difference between my code and the fastest one seems to come from memory management: I declare lots of vectors in recursion, whereas the fastest one allocates an $O(n)$ pool and uses everything from there. I decided to not bother myself and just copy-paste it :) You can test your implementation on LibreOJ, in the problem 单位蒙日矩阵乘法 (Unit-Monge Matrix Multiplication). Here is my final submission.

## Chapter 5. Using the $\boxdot$ operator to obtain the seaweed matrix

Now we know how to implement the $\boxdot$ operator, and we know how to solve the Range LIS problem with $O(N^2)$ applications of $\boxdot$, therefore we obtain an $O(N^3 \log N)$ algorithm. This is bad, but it is actually easy to solve the Range LIS problem with only $N$ applications of $\boxdot$: instead of creating a permutation for each entry, we can simply create a permutation for each row of the seaweed matrix.

Therefore we have an $O(N^2 \log N)$ algorithm, but we still need more work. Hopefully, this isn't as complicated as our previous steps.

We will use divide and conquer. Consider the function $f(A)$ that returns the seaweed matrix of a permutation $A$. Let $A_{lo}$ be the subpermutation consisting of the numbers in $[1, N/2]$ and $A_{hi}$ the subpermutation consisting of the numbers in $[N/2+1, N]$. Our strategy is to compute the seaweed matrices $f(A_{lo})$ and $f(A_{hi})$ for each half of the permutation and combine them. We know how to combine seaweed matrices with $\boxdot$, but the seaweed matrix from each subpermutation has missing columns.

Recall the rules of seaweed: If two seaweeds never met before, then they cross. From this rule, we can easily find the destination for missing columns: The seaweeds will just go downward. Therefore, the permutation for $A_{lo}$ and $A_{hi}$ can both be scaled to a larger one by filling the missing columns and missing rows (which are just identity). Then we can simply return the unit-Monge multiplication of them.

Extensions can be computed in $O(N)$ time and multiplication can be computed in $O(N \log N)$ time, hence $T(N) = 2T(N/2) + O(N \log N) = O(N \log^2 N)$. Hooray! We now know how to compute the Range LIS in $O(N \log^2 N)$. Here is my code which contains all of the contents above.
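To see where the extra logarithm comes from, the recurrence can be unrolled level by level. This is a short worked derivation, assuming for simplicity that $N$ is a power of two and that one multiplication of size $n$ costs at most $c \cdot n \log_2 n$ for some constant $c$ (both are simplifying assumptions made here for illustration):

$T(N) \le \sum_{d=0}^{\log_2 N} 2^d \cdot c \cdot \frac{N}{2^d} \log_2 \frac{N}{2^d} + O(N) \newline = cN \sum_{d=0}^{\log_2 N} (\log_2 N - d) + O(N) \newline = cN \cdot \frac{\log_2 N (\log_2 N + 1)}{2} + O(N) \newline = O(N \log^2 N)$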

## Chapter 6. Using the seaweed to solve the problem

We obtained the seaweed matrix of a permutation in $O(N \log^2 N)$, so it is trivial to compute the range LIS of a permutation.

Problem: Range LIS. Given a permutation $A$ and $Q$ queries $(l, r)$, compute the LIS of $A[l], A[l + 1], \ldots, A[r]$.

Solution. Compute a seaweed matrix of $A$ in $O(N \log^2 N)$. As observed in Chapter 1, the length of the LIS is the number of seaweeds with index at most $l + N - 1$ that arrive at $[l, r]$. The seaweeds that can arrive at $[l, r]$ have index at most $r + N$, so we can instead count the seaweeds from range $[l+N, r+N]$ that arrive at $[l, r]$, and subtract the quantity from $r - l + 1$.

In other words, the size of the LIS equals $r - l + 1$ minus the number of seaweeds that start from the upper edge of the dotted box and end at the lower edge of the dotted box. This is a 2D query, and can be computed with sweeping and Fenwick trees.

We can compute other nontrivial quantities as well.

Problem: Prefix-Suffix LIS. Given a permutation $A$ and $Q$ queries $(l, r)$, compute the LIS of $A[1], \ldots, A[l]$ restricted to elements with value at least $r$.

Solution. We want to compute the number of seaweeds that start from the upper edge of the box $[r, N] \times [1, l]$ and end at the lower edge (such a box is in the bottom-left position). Seaweeds that pass the upper edge of the box start in range $[N - r + 1, N + l]$. Therefore we obtain a similar 2D query, which can also be solved with Fenwick trees. Note that the same strategy works for Suffix-Prefix LIS as well (Prefix-Prefix or Suffix-Suffix are just trivial).

The second problem is interesting, since it can be used to solve a well-known problem in a more efficient way.

Problem: Maximum Clique in a circle graph. Given $n$ chords in a circle where all endpoints are distinct, compute the maximum-size subset of chords in which every pair of chords intersects. The endpoints of the chords are labeled with distinct integers from $[1, 2n]$, where labels are in circular order.

Solution. We will call the endpoint of a chord with the smaller label its "left endpoint", and define the "right endpoint" similarly. In an optimal solution, there exists some chord whose left endpoint has the smallest label. Let $c = (s, e)$ be such a chord. If we fix such a chord $c$, the remaining chords should cross $c$, and the chosen chords should also cross each other: for two chords $p = (l_1, r_1), q = (l_2, r_2)$, if $l_1 < l_2$ then $r_1 < r_2$. Let $A[x]$ be the opposite endpoint of the chord incident to endpoint $x$. The above observation summarizes to the following: for all $x < A[x]$, compute the LIS of $A[x], A[x + 1], \ldots, A[A[x] - 1]$ restricted to elements with value at least $A[x]$. This is hard, but it does not hurt to instead compute the LIS of $A[1], A[2], \ldots, A[A[x] - 1]$ restricted to elements with value at least $A[x]$: the LIS gives a valid clique anyway. Now the problem is exactly the Prefix-Suffix LIS and can be solved in $O(N \log^2 N)$ time, whereas the naive algorithm uses $O(N^2 \log N)$ time.

## Practice problems


By ko_osaga, history, 3 weeks ago,

Hello, Codeforces!

At some point of life you want to make a new data structure problem with short statement and genius solution. LIS (Longest Increasing Subsequence) is a classic problem with beautiful solution, so you come up with the following problem:

• Given a sequence $A$ of length $N$ and $Q$ queries $1 \le i \le j \le N$, compute the length of Longest Increasing Subsequence of $A[i], A[i + 1], \ldots, A[j]$.

But on the other hand this looks impossible to solve, and you just give up the idea.

I always thought that the above problem is unsolved (and might be impossible), but very recently I learned that such queries are solvable in only $O(N \log^2 N + Q \log N)$ time, not involving any sqrts! The original paper describes this technique as semi-local string comparison. The paper is incredibly long and uses tons of scary math terminology, but I think I found a relatively easier way to describe this technique, which I will show in this article.

Thanks to qwerasdfzxcl for helpful discussions, peltorator for giving me the motivation, and yosupo and noshi91 for preparing this problem.

## Chapter 1. The All-Pair LCS Algorithm

Our starting point is to consider the generalization of above problem. Consider the following problem:

• Given a sequence $S$ of length $N$, $T$ of length $M$, and $Q$ queries $1 \le i \le j \le M$, compute the length of Longest Common Subsequence of $S$ and $T[i], T[i + 1], \ldots, T[j]$.

Indeed, this is the generalization of the range LIS problem. By using coordinate compression on the pair $(A[i], -i)$, we can assume the sequence $A$ to be a permutation of length $N$. The LIS of the permutation $A$ is equivalent to the LCS of $A$ and sequence $[1, 2, \ldots, N]$. Hence, if we initialize with $S = [1, 2, \ldots, N], T = A$, we obtain a data structure for LIS query.
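The reduction above can be sketched in a few lines of Python. This is only an illustrative sketch: the helper names `to_permutation`, `lis_length`, and `lcs_length` are hypothetical names chosen here, and the LIS is assumed to be strictly increasing (which is what the tie-break on $-i$ encodes).

```python
from bisect import bisect_left

def to_permutation(a):
    """Coordinate-compress a onto 1..N, ranking positions by the pair (a[i], -i)."""
    order = sorted(range(len(a)), key=lambda i: (a[i], -i))
    perm = [0] * len(a)
    for rank, i in enumerate(order, start=1):
        perm[i] = rank
    return perm

def lis_length(seq):
    """Length of the longest strictly increasing subsequence (patience sorting)."""
    tails = []
    for x in seq:
        k = bisect_left(tails, x)
        if k == len(tails):
            tails.append(x)
        else:
            tails[k] = x
    return len(tails)

def lcs_length(s, t):
    """Textbook O(|s||t|) LCS, used here only as a cross-check."""
    dp = [[0] * (len(t) + 1) for _ in range(len(s) + 1)]
    for i in range(1, len(s) + 1):
        for j in range(1, len(t) + 1):
            if s[i - 1] == t[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(s)][len(t)]

a = [3, 1, 3, 2, 3, 1]
p = to_permutation(a)                      # equal values get decreasing ranks
identity = list(range(1, len(a) + 1))
# The three printed lengths agree: LIS(a) = LIS(p) = LCS([1..N], p)
print(p, lis_length(a), lis_length(p), lcs_length(identity, p))
```

Because equal values receive decreasing ranks from left to right, no two of them can appear together in an increasing subsequence of the permutation, so the same equality also holds for every subarray.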

The All-Pair LCS problem can be a problem of independent interest. For example, it has already appeared in an old Petrozavodsk contest, and there are various solutions solving the problem in $O(N^2 + Q)$ time (assuming $N = M$). Personally, I solved this problem by modifying the Cyclic LCS algorithm by Andy Nguyen. However, there is one particular solution which can be improved to a near-linear Range LIS solution, which is from the paper An all-substrings common subsequence algorithm.

Consider the DP table used in the standard solution of LCS problem. The states and transition form a directed acyclic graph (DAG), and have a shape of a grid graph. Explicitly, the graph consists of:

• $(N+1) \times (M+1)$ vertices corresponding to states $(i, j)$
• Edge of weight $0$ from $(i, j)$ to $(i+1, j)$ and $(i, j+1)$
• Edge of weight $1$ from $(i, j)$ to $(i+1, j+1)$ if $S[i+1] = T[j+1]$.

Figure: DAG constructed from the string "yxxyzyzx", "yxxyzxyzxyxzx"

Here, you can observe that the answer to the query $(i, j)$ corresponds to the longest path from $(0, i-1)$ to $(N, j)$. Let's denote the length of longest path from $(x_1, y_1)$ to $(x_2, y_2)$ as $dist((x_1, y_1), (x_2, y_2))$. Our goal is to compute $dist((0, i), (N, j))$ for all $0 \le i < j \le M$.

How can we do this? We need several lemmas:

Lemma 1. $dist((0, y), (i, j)) - dist((0, y), (i, j-1))$ is either $0$ or $1$.

Proof.

• $dist((0, y), (i, j-1)) \le dist((0, y), (i,j))$, since we can extend any path to $(i, j-1)$ with a rightward edge.
• $dist((0, y), (i, j-1)) \geq dist((0, y), (i, j)) - 1$, since we can cut the path to $(i, j)$ at the last point where it is in column $j - 1$ and move downward from there. $\blacksquare$

Lemma 2. $dist((0, y), (i, j)) - dist((0,y ), (i-1, j))$ is either $0$ or $1$.

Proof. Identical with Lemma 1. $\blacksquare$

Lemma 3. For every $i, j$, there exists some integer $0 \le i_h(i, j) \le j$ such that

• $dist((0, y), (i, j)) - dist((0, y), (i, j-1)) = 1$ for all $i_h(i, j) \le y < j$
• $dist((0, y), (i, j)) - dist((0, y), (i, j-1)) = 0$ for all $0 \le y < i_h(i, j)$

Proof. The above statement is equivalent to the following: For all $y, i, j$ we have $dist((0, y), (i, j)) - dist((0, y), (i, j-1)) \le dist((0, y+1), (i, j)) - dist((0, y+1), (i, j-1))$. Consider two optimal paths from $(0, y) \rightarrow (i, j)$ and $(0, y+1) \rightarrow (i, j-1)$. Since the DAG is planar, the two paths always intersect. By swapping the destinations at the intersection, we obtain two paths $(0, y) \rightarrow (i, j-1)$ and $(0, y + 1) \rightarrow (i, j)$, which cannot be better than optimal. Therefore we have $dist((0, y), (i, j)) + dist((0, y+1), (i, j-1)) \le dist((0, y+1), (i, j)) + dist((0, y), (i, j-1))$ which is exactly what we want to prove. $\blacksquare$

Lemma 4. For every $i, j$, there exists some integer $0 \le i_v(i, j) \le j$ such that

• $dist((0, y), (i, j)) - dist((0, y), (i-1, j)) = 0$ for all $i_v(i, j) \le y < j$
• $dist((0, y), (i, j)) - dist((0, y), (i-1, j)) = 1$ for all $0 \le y < i_v(i, j)$

Proof. Identical with Lemma 3. $\blacksquare$

Suppose we have the answer $i_h(i, j)$ and $i_v(i, j)$ for all $i, j$. How can we compute the value $dist((0, i), (N, j))$? Let's write it down:

$dist((0, i), (N, j)) \newline = dist((0, i), (N, i)) + \sum_{k = i+1}^{j} \left( dist((0, i), (N, k)) - dist((0, i), (N, k-1)) \right) \newline = 0 + \sum_{k = i+1}^{j} [i_h(N, k) \le i]$

Here $[\cdot]$ is $1$ if the condition holds and $0$ otherwise.

It turns out that we don't even need all values, we only have to know a single linear array $i_h(N, *)$! Given that we have an array $i_h(N, *)$, the queries can be easily answered in $O(\log N)$ time with Fenwick trees, or $O(1)$ time if we use $O(N^2)$ precomputation.

Hence, all we need to do is to compute the values $i_h$ and $i_v$, and it turns out there is a very simple recurrence.

Theorem 5. The following holds:

• $i_h(0, j) = j$
• $i_v(i, 0) = 0$
• For $i, j \geq 1$ and $S[i] = T[j]$: $i_h(i, j) = i_v(i, j-1)$ and $i_v(i, j) = i_h(i-1, j)$
• For $i, j \geq 1$ and $S[i] \neq T[j]$: $i_h(i, j) = \max(i_h(i-1, j), i_v(i, j-1))$ and $i_v(i, j) = \min(i_h(i-1, j), i_v(i, j-1))$

Proof. Base cases are trivial. For a fixed $y$, consider the distance from $(0, y)$ to the four cells $(i-1, j-1), (i-1, j), (i, j-1), (i, j)$ of the rectangle. Let $t = dist((0, y), (i-1, j-1))$, then the two cells $(i-1, j)$ and $(i, j-1)$ attain either value $t$ or $t + 1$. Therefore, the possibilities are:

• Whether $dist((0, y), (i-1, j))$ has value $t$ or $t + 1$ (the latter holds iff $y \ge i_h(i - 1, j)$)
• Whether $dist((0, y), (i, j-1))$ has value $t$ or $t + 1$ (the latter holds iff $y < i_v(i, j-1)$)
• Whether $S[i] = T[j]$ or not

Those three values uniquely determine $dist((0, y), (i, j))$. You can verify Theorem 5 by inspecting all $2^3 = 8$ cases by hand. $\blacksquare$

Remark. At least this is the proof I found, and this is also the proof from the original paper. I believe there is a simpler interpretation, so please add a comment if you have a good idea!

As Theorem 5 gives a simple recurrence to compute all values $i_h$ and $i_v$, we can solve the All-Pair LCS problem in $O(NM + Q \log N)$ time, hence the Range LIS problem in $O(N^2 + Q \log N)$ time.
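The recurrence of Theorem 5 and the summation formula for $dist((0, i), (N, j))$ can be sketched as follows. This is a minimal sketch, not the code from the paper: the function names `last_row_ih` and `query` are hypothetical, strings are 0-indexed Python sequences, and each query is answered by a plain $O(M)$ scan instead of a Fenwick tree to keep the example short.

```python
def last_row_ih(S, T):
    """Return the array ih with ih[j] = i_h(N, j) for j = 0..M (Theorem 5)."""
    n, m = len(S), len(T)
    ih = list(range(m + 1))              # base case: i_h(0, j) = j
    for i in range(1, n + 1):
        iv = 0                           # base case: i_v(i, 0) = 0
        for j in range(1, m + 1):
            up, left = ih[j], iv         # i_h(i-1, j) and i_v(i, j-1)
            if S[i - 1] == T[j - 1]:
                ih[j], iv = left, up
            else:
                ih[j], iv = max(up, left), min(up, left)
    return ih

def query(ih, i, j):
    """dist((0, i), (N, j)): LCS of S and T[i+1..j] (1-based), i.e. Python's T[i:j]."""
    return sum(1 for k in range(i + 1, j + 1) if ih[k] <= i)

S, T = "yxxyzyzx", "yxxyzxyzxyxzx"       # the strings from the figure above
ih = last_row_ih(S, T)
print(query(ih, 0, len(T)))              # LCS of S and the whole of T
print(query(ih, 3, 9))                   # LCS of S and T[4..9]
```

Only the last row $i_h(N, *)$ is kept, so the memory usage is $O(M)$ even though the recurrence visits all $NM$ cells.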

As long as the Strong Exponential Time Hypothesis (SETH) is true, the longest common subsequence of two strings of length $N$ cannot be computed in $O(N^{2 - \epsilon})$ time for any constant $\epsilon > 0$. Hence our algorithm has essentially no room for improvement. However, in the case of LIS, one of our patterns is fixed as $[1, 2, \ldots, N]$, and it turns out we can use this to improve the time complexity.

## Chapter 2. The Seaweed

Visualizing the above DP procedure gives a further insight on the structure. We can consider the value $i_v$ and $i_h$ to be associated with the edges of the grid: In that sense, the DP transition is about picking the values from the upper/left edges, and routing them to the lower/right edges of the rectangular cell. For example, we can draw a following picture:

In this picture, green curves represent the values — values from the left edges of big rectangle ("BAABCBCA") are $0$, from the upper edges of big rectangle ("BAABCABCABA") are $1, 2, \ldots, M$. We will call each green curve as a seaweed. We will also read the seaweed from the lower left corner to the upper right corner, and say the seaweed is in left or right according to this order. In this regard, in the beginning seaweeds are sorted in the increasing order.

Let's reinterpret the DP transition from Theorem 5 with this visualization. If $S[i] = T[j]$, the two seaweeds do not intersect. If $S[i] \neq T[j]$, the two seaweeds intersect if the right seaweed has a greater value than the left one. In other words, each cell with $S[i] \neq T[j]$ is an anti-sorter of seaweeds: If two adjacent seaweeds $i, i+1$ have increasing values ($A[i] < A[i +1]$), it swaps them so that they have decreasing values ($A[i] > A[i+1]$).

Of course, in the case of Range LIS we have $N^2 - N$ such pairs, so this is still not enough to solve the problem, but now I can present the main idea for optimization.

Suppose that we swap two values regardless of their values. We can represent each operation as a permutation $P$ where $P(i)$ stores the final position of $i$-th seaweed from the beginning. Let's say we have a swap operation in position $i_1, i_2, \ldots, i_k$, and let the elementary permutation $P_i$ be

$P_i(j)=\begin{cases} j+1, & \text{if}\ j=i \newline j-1, & \text{if}\ j=i+1 \newline j, & \text{otherwise}\end{cases}$

Then the total operation can be described as a single permutation $P = P_{i_1} \circ P_{i_{2}} \circ \ldots \circ P_{i_k}$ where $P \circ Q$ is a composite permutation: $P \circ Q(i) = Q(P(i))$.

We can't use this to solve the Range LIS problem because we take the values into account. But very surprisingly, even with that condition, there exists a cool operator $\boxdot$ such that:

• $\boxdot$ is associative.
• The total operation can be described as a single permutation $P = P_{i_1} \boxdot P_{i_{2}} \boxdot \ldots \boxdot P_{i_k}$

## Chapter 3. The Operator

The definition of this operator is pretty unintuitive, and needs several auxiliary lemmas:

Definition 6. Given a permutation $P$ of length $N$, let $\Sigma(P)$ be the $(N+1) \times (N+1)$ square matrix, such that $\Sigma(P)_{i, j} = |\{x|x \geq i, P[x] < j\}|$

Intuitively, it is a partial sum in left-down direction, for example, if $P = [2, 3, 1]$, we have:

$\Sigma(P) = \begin{bmatrix} 0&1&2&3 \newline 0&1&1&2 \newline 0&1&1&1 \newline 0&0&0&0 \end{bmatrix}$

Which is the partial sum of $\begin{bmatrix} 0&0&1&0 \newline 0&0&0&1 \newline 0&1&0&0\newline 0&0&0&0 \end{bmatrix}$.
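Definition 6 can be checked directly on this example. The following is a small sketch; the function name `sigma` and the 0-based list layout (cell `S[i][j]` stands for $\Sigma(P)_{i+1, j+1}$) are choices made here for illustration.

```python
def sigma(P):
    """Sigma(P) as an (N+1) x (N+1) list of lists.

    P holds 1-based values; the 0-based cell S[i][j] stands for the
    1-based entry Sigma(P)_{i+1, j+1} = #{x >= i+1 : P[x] < j+1}.
    """
    n = len(P)
    S = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):        # fill rows bottom-up
        for j in range(n + 1):
            S[i][j] = S[i + 1][j] + (1 if P[i] < j + 1 else 0)
    return S

for row in sigma([2, 3, 1]):
    print(row)
# [0, 1, 2, 3]
# [0, 1, 1, 2]
# [0, 1, 1, 1]
# [0, 0, 0, 0]
```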

Definition 7. Given two matrices $A$ of size $N \times M$, $B$ of size $M \times K$, the min-plus multiplication $A \odot B$ is $(A \odot B)_{i, j} = \min_{1 \le k \le M} (A_{i, k} + B_{k, j})$.

Theorem 8. Given two permutations $P, Q$ of length $N$, there exists a permutation $R$ of length $N$ such that $\Sigma(R) = \Sigma(P) \odot \Sigma(Q)$. We denote such $R$ as $P \boxdot Q$.

To prove it we need two lemmas:

Lemma 8.1. For a matrix $\Sigma(R)$, there exists a permutation $R$ if and only if the following conditions are satisfied:

• $\Sigma(R)_{i, 1} = 0$
• $\Sigma(R)_{N+1, i} = 0$
• $\Sigma(R)_{i, N+1} = N + 1 - i$
• $\Sigma(R)_{1, i} = i - 1$
• $\Sigma(R)_{i, j} - \Sigma(R)_{i, j-1} - \Sigma(R)_{i+1, j} + \Sigma(R)_{i+1, j-1} \geq 0$

Proof of Lemma 8.1. Consider the inverse operation of partial sums. We can always restore the permutation if the "inverse partial sum" contains exactly one $1$ in each row and each column, and $0$ in all other entries. The fifth condition guarantees that the elements are nonnegative, and the third and fourth conditions guarantee that each row and each column sums to $1$. Those conditions are sufficient to guarantee that the inverse yields a permutation. $\blacksquare$

Lemma 8.2. For any matrix $A$, $A_{i, j} - A_{i, j-1} - A_{i+1, j} + A_{i+1, j-1} \geq 0$ for all $i, j$ if and only if $A_{i_1, j_2} - A_{i_1, j_1} - A_{i_2, j_2} + A_{i_2, j_1} \geq 0$ for all $i_1 \le i_2, j_1 \le j_2$.

Proof of Lemma 8.2. $\rightarrow$ is done by induction. $\leftarrow$ is trivial. $\blacksquare$

Proof of Theorem 8. We will prove the first four points of Lemma 8.1. Note that all entries of $\Sigma(R)$ are nonnegative since those of $\Sigma(P), \Sigma(Q)$ are.

• $\Sigma(R)_{i, 1} \le \Sigma(P)_{i, 1} + \Sigma(Q)_{1, 1} = 0$
• $\Sigma(R)_{N+1, i} \le \Sigma(P)_{N+1, N+1} + \Sigma(Q)_{N+1, i} = 0$
• $\Sigma(R)_{i, N + 1} = \min(\Sigma(P)_{i, j} + \Sigma(Q)_{j, N+1}) = \min(\Sigma(P)_{i, j} + N+1-j)$. Considering the derivative, the term is minimized when $j = N + 1$. $\Sigma(R)_{i, N+1} = \Sigma(P)_{i, N+1} = N+1-i$
• $\Sigma(R)_{1, i} = \min(\Sigma(P)_{1, j} + \Sigma(Q)_{j, i}) = \min(j-1 + \Sigma(Q)_{j, i})$. Considering the derivative, the term is minimized when $j = 1$. $\Sigma(R)_{1, i} = \Sigma(Q)_{1, i} = i-1$

Here, when we consider the derivative, we use the fact that $0 \le \Sigma(P)_{i, j} - \Sigma(P)_{i, j - 1} \le 1$. $N + 1 - j$ definitely decreases by $1$ when we increase the $j$, but $\Sigma(P)_{i, j}$ never increases more than $1$ even when we increase the $j$. Therefore, it does not hurt to increase the $j$. We will use this technique later on.

To prove the final point, let $k_1, k_2$ be the index where $\Sigma(R)_{i, j} = \Sigma(P)_{i, k_1} + \Sigma(Q)_{k_1, j}$, $\Sigma(R)_{i+1, j-1} = \Sigma(P)_{i+1, k_2} + \Sigma(Q)_{k_2, j-1}$. Suppose $k_1 \le k_2$, we have

$\Sigma(R)_{i, j-1} + \Sigma(R)_{i+1, j} \newline = \min_k (\Sigma(P)_{i, k} + \Sigma(Q)_{k, j-1}) + \min_k (\Sigma(P)_{i+1, k} + \Sigma(Q)_{k, j}) \newline \le \Sigma(P)_{i, k_1} + \Sigma(P)_{i+1, k_2} + \Sigma(Q)_{k_1, j-1} + \Sigma(Q)_{k_2, j} \newline \le \Sigma(P)_{i, k_1} + \Sigma(P)_{i+1, k_2} + \Sigma(Q)_{k_1, j} + \Sigma(Q)_{k_2, j-1} \newline =\Sigma(R)_{i, j} + \Sigma(R)_{i+1, j-1}$

(Note that Lemma 8.2 is used in $\Sigma(Q)$)

In the case of $k_1 \geq k_2$ we proceed identically, this time using the Lemma 8.2 for $\Sigma(P)$. $\blacksquare$

Theorem 9. The operator $\boxdot$ is associative.

Proof. Min-plus matrix multiplication is associative just like normal matrix multiplication. $\blacksquare$
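Definitions 6 and 7 together with Theorem 8 translate directly into a naive $O(N^3)$ implementation of $\boxdot$. The sketch below is only meant as a reference for experiments: the function names are hypothetical, permutations hold 1-based values, and the 0-based cell `S[i][j]` stands for $\Sigma(P)_{i+1, j+1}$.

```python
def sigma(P):
    """Sigma(P) from Definition 6 as an (N+1) x (N+1) list of lists."""
    n = len(P)
    S = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(n + 1):
            S[i][j] = S[i + 1][j] + (1 if P[i] < j + 1 else 0)
    return S

def min_plus(A, B):
    """Min-plus product from Definition 7."""
    return [[min(A[i][k] + B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def from_sigma(S):
    """Recover the permutation by taking the 'inverse partial sum' of S."""
    n = len(S) - 1
    R = [0] * n
    for i in range(n):
        for j in range(1, n + 1):
            if S[i][j] - S[i][j - 1] - S[i + 1][j] + S[i + 1][j - 1] == 1:
                R[i] = j
    return R

def boxdot(P, Q):
    """Naive O(N^3) unit-Monge multiplication of two permutations."""
    return from_sigma(min_plus(sigma(P), sigma(Q)))

print(boxdot([2, 1, 3], [1, 3, 2]))  # [3, 1, 2]
print(boxdot([2, 1], [2, 1]))        # [2, 1]
```

The second call shows how $\boxdot$ differs from ordinary composition: composing the swap $[2, 1]$ with itself would give the identity, whereas $\boxdot$ keeps the two elements swapped.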

Lemma 10. Let $I$ be the identity permutation ($I(i) = i$), we have $P \boxdot I = P$ (For proof you can consider the derivative.) $\blacksquare$

And now here comes the final theorem which shows the equivalence of the "Seaweed" and the "Operator":

Theorem 11. Consider the sequence of $N$ seaweed and sequence of operation $i_1, i_2, \ldots, i_k$, where each operation denotes the following:

• In the beginning, there is $i$-th seaweed in $i$-th position.
• For each $1 \le x \le k$, we swap the seaweeds in the $i_x$-th position and the $(i_x + 1)$-th position, only if the seaweed in position $i_x$ has a smaller index than the seaweed in position $i_x+1$.

Let $P_i$ be the elementary permutation as defined above. Let $P = P_{i_1} \boxdot P_{i_{2}} \boxdot \ldots \boxdot P_{i_k}$ . Then after the end of all operation, $i$-th seaweed is in the $P(i)$-th position.
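Theorem 11 is easy to test empirically before reading the proof. The sketch below compares a direct simulation of the conditional swaps with the $\boxdot$-product of elementary permutations. It is an illustrative experiment under stated assumptions: `boxdot` is a naive $O(N^3)$ implementation through $\Sigma$ matrices, and all function names are hypothetical.

```python
from functools import reduce

def boxdot(P, Q):
    """Naive O(N^3) unit-Monge multiplication via Sigma matrices (1-based values)."""
    n = len(P)

    def sigma(R):
        # 0-based cell (i, j) stands for the 1-based entry Sigma(R)_{i+1, j+1}
        S = [[0] * (n + 1) for _ in range(n + 1)]
        for i in range(n - 1, -1, -1):
            for j in range(n + 1):
                S[i][j] = S[i + 1][j] + (1 if R[i] < j + 1 else 0)
        return S

    A, B = sigma(P), sigma(Q)
    C = [[min(A[i][k] + B[k][j] for k in range(n + 1)) for j in range(n + 1)]
         for i in range(n + 1)]
    # Row i of the result is the column where the inverse partial sum equals 1
    return [next(j for j in range(1, n + 1)
                 if C[i][j] - C[i][j - 1] - C[i + 1][j] + C[i + 1][j - 1] == 1)
            for i in range(n)]

def elementary(n, t):
    """Elementary permutation P_t, which exchanges t and t+1 (1-based)."""
    P = list(range(1, n + 1))
    P[t - 1], P[t] = t + 1, t
    return P

def simulate(n, ops):
    """Apply the conditional swaps directly; result[i-1] = final position of seaweed i."""
    at = list(range(1, n + 1))            # at[p-1] = index of the seaweed at position p
    for t in ops:
        if at[t - 1] < at[t]:             # swap only if the left seaweed has the smaller index
            at[t - 1], at[t] = at[t], at[t - 1]
    pos = [0] * n
    for p, s in enumerate(at, start=1):
        pos[s - 1] = p
    return pos

ops = [1, 2, 1, 2]
product = reduce(boxdot, [elementary(3, t) for t in ops])
print(product, simulate(3, ops))          # the two lists coincide
```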

Proof of Theorem 11. We will use induction over $k$. By induction hypothesis $P_{i_1} \boxdot P_{i_{2}} \boxdot \ldots \boxdot P_{i_{k-1}}$ correctly denotes the position of seaweeds after $k - 1$ operations. Let

• $t = i_k$
• $A = P_{i_1} \boxdot P_{i_{2}} \boxdot \ldots \boxdot P_{i_{k-1}}$
• $B = P_{i_1} \boxdot P_{i_{2}} \boxdot \ldots \boxdot P_{i_{k}}$
• $A(k_0) = t, A(k_1) = t+1$

It suffices to prove that

• $B(k_0) = t+1, B(k_1) = t, B(i) = A(i)$ for all other $i$ if $k_0 < k_1$
• $B= A$ if $k_0 > k_1$

Which is also equivalent to:

• $\Sigma(B)_{i, j} = \Sigma(A)_{i, j} + 1$ if $k_0 < i \le k_1, j = t + 1$
• $\Sigma(B)_{i, j} = \Sigma(A)_{i, j}$ otherwise.

Observe that $\Sigma(P_t) - \Sigma(I)$ has only one nonzero entry $(\Sigma(P_t) - \Sigma(I))_{t+1, t+1} = 1$. Since we know $\Sigma(A) \odot \Sigma(I) = \Sigma(A)$, $\Sigma(B)$ and $\Sigma(A)$ only differs in the $t+1$-th column. For the $t + 1$-th column, note that

$\Sigma(B)_{i, t + 1} \newline = \min_j (\Sigma(A)_{i, j} + \Sigma(P_t)_{j, t + 1}) \newline = \min(\min_{j \le t} (\Sigma(A)_{i, j} + t + 1 - j), \Sigma(A)_{i, t + 1} + 1, \min_{j > t+1} \Sigma(A)_{i, j}) \newline = \min(\Sigma(A)_{i, t} + 1,\Sigma(A)_{i, t + 2})$ (derivative)

If $k_0 < k_1$, we have

• $\Sigma(A)_{i, t} = \Sigma(A)_{i, t + 1} - 1 = \Sigma(A)_{i, t + 2} - 2$ ($i \le k_0$)
• $\Sigma(A)_{i, t} = \Sigma(A)_{i, t + 1} - 0 = \Sigma(A)_{i, t + 2} - 1$ ($k_0 < i \le k_1$)
• $\Sigma(A)_{i, t} = \Sigma(A)_{i, t + 1} - 0 = \Sigma(A)_{i, t + 2} - 0$ ($k_1 < i$)

From this you can verify that $\Sigma(B)_{i, t+1} = \Sigma(A)_{i, t+1} + 1$ iff $k_0 < i \le k_1$.

If $k_0 > k_1$, we have

• $\Sigma(A)_{i, t} = \Sigma(A)_{i, t + 1} - 1 = \Sigma(A)_{i, t + 2} - 2$ ($i \le k_1$)
• $\Sigma(A)_{i, t} = \Sigma(A)_{i, t + 1} - 1 = \Sigma(A)_{i, t + 2} - 1$ ($k_1 < i \le k_0$)
• $\Sigma(A)_{i, t} = \Sigma(A)_{i, t + 1} - 0 = \Sigma(A)_{i, t + 2} - 0$ ($k_0 < i$)

From this you can verify that $\Sigma(B)_{i, t+1} = \Sigma(A)_{i, t+1}$. $\blacksquare$

Yes, this is complete magic :) If anyone has good intuition for this result, please let me know in the comments. (The original paper mentions some group theory stuff, but I have literally zero knowledge of group theory, and I'm also skeptical about how it helps in giving intuition.)

## What's next

We learned all the basic theories to tackle the problem, and obtained an algorithm for the Range LIS problem. Using all the facts, we can:

• Implement the $\boxdot$ operator in $O(N^3)$ time
• Use the $\boxdot$ operator $O(N^2)$ times

Hence we have... $O(N^5 + Q \log N)$ time algorithm. Of course this is very slow, but in the next article we will show how to optimize this algorithm to $O(N \log^2 N + Q \log N)$. We will also briefly discuss the application of this technique.

## Next article


By ko_osaga, history, 5 weeks ago,

Yesterday I participated in a local contest involving a problem about Monge arrays. I could've written some D&C optimization, but I got bored of typing it, so I copy-pasted maroonrk's SMAWK implementation to solve it. Today, I somehow got curious about the actual algorithm, so here it goes.

## 1. Definition

I assume that the reader is aware of the concept of D&C Optimization and Monge arrays.

The goal of SMAWK is to compute the row optima (ex. row minima or maxima...) in a $n \times m$ totally monotone matrix. By totally monotone, it means the following:

Def: Monotone. The matrix is monotone if the position of row optima is non-decreasing.

Def: Totally Monotone (TM). The matrix is totally monotone if every $2 \times 2$ submatrix is monotone.

Totally monotone matrices are monotone (proof easy but nontrivial), but not vice versa. The latter is a stronger condition. Note that Divide and Conquer Optimization works on a monotone matrix, therefore if you can use SMAWK, you can always use D&C Optimization, but maybe not vice versa. I think it's a niche case, though.

If you want to compute the row minima of a matrix, the TM condition holds iff for all columns $p < q$:

• If $A[i][p] > A[i][q]$ then $A[i + 1][p] > A[i + 1][q]$
• If $A[i][p] = A[i][q]$ then $A[i + 1][p] \geq A[i + 1][q]$

The takeaway is that if you took $(i, q)$ as the optimum, then $(i + 1, p)$ should not be an optimum. At first, I was very confused on the maximum and minimum, but the definition of TM is independent of them. There is absolutely no reason to be confused. On the other hand, sometimes the row optima is non-increasing. You should be careful for that case. The solution is to reverse all rows.

For all columns $p < q$, if the matrix is TM (per row minima), then if you read columns from top to the bottom you have:

• rows with $A[i][p] < A[i][q]$
• rows with $A[i][p] = A[i][q]$
• rows with $A[i][p] > A[i][q]$

On the other hand, if the matrix is Monge, $A[i][p] - A[i][q]$ is nondecreasing, therefore:

• Monge matrix is TM.
• Transpose of Monge matrix is Monge and also TM. (Transpose of TM may not be TM.)
• You can compute both row optima and column optima in Monge arrays.
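The two conditions can be turned into brute-force checkers, which is handy when you are unsure whether a cost matrix qualifies. This is a sketch with hypothetical function names: `is_monge` uses the convention above that $A[i][p] - A[i][q]$ is nondecreasing in $i$ for $p < q$, and `is_totally_monotone` checks the row-minima condition on adjacent rows.

```python
def is_monge(A):
    """Check A[i][j] + A[i+1][j+1] <= A[i][j+1] + A[i+1][j] for all adjacent cells."""
    return all(A[i][j] + A[i + 1][j + 1] <= A[i][j + 1] + A[i + 1][j]
               for i in range(len(A) - 1) for j in range(len(A[0]) - 1))

def is_totally_monotone(A):
    """Check the TM condition (per row minima) for every column pair p < q."""
    n, m = len(A), len(A[0])
    for i in range(n - 1):
        for p in range(m):
            for q in range(p + 1, m):
                if A[i][p] > A[i][q] and not A[i + 1][p] > A[i + 1][q]:
                    return False
                if A[i][p] == A[i][q] and not A[i + 1][p] >= A[i + 1][q]:
                    return False
    return True

squares = [[(i - j) ** 2 for j in range(5)] for i in range(4)]
print(is_monge(squares), is_totally_monotone(squares))   # True True

tm_only = [[0, 1], [0, 100]]
print(is_monge(tm_only), is_totally_monotone(tm_only))   # False True

transposed = [list(col) for col in zip(*tm_only)]
print(is_totally_monotone(transposed))                    # False
```

The last two lines illustrate the remarks in the list: a TM matrix need not be Monge, and its transpose need not be TM.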

## 2. Algorithm

Basically, SMAWK is a combination of two independent algorithms: Reduce and Interpolate. Let's take a look at both algorithms.

### Interpolate

This is the easy one. We can solve $n = 1$ case naively, so assume $n \geq 2$. The algorithm works as follows:

• Take all odd rows and remove all even rows.
• Recursively solve for odd rows.
• Let $opt(i)$ be the optimal position for row $i$. We have $opt(2k-1) \le opt(2k) \le opt(2k+1)$, so brute-force all candidates in $[opt(2k-1), opt(2k+1)]$.

For each even row, we need $opt(2k+1) - opt(2k-1) + 1$ entries to determine the answer. Summing this, we have $T(n, m) = O(n + m) + T(n / 2, m)$. This looks ok for $n > m$, but for $n < m$ it doesn't look like a good approach.

### Reduce

This is the harder one, but not too hard. Say that we query two values $A[r][u], A[r][v]$ for $u < v$. Depending on the comparison, we have the following cases:

• $A[r][u] \le A[r][v]$. In this case, $A[r][v]$ is not a candidate for row minima, and consequently for all $A[r - 1][v], A[r - 2][v], \ldots, A[1][v]$.
• $A[r][u] > A[r][v]$. In this case, $A[r][u]$ is not a candidate for row minima, and consequently for all $A[r + 1][u], A[r + 2][u], \ldots, A[n][u]$.

Let's proceed further with this information. We scan the columns from left to right, and compare each new column $v$ with the previous column $u$ in the first row. We either have

• $A[1][u] \le A[1][v]$. In this case, $A[1][v]$ is not a candidate for row minima.
• $A[1][u] > A[1][v]$. In this case, the whole column $u$ is useless.

In the second case, we are very lucky: we can remove the whole column! On the other hand, the first case only rules out a single candidate. Let's keep a stack of all non-useless columns, so we store the first column and move on. If we compare the entries in the second row, we either have

• $A[2][u] \le A[2][v]$. In this case, $A[1][v], A[2][v]$ are not candidates for row minima.
• $A[2][u] > A[2][v]$. In this case, the whole column $u$ except the first row is useless. But we declared $A[1][u]$ to be useless before, therefore the whole column $u$ is useless again.

Repeating this, we have the following very simple algorithm:

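    // stk holds the surviving columns; the column at stack position k is
    // still a candidate only for rows k, k + 1, ..., n (1-indexed).
    for (int i = 1; i <= m; i++) {
        while (!stk.empty() && A[stk.size()][stk.back()] > A[stk.size()][i])
            stk.pop_back();
        if (stk.size() < n)
            stk.push_back(i);
    }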


That's it! As a result, we can reduce the number of columns to at most the number of rows in $O(n + m)$ time.

### Putting it all together

Now we are ready to present the whole algorithm, and it's so simple:

• Reduce to make $n \geq m$ in $O(n + m)$ time.
• Interpolate to halve $n$ in $O(n)$ time.
• Recursively continue.

What is the time complexity? Except for the very first iteration of SMAWK, we can assume that the size of row and column is roughly the same. Then, we spend $O(n)$ time to halve the size of rows and columns, therefore the time complexity is $O(n + m)$.
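Putting Reduce and Interpolate together, a minimal sketch of the whole recursion could look like this. It is not the optimized implementation discussed below: the names `smawk` and `smawkRec` are made up, indices are 0-based, the matrix is given implicitly as a callable `f(row, col)`, ties are broken towards the leftmost column, and the code assumes $m \geq 1$ and that the matrix really is TM for row minima.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Fills opt[r] for every r in `rows`, looking only at the columns in `cols`.
template <class F>
void smawkRec(const std::vector<int>& rows, const std::vector<int>& cols,
              F& f, std::vector<int>& opt) {
    const std::size_t n = rows.size();
    if (n == 0) return;

    // Reduce: keep at most n candidate columns.
    std::vector<int> stk;
    for (int c : cols) {
        while (!stk.empty() &&
               f(rows[stk.size() - 1], stk.back()) > f(rows[stk.size() - 1], c))
            stk.pop_back();
        if (stk.size() < n) stk.push_back(c);
    }
    if (n == 1) { opt[rows[0]] = stk[0]; return; }

    // Interpolate: solve the rows at positions 0, 2, 4, ... recursively.
    std::vector<int> half;
    for (std::size_t i = 0; i < n; i += 2) half.push_back(rows[i]);
    smawkRec(half, stk, f, opt);

    // Fill the remaining rows by scanning between the neighbouring optima.
    std::size_t k = 0;
    for (std::size_t i = 1; i < n; i += 2) {
        while (k + 1 < stk.size() && stk[k] != opt[rows[i - 1]]) k++;
        const int last = (i + 1 < n) ? opt[rows[i + 1]] : stk.back();
        std::size_t best = k;
        for (std::size_t j = k; j < stk.size(); j++) {
            if (f(rows[i], stk[j]) < f(rows[i], stk[best])) best = j;
            if (stk[j] == last) break;
        }
        opt[rows[i]] = stk[best];
    }
}

// Returns, for each row of an n x m TM matrix, the column of its leftmost minimum.
template <class F>
std::vector<int> smawk(int n, int m, F f) {
    std::vector<int> rows(n), cols(m), opt(n, -1);
    std::iota(rows.begin(), rows.end(), 0);
    std::iota(cols.begin(), cols.end(), 0);
    smawkRec(rows, cols, f, opt);
    return opt;
}
```

Copying the row and column lists at every level keeps the sketch short; a tuned implementation would avoid those allocations.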

## 3. Implementation

I implemented the above algorithm, which was actually pretty easy. It's not simpler than D&C optimization (which is very, very simple), but I thought it was pretty pleasant to implement. Then I submitted this nice, faster linear-time alternative to the template D&C optimization problems... and got Time Limit Exceeded, because:

• My implementation was not that optimized
• D&C optimization has super low constant and likely works better on random tests
• Whereas SMAWK is... well, not that constant heavy, but not so good either.

Then I just decided to copy-paste maroonrk's SMAWK implementation (I copied it from here), which I'm not sure if it's the fastest, but looks to have some constant optimizations, and was about 2x faster than my implementation. In the online judges, it seemed a little (1.5x?) faster than the D&C optimization for $N = 200000$.

But it's not faster or simpler than D&C, why should I learn it? I mean, like ad-hoc problems, you don't always do stuff because there is a particular reason to do it, so...

## 4. Conclusion

• SMAWK is simpler than I thought.
• SMAWK is faster than D&C if $N$ is near some million.
• SMAWK is slower than D&C if $N$ is near some thousand.
• If you are afraid of missing some AC because of not knowing this algorithm, probably you don't have to.
• Just like the one in practice problem, somebody can ask you to use only $4(n+m)$ matrix oracle calls or so, not all is lost...

By the way, it is well-known how to compute $DP[i] = \min_{j < i} (DP[j] + Cost(j + 1, i))$ in $O(n \log n)$ time if the cost is Monge. Normal SMAWK can't optimize this, but it seems there is a variant of SMAWK named the LARSCH algorithm which computes this sort of recurrence in $O(n)$ time. I mean, just so you know...

## References


By ko_osaga, history, 5 months ago,

Today at 6pm KST, I will stream solving problems related to Chordal Graphs and Tree decompositions. Stream link is here.

Even if the problem has some special structure, I will ignore it and only assume that it is a Chordal Graph or a graph with bounded treewidth.

Stream will end if someone asks me to play League together. I think it will probably last about 4 hours.

Enjoy!


By ko_osaga, history, 5 months ago,

I recently solved some problems that involved the concept of Lyndon decomposition. Honestly, most of them were too hard to understand for me. I'm just trying to think out loud about things I've read, so I can learn ideas or better takes from smarter people?

Note that I will omit almost all proofs as I can't do that. I believe all unproven claims below are facts, but it is always great to have doubts about anything.

## 1. Lyndon decomposition, definition, and algorithms

A string is called simple (or a Lyndon word), if it is strictly smaller than any of its own nontrivial suffixes. Examples of simple strings are $a, b, ab, aab, abb, abcd, abac$.

It can be shown that a string is simple if and only if it is strictly smaller than all its nontrivial cyclic shifts. As a corollary, simple words are never periodic (a simple word is not a repetition of some word $2$ or more times).
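The definition translates directly into a quadratic check, which is handy for testing the faster algorithms on small strings. The function name `isLyndon` is made up for this sketch, and the empty string is treated as not simple by assumption.

```cpp
#include <cstddef>
#include <string>

// A string is simple (Lyndon) if it is strictly smaller than each of its
// nontrivial suffixes. O(N^2) in the worst case.
bool isLyndon(const std::string& s) {
    if (s.empty()) return false;  // assumption: the empty string is not simple
    for (std::size_t i = 1; i < s.size(); i++)
        // compare(...) <= 0 means the suffix s[i..] is not greater than s
        if (s.compare(i, std::string::npos, s) <= 0) return false;
    return true;
}
```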

The Lyndon decomposition of string $s$ is a factorization $s = w_1 w_2 \ldots w_k$, where all strings $w_i$ are simple, and are in non-increasing order $w_1 \geq w_2 \geq \ldots \geq w_k$.

Alternatively, the Lyndon decomposition of string $s$ can be represented as $s = w_1^{p_1} w_2^{p_2} \ldots w_k^{p_k}$. Here, $p_i$ are positive integers, and $w_i^{p_i}$ denotes the string $w_i$ repeated $p_i$ times. All strings $w_i$ are simple, and are in decreasing order $w_1 > w_2 > \ldots > w_k$. The only difference is that each group of identical factors is merged into a single chunk $w_i^{p_i}$.

It is claimed that for any string such a factorization exists and it is unique. However, I can't prove it.

### 1.1 Algorithm

There are two algorithms that compute the Lyndon decomposition in linear time. The first algorithm is the well-known Duval algorithm. E-maxx has a good explanation on this, so I won't discuss it here.
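For completeness, here is a sketch of Duval's algorithm in its commonly published form. The function name `duval` and the choice to return the factors as strings are just for illustration.

```cpp
#include <string>
#include <vector>

// Lyndon decomposition in O(N) time: returns the simple factors
// w_1 >= w_2 >= ... >= w_k in order.
std::vector<std::string> duval(const std::string& s) {
    const int n = static_cast<int>(s.size());
    std::vector<std::string> factors;
    int i = 0;
    while (i < n) {
        // s[i..j) is pre-simple; k points at the character s[j] is compared to.
        int j = i + 1, k = i;
        while (j < n && s[k] <= s[j]) {
            if (s[k] < s[j]) k = i;  // the whole s[i..j] is a new simple word
            else k++;                // the current period continues
            j++;
        }
        // Emit every full copy of the period, which has length j - k.
        while (i <= k) {
            factors.push_back(s.substr(i, j - k));
            i += j - k;
        }
    }
    return factors;
}
```

On `"abacaba"` this yields `abac`, `ab`, `a`.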

Another algorithm is conceptually much simpler. Given a string $S$, consider the greedy algorithm that repeatedly removes the smallest suffix from $S$. By definition, the greedy algorithm always removes a simple word, so the algorithm will return a decomposition consisting of simple words. We believe that the Lyndon decomposition is unique, thus the algorithm returns a Lyndon decomposition.

Let's compute the time complexity. The algorithm iterates at most $O(N)$ times, and it can find the smallest suffix naively in $O(N^2)$ time, so the naive implementation takes $O(N^3)$ time. However, the smallest suffix is just the first entry of the suffix array, so using the fastest suffix array algorithm can optimize each phase to $O(N)$, giving an $O(N^2)$ algorithm.

Should we compute the suffix array from scratch in each phase? The removal of a suffix does change the ordering in the suffix array. For example, $abac < ac$, but $aba > a$.

However, this issue doesn't apply to our application, where we remove the smallest suffix. Therefore, given a suffix array $SA_0, \ldots, SA_{N - 1}$ for the string $S$, one can simply iterate from $SA_0$ to $SA_{N - 1}$, and cut the string as long as it is the leftmost position we encountered. As the suffix array can be computed in $O(N)$, this gives an $O(N)$ solution to the Lyndon decomposition. I can't prove why this is true. But this looks like a folklore algorithm, so I believe it's true.
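The scan over the suffix array can be sketched as follows. To keep the example self-contained the suffix array is built by plain sorting, which is far from linear; only the final loop is the point. The name `lyndonBySuffixArray` is made up.

```cpp
#include <algorithm>
#include <numeric>
#include <string>
#include <vector>

std::vector<std::string> lyndonBySuffixArray(const std::string& s) {
    const int n = static_cast<int>(s.size());
    // Naive suffix array: sort the starting positions by the suffixes.
    std::vector<int> sa(n);
    std::iota(sa.begin(), sa.end(), 0);
    std::sort(sa.begin(), sa.end(), [&](int a, int b) {
        return s.compare(a, std::string::npos, s, b, std::string::npos) < 0;
    });
    // Walk SA_0, SA_1, ...; cut whenever the position is the leftmost so far.
    std::vector<std::string> factors;
    int end = n;  // the part s[end..) has already been cut off
    for (int start : sa) {
        if (start < end) {
            factors.push_back(s.substr(start, end - start));
            end = start;
        }
    }
    // Factors were found from right to left.
    std::reverse(factors.begin(), factors.end());
    return factors;
}
```

For `"abacaba"` the suffix array is $6, 4, 0, 2, 5, 1, 3$, the leftmost-so-far positions are $6, 4, 0$, and the factors are again `abac`, `ab`, `a`.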

## 2. Computing Lyndon decomposition for each substring

For a string of size $N$, the Lyndon decomposition may have at most $O(N)$ size, in which case the above algorithms are already optimal. Hence, in this section, we only discuss finding the smallest suffix for each substring in near-constant time, since it may

• lead to an algorithm for computing Lyndon decomposition in near-linear time on output size, by the above greedy algorithm.
• yield some small implicit structure (tree) that captures the Lyndon decomposition for all interesting substrings

### 2.1. Lyndon decomposition for all suffixes

The removal of a prefix does not change the ordering in the suffix array. To find the smallest suffix in $S[x ...]$, just find the first entry in the suffix array such that $SA_i \geq x$.
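Answering this for every $x$ at once takes a single pass over the suffix array. A small sketch, where the name `minSuffixOfEachSuffix` is made up and `sa` is assumed to be a valid suffix array (a permutation of $0 \ldots N-1$):

```cpp
#include <vector>

// ans[x] = starting position of the smallest suffix of S[x..], i.e. the
// first suffix array entry that is >= x.
std::vector<int> minSuffixOfEachSuffix(const std::vector<int>& sa) {
    const int n = static_cast<int>(sa.size());
    std::vector<int> ans(n, -1);
    int next = 0;  // smallest x whose answer is still unknown
    for (int p : sa)
        for (; next <= p; next++) ans[next] = p;
    return ans;
}
```

For $S = $ `cab` the suffix array is $1, 2, 0$ and the answers are $1, 1, 2$.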

### 2.2. Lyndon decomposition for all prefixes

Duval's algorithm is basically incremental since it repeatedly adds a letter $s[j]$ to the existing structure. This hints that the Lyndon decomposition can be computed for all prefixes, although it's not entirely straightforward.

I came up with an algorithm to compute all min suffixes for all prefixes. There are other algorithms to compute the min suffixes, such as the one ecnerwala described in this comment.

Duval algorithm maintains a pre-simple string in each iteration. Consider a pre-simple string $t = ww\ldots w\overline{w}$ for the current prefix. Except for the last string $\overline{w}$, every other string is simple. And if we take the Lyndon decomposition of $\overline{w}$, the first element of it is a prefix of $\overline{w}$, which is obviously less than $w$. As we know that Lyndon decomposition is unique, we can see that the last element of Lyndon decomposition of $\overline{w}$ is exactly the smallest suffix of the current prefix.

Thus, the naive algorithm is the following:

• If $\overline{w}$ is empty, $w$ is the smallest suffix of the given prefix.
• Otherwise, the smallest suffix of $\overline{w}$ is the smallest suffix for the given prefix.

However, we don't have to recompute the smallest suffix of $\overline{w}$ every time. In the decomposition algorithm, we fix the string $s_1 = s[0 : i)$ and compute the decomposition for the suffix $s[i \ldots]$. For each relevant $i$, we use dynamic programming. Let $MinSuf[j]$ be the length of smallest suffix of $S[i \ldots j)$ for $j > i$. If $\overline{w}$ is empty the smallest suffix is $w$. Otherwise, since $\overline{w}$ is exactly the string $S[i \ldots i + |\overline{w}|)$, $MinSuf[j] = MinSuf[i + |\overline{w}|]$. Therefore we can obtain a simple recursive formula.

### 2.3 Lyndon decomposition for all substrings?

This paper contains some ideas, so if you are interested, give it a try :)

## 3. The Runs Theorem

Run is a concept that is useful for solving problems related to repeats. Even if you never heard of the name, anyone who solved some challenging suffix array problems will be familiar with it.

Given a string $S$, the tuple $(l, r, p)$ is a run of string $S$ if

• $0 \le l < r \le |S|$
• $1 \le p \le |S|$
• $r - l \geq 2p$
• $p$ is the smallest positive integer where $S[i] = S[i + p]$ holds for all $l \le i < r - p$
• The above four properties do not hold for either of the tuples $(l - 1, r, p)$ and $(l, r + 1, p)$
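To make the definition concrete, the following brute force enumerates all runs of a short string straight from the five conditions. It is only meant for checking faster code on small inputs (roughly cubic time), and the name `bruteForceRuns` is made up.

```cpp
#include <string>
#include <tuple>
#include <vector>

// Returns all runs (l, r, p) of s, with r exclusive, ordered by p and then by l.
std::vector<std::tuple<int, int, int>> bruteForceRuns(const std::string& s) {
    const int n = static_cast<int>(s.size());
    std::vector<std::tuple<int, int, int>> runs;
    for (int p = 1; 2 * p <= n; p++) {
        int a = 0;
        while (a + p < n) {
            if (s[a] != s[a + p]) { a++; continue; }
            // Maximal block a <= i < b with s[i] == s[i + p].
            int b = a;
            while (b + p < n && s[b] == s[b + p]) b++;
            const int l = a, r = b + p;
            if (r - l >= 2 * p) {
                // p must be the smallest period of s[l..r).
                bool smallest = true;
                for (int q = 1; q < p && smallest; q++) {
                    bool periodic = true;
                    for (int i = l; i + q < r; i++)
                        if (s[i] != s[i + q]) { periodic = false; break; }
                    if (periodic) smallest = false;
                }
                if (smallest) runs.emplace_back(l, r, p);
            }
            a = b;
        }
    }
    return runs;
}
```

For example, `"aaaa"` has the single run $(0, 4, 1)$; the candidate $(0, 4, 2)$ is rejected because $2$ is not the smallest period.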

Let $-S$ be the string where all elements are inverted: Specifically, we assign s[i] = 'a' + 'z' - s[i] for all elements of $S$, so that the usual comparison order is reversed, except the empty character which has the lowest priority.

Given a string $S$, a Lyndon prefix is the longest prefix that is a Lyndon word. Given a suffix array of $S$, this Lyndon prefix can be easily computed. Recall an algorithm that computes the Lyndon decomposition given a suffix array. Let $Rank_i$ be the inverse of the suffix array. Then, we can see that the length of the Lyndon prefix is the smallest $i$ such that $Rank_i < Rank_0$ (or $|S|$ if such does not exist). Similarly, we can also compute this for all suffixes $S[i \ldots]$: find the smallest $j > 0$ such that $Rank_{i + j} < Rank_i$.
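Finding the smallest $j > 0$ with $Rank_{i + j} < Rank_i$ for every $i$ is a "next smaller element" query, so a monotone stack handles all suffixes in one pass. A sketch, again with a naive suffix array for brevity; the name `lyndonPrefixLengths` is made up.

```cpp
#include <algorithm>
#include <numeric>
#include <string>
#include <vector>

// len[i] = length of the longest Lyndon prefix of the suffix S[i..].
std::vector<int> lyndonPrefixLengths(const std::string& s) {
    const int n = static_cast<int>(s.size());
    std::vector<int> sa(n), rank(n), len(n);
    std::iota(sa.begin(), sa.end(), 0);
    std::sort(sa.begin(), sa.end(), [&](int a, int b) {
        return s.compare(a, std::string::npos, s, b, std::string::npos) < 0;
    });
    for (int i = 0; i < n; i++) rank[sa[i]] = i;

    // Scan from the right; the stack keeps positions with increasing rank,
    // so its top is the nearest position to the right with a smaller rank.
    std::vector<int> stk;
    for (int i = n - 1; i >= 0; i--) {
        while (!stk.empty() && rank[stk.back()] > rank[i]) stk.pop_back();
        len[i] = (stk.empty() ? n : stk.back()) - i;
        stk.push_back(i);
    }
    return len;
}
```

For `"abacaba"` the result is $4, 1, 2, 1, 2, 1, 1$: for instance the suffix `acaba` has Lyndon prefix `ac`.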

For each suffix of $S$ and $-S$, we compute the Lyndon prefix $[i, j)$ and take them as a "seed". Start from the tuple $(i, j, j - i)$, and extend the tuple in both directions as long as $S[i] = S[i + p]$ holds. Specifically, let $k$ be the maximum number such that $S[i, i + k) = S[j, j + k)$ and $l$ be the maximum number such that $S[i - l, i) = S[j - l, j)$. Then we obtain a run $(i - l, j + k, j - i)$. Both $k, l$ can be computed in $O(\log N)$ time with suffix arrays.

It's easy to verify that those elements are actually runs of the string. If we remove all duplicated runs, the following fact holds:

Fact 1. Those we computed are exactly the set of all Runs.

Fact 2. There are at most $n$ runs.

Fact 3. The sum of $(j - i) / p$ over all runs is at most $3n$.

Fact 4. The sum of 2-repeats ($j - i - 2p + 1$) obtained from runs is at most $n \log n$.

Fact 3 is useful when we want to enumerate all repeats. Suppose that we have to enumerate all possible repeats. A string "aaaa" can be considered as a repeat of "a" 4 times, but it is also a repeat of "aa" 2 times. In this case, we have to enumerate all multiples of $p$ — but by Fact 3, that does not affect the overall complexity.

Facts 1, 2, and 3 can be found in this paper. I think Fact 4 is not hard to prove, but that doesn't mean I've done it, nor do I have a reference that states this fact.

## 4. Lexicographically minimum substring reverse

Given a string $S$, you can select $0$ or more non-overlapping substrings, and reverse them. What is the lexicographically minimum result you can obtain from the single iteration of this operation?

Let $S^R$ be the reverse of $S$. The answer is to take the Lyndon decomposition for $S^R$, and reverse each substring from that respective position.

I don't know why this works.

Intuitively, we are replacing each prefix of $S$ to the minimum suffix of $S^R$. Replacing each prefix to the minimum possible suffix seems like a good trade. Do you agree or disagree? XD

## 5. Minimal Rotation from Lyndon decomposition

Given a string $S$, what is the lexicographically minimum result you can obtain by taking a cyclic shift of $S$?

The answer can be found by finding the smallest suffix of length $> |S|$ for string $S + S$, and rotating at the respective position. This suffix can be found with Lyndon decomposition. Therefore we can solve this in $O(n)$ time, which is great.
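A common way to implement this is to run Duval's loop on $S + S$ and remember the start of the last simple word that begins inside the first copy. The sketch below follows that well-known formulation; the name `minCyclicShift` is made up.

```cpp
#include <string>

// Lexicographically smallest rotation of s in O(N) time.
std::string minCyclicShift(std::string s) {
    s += s;
    const int n = static_cast<int>(s.size());
    int i = 0, ans = 0;
    while (i < n / 2) {
        ans = i;  // start of the current block of simple words
        int j = i + 1, k = i;
        while (j < n && s[k] <= s[j]) {
            if (s[k] < s[j]) k = i;
            else k++;
            j++;
        }
        while (i <= k) i += j - k;  // skip all copies of the simple word
    }
    return s.substr(ans, n / 2);
}
```

For example, `"acabab"` gives `"ababac"`.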

What about just reversing a minimum suffix of $S$? Unfortunately, cases like "acabab", "dacaba" are counterexamples. If we can reduce this problem into a minimum suffix instance, we can solve this problem for all prefixes, suffixes, and possibly substrings, so that's really unfortunate...

.. or maybe not. For a string $S$, consider its Lyndon factorization $S = w_1^{p_1} w_2^{p_2} w_3^{p_3} \ldots w_k^{p_k}$. Clearly, taking the middle of periods is a bad idea. And taking only $w_k^{p_k}$ as a candidate is wrong.

Then what about trying to crack the tests? Let $SFX_j = w_j^{p_j} w_{j+1}^{p_{j + 1}} \ldots w_k^{p_k}$. Then, we can try all $SFX_j$ in range $k - 69 \le j \le k + 1$ as a candidate. It looks really hard to create an anti-test for this approach.

Lemma. Minimum rotation exists in the last $\log_2 |S|$ candidates of $SFX_j$. (Observation 6)

This provides an algorithm for computing the minimum rotation in $O(Q(n) \log n)$ time, where $Q(n)$ is time to compute the minimum suffix.

## Practice problems

### Minimum rotation for each substring


By ko_osaga, history, 7 months ago,

Hello!

XXII Open Cup. Grand Prix of Seoul will be held in 2022/07/17 Sunday, 17:00 KST (UTC+9).

The contest was used as a Day 2 Contest for ByteDance Summer Camp 2022.

Problems were authored by jh05013, molamola., jihoon, ainta, pichulia, chaeyihwan, evenharder, TLEwpdus, applist, Cauchy_Function.

Special thanks to myself for translating the statements and editorials.

Enjoy!


By ko_osaga, history, 11 months ago,

Hello!

XXII Open Cup. Grand Prix of Daejeon will be held in 2022/03/27 Sunday, 17:00 KST (UTC+9). The date of March 27 is final.

Daejeon is home to KAIST, but the contest itself has little to do with it, it just inherits the spirit of 2019 Daejeon GP.

The contest was used as a Day 2 Contest for Petrozavodsk Winter Camp 2022. I'm sorry for the camp participants over the lack of editorial. I will work to publish the full editorials right after the GP.

Problems were authored by ko_osaga, GyojunYoun, tamref, Diuven, queued_q, jh05013. Special thanks to xiaowuc1 for reviewing the statements.

For external accounts, the contest is ready now.

Note that the old opencup.ru link is not accessible now. (snarknews is trying to find servers outside of Russia.)

List of relevant previous contests:

Enjoy!


By ko_osaga, history, 16 months ago,

Update (2021.10.28): Editorial, Division 1 Gym, Division 2 Gym are prepared.

Hello!

For external accounts, the contest is ready now.

List of relevant previous contests:

Enjoy!


By ko_osaga, history, 18 months ago,

Since hmehta didn't write anything...

For easy, I spent an eternity realizing that every card starts face down. I have so many things to talk about easy, but at this point, it seems worthless.


By ko_osaga, history, 20 months ago,

Will there be a mirror in the near future?


By ko_osaga, history, 23 months ago,

TL;DR: IOI 2021 was planned to be held on-site with strong safety measures. Today, the IC announced that it will be turned into an online contest (I guess due to travel difficulties). The IC is exploring the possibility of an optional on-site contest.

Dear Friends of IOI,

I hope you are doing well as COVID-19 is still rampaging all over the world. But as vaccines are becoming available, I hope we can all soon get back to our normal life before the pandemic.

The IC held the Winter meeting in late February. We have the following important information regarding IOI 2021 to share with the community.

First, IOI 2021, organized by Singapore, will still be an online competition much like the previous year. The competition week will fall between mid to late June.

Second, competition aside, in an effort to bring back some normalcy, IOI business will be conducted as usual. This includes collection of registration fees and election of new committee members.

Third, the host and the IC are still exploring possibilities to socially host some teams who can and are willing to travel to Singapore, subject to various Air Travel requirements and COVID-19 safe management measures. Such teams would still sit the contest online from within Singapore, using their own computers. Detailed plans will be announced by the host as they become available.

I hope this information will allow you to start making plans for selecting teams to participate in IOI 2021. The IC and the host team will continue to hold online meetings leading up to the IOI in June. We will keep you all updated as things develop further. If you have any questions, please contact the IOI Secretariat at secretary.ioinformatics@gmail.com.

Stay safe and best wishes, Greg Lee IOI President


By ko_osaga, history, 23 months ago,

Hello!

I uploaded 2020 Petrozavodsk Summer Camp, Korean Contest to the CF Gym. It is a collection of Korean problems per the request of snarknews.

Problems are collected from:

• UCPC 2020 (Local ICPC Contest. 2019 version was used in XX Open Cup. GP of Korea)
• Semi-Game Cup (Contest authored by Seoul Science High School students. YeongTree is selected to IOI 2021 Korea team)
• IOI 2020 Korean TST (Problem B)
• Random educational problem from rkm0959

Problems are authored by:

And unfortunately there are no editorials.

List of relevant previous contests:

Enjoy!


By ko_osaga, history, 2 years ago,

Hello! I'm happy to announce XXI Open Cup. Grand Prix of Suwon. Gyeonggi Science High School is located in Suwon, Korea.

This contest was used as a Day 9 Contest from Petrozavodsk Winter 2021 Camp.

List of relevant previous contests:

Enjoy!


By ko_osaga, history, 2 years ago,

Hi!

Tomorrow at 22:00 KST I will stream solving New Year Prime Contest 2021. It is an ongoing contest, but Prime Contest is special, so I think it's okay to stream.

My goal is to implement the task "Gardening Game". gs18115 said it is very interesting. Let's give it a try!

If I have time left, I will try to implement "Evacuation" with SMAWK.

The stream will end if Prime Contest 2021 ends.

See you!


By ko_osaga, history, 2 years ago,

Hi! Tomorrow at 21:00 KST I will stream solving judge.yosupo.jp. In the stream, I will try to implement Edmonds' Directed MST algorithm with this lecture note.

I will solve the following problems in the stream. Recommendations are welcome, preferably ones that are not just "Find Directed MST".

Since this is not a regular data structure stream, I will keep it short. The stream will last about 3 hours.

This event isn't as well-prepared as the others, so please don't expect too much :)

Thanks!


By ko_osaga, history, 2 years ago,

Hello! I'm happy to announce XXI Open Cup. Grand Prix of Korea.

Special thanks to xiaowuc1 for revising our English.

List of relevant previous contests:

Enjoy!


By ko_osaga, history, 3 years ago,

Thanks to vintage_Vlad_Makeev for the information.

According to Wikipedia, RP is the class of decision problems that admit a randomized polynomial-time algorithm such that:

• If the correct answer is NO, it always returns NO
• If the correct answer is YES, then it returns YES with probability at least 1/2 (otherwise, it returns NO).

The Amazing Power of Randomness: NP=RP authored by Andras Farago claims that NP=RP. This means, there is a randomized polynomial time solution to NP problems, such as:

• 3-SAT
• Traveling Salesperson Problem
• Minimum Vertex Cover
• Graph Coloring
• Among others

What does it mean? Is the paper wrong? Should we start studying randomized algorithms instead of machine learning? Will all cryptographic systems collapse?


By ko_osaga, history, 3 years ago,

Currently, the Jury archives for NERC 2019 are missing, unlike the other years.

Can anyone look into those issues and update the website?


By ko_osaga, history, 3 years ago,

On 2020/05/24 at 14:00 KST, I will stream solving 全国統一プログラミング王決定戦本戦.

I'll also experiment with my new tablets, so there will be some explanation of my thinking process.

Even if you can't read Japanese, don't worry. I can't read it either, and I'll have a hard time with the translator too.

The stream will end if I solve everything or if it's too late.

Stream link is https://www.twitch.tv/gs14004 as usual.

See you!


By ko_osaga, history, 3 years ago,

I uploaded 2020 Petrozavodsk Winter Camp, Jagiellonian U Contest to the CF Gym.

The problemset was used in 2020 Petrozavodsk Winter Camp, and North America Programming Camp 2020. Both ghosts are in scoreboard, so it's a good opportunity to test your skills.

In the version used in PtzCamp and NAPC, problem K had a weak test and the model solution failed in my simple handcrafted tests. I just added that test, and changed the problem limit leniently to allow model solution to pass. I guess this made the problem much easier now.

Thanks to the problemsetters (I don't know who they are). Enjoy!

Editorial


By ko_osaga, history, 3 years ago,

You are given a graph $G$, and for each vertex $v$ you have to assign a positive integer color such that every adjacent pair of vertices (vertices directly connected by edge) have different color assigned. You have to minimize the maximum color assigned: In other words, you have to minimize the number of distinct assigned colors.

But that's graph coloring (vertex coloring). It's hard. Ok, one more time:

You are given a graph $G$, and for each edge $e$ you have to assign a positive integer color such that every adjacent pair of edges (edges sharing an endpoint) have different color assigned. You have to minimize the maximum color assigned: In other words, you have to minimize the number of distinct assigned colors.

Note that this can be interpreted as "graph coloring (vertex coloring) of line graphs". In that sense, you can consider in a similar spirit to "graph coloring of interval graphs", "graph coloring of permutation graphs", blah-blah.

## General graph

Let $D = \max_{v \in V(G)} \deg(v)$ be the maximum degree of a graph. Since every vertex should be incident to edges with different colors, the edges can't be colored with fewer than $D$ colors.

So if we somehow find a way to color the edges with $D$ colors, the case is closed. Unfortunately, this is not always possible: Consider a 3-cycle, then $D = 2$, but you need 3 colors.

Still, Vizing discovered that this is approximately true:

• Theorem (Vizing, 1964). Every simple graph can be edge-colored with $D+1$ colors.
• Proof: If you are interested then see here.

Note that:

• The theorem doesn't hold if the graph has multiple edges connecting the same vertices.
• It's NP-hard to determine if the graph can be colored with $D$ colors, so optimal coloring is still NP-hard.

Wikipedia discusses the $O(nm)$ algorithm to find a $D+1$ edge coloring of a simple graph, an implementation of which you can find here. Sometimes, people ask you to solve just that exact same problem, and sometimes you can just cheese problems without thinking too much. Anyway, this is not our main point. Let's jump to the more interesting part.

## Bipartite Graphs

To find an optimal edge coloring, we have to prove that the edges can be colored with $D$ colors. In general graph this was a little (one color!) bit of fail, but as the countercase had an odd cycle, maybe in bipartite graph this is true?

• Theorem. Every bipartite graph can be edge-colored with $D$ colors.

• Proof. We use induction on $D$.

• If $D = 0$, the proposition is trivial.

• Otherwise, we will partition the graph $G$ into a matching $M$ and a graph $G\setminus M$ with maximum degree $D - 1$. We transform the graph $G$ in such a way that the two sides of the bipartition have an equal number of vertices (just add dummy nodes of degree 0), and every node has degree $D$ (repeatedly pick two nodes with degree $<D$, one from each side, and add a dummy edge. If this procedure fails, then the two sides have different sums of degrees, which is impossible).

Let $L, R$ be a bipartition of this new graph. Now we assume every node of $G$ has degree $D$. Suppose there is no perfect matching in $G$. Then by Hall's theorem, there exists a subset of vertices $S \subseteq L$ such that $|N(S)| < |S|$. There are $|S| \times D$ edges between $S$ and $N(S)$. On the other hand, there can't be more than $|N(S)| \times D$ edges incident to $N(S)$. Contradiction. Thus a perfect matching exists.

Now remove the dummy edges from the perfect matching. Note that the dummy edges don't connect any node which originally had degree $D$. So the edges left from the matching still cover all nodes with degree $D$. Denote this matching $M$. Then $G \setminus M$ has maximum degree $D - 1$. Color the edges in $M$ with color $D$, and color $G \setminus M$ by inductive hypothesis.

• Remark. Theorem still holds when the graph has multiple edges.

This also gives an algorithm to find an edge coloring of bipartite graphs. We just want to find any matching that covers all nodes with maximum degree $D$. If all nodes have degree $D$, then simple bipartite matching works. In general case, since we have to cover some nodes, we have to model this as flow and enforce some edges to have capacity 1. This can be modeled with maximum flow with demands. In any case, if you use Dinic's algorithm or Hopcroft/Karp, time complexity is $O(D \times MN^{0.5})$.

So that's great, we can optimally solve edge coloring of bipartite graphs in polynomial time. On the other hand, using matching or even flow with demands is pretty tiring and complicated, can we get rid of flows altogether?

There is an implementation of edge coloring in $O(NM)$ with short code and good constant factor, which doesn't rely on flow argument. You can read about this algorithm here (problem F). Here is the implementation I use, which is copied from waterfalls' submission here.
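One standard flow-free way to reach roughly this bound is to insert the edges one at a time and flip a two-colored alternating path whenever the two endpoints disagree on a free color. The sketch below illustrates that idea only; it is not the linked implementation, and the name `edgeColorBipartite` and the input format (left vertices $0 \ldots L-1$, right vertices $0 \ldots R-1$, edges as pairs) are assumptions made for the example.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Colors the edges of a bipartite (multi)graph with colors 0..D-1, where D
// is the maximum degree. Each edge costs O(N + D) time.
std::vector<int> edgeColorBipartite(int L, int R,
                                    const std::vector<std::pair<int, int>>& edges) {
    const int m = static_cast<int>(edges.size());
    std::vector<int> deg(L + R, 0);
    for (const auto& e : edges) { deg[e.first]++; deg[L + e.second]++; }
    const int D = deg.empty() ? 0 : *std::max_element(deg.begin(), deg.end());

    // at[x][c] = id of the edge of color c at vertex x, or -1 if c is free.
    std::vector<std::vector<int>> at(L + R, std::vector<int>(D, -1));
    std::vector<int> color(m, -1);
    auto freeColor = [&](int x) {
        int c = 0;
        while (at[x][c] != -1) c++;  // exists: x has fewer than D colored edges
        return c;
    };
    for (int id = 0; id < m; id++) {
        const int u = edges[id].first, v = L + edges[id].second;
        const int a = freeColor(u), b = freeColor(v);
        if (a != b) {
            // Collect the a/b alternating path that starts at v with color a.
            // In a bipartite graph it never reaches u.
            std::vector<int> path;
            int x = v, c = a;
            while (at[x][c] != -1) {
                const int e = at[x][c];
                path.push_back(e);
                const int p = edges[e].first, q = L + edges[e].second;
                x = (x == p ? q : p);
                c = (c == a ? b : a);
            }
            // Swap colors a and b along the path, so that a becomes free at v.
            for (int e : path) {
                at[edges[e].first][color[e]] = -1;
                at[L + edges[e].second][color[e]] = -1;
            }
            for (int e : path) {
                color[e] = (color[e] == a ? b : a);
                at[edges[e].first][color[e]] = e;
                at[L + edges[e].second][color[e]] = e;
            }
        }
        color[id] = a;
        at[u][a] = at[v][a] = id;
    }
    return color;
}
```

The table `at` takes $O(ND)$ memory, which is fine for the sizes where an $O(NM)$ algorithm is usable at all.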

## Faster

Can we do this faster? Let's go back to Dinic's algorithm. Here's a simple yet elegant trick to reduce the time complexity from $O(D \times MN^{0.5})$ to $O(MN^{0.5} \log D)$.

The idea is, as you can guess from the $\log$ factor, divide-and-conquer. If $D$ is odd, you can just use the naive method to reduce the degree to $D - 1$. If $D$ is even, we partition the graph into two pieces with maximum degree $\frac{D}{2}$. But how?

Let's revisit the following well-known fact related to even degrees:

• Lemma (Euler Circuit). For connected graphs where every node has even degree, there is a circuit which visits every edge exactly once.

And also, there exists an algorithm to compute the Euler circuits in linear time.

So now, let's add some dummy edges again to connect the vertices with odd degree. There is an even number of vertices with odd degree, so we can pair them up with dummy edges. (To keep the graph bipartite, a pair that lies on the same side can be joined through a dummy vertex on the other side instead of directly.) Now, every connected component has an Euler circuit, so we can compute them in linear time.

Now, note that the Euler circuit obviously has even length in bipartite graphs. Let's denote the circuit as $C = \{e_1, e_2, \ldots, e_{2k}\}$. Let's color the odd-numbered edges ($e_1, e_3, e_5 \ldots$) blue, and the even-numbered edges red. Then, each vertex $v$ is incident to $\frac{deg(v)}{2}$ blue edges and $\frac{deg(v)}{2}$ red edges! This is because, if the circuit enters the vertex with some color, it leaves the vertex with the other color.

Thus, just take the Euler circuit for each component, and divide the edges into blue and red sets. Remove dummy edges, and you still have max degree $\frac{D}{2}$ for each color. So you can recurse from that point! Since every edge is considered in at most $O(\log D)$ recursion stages, and each contributes at most $N^{0.5}$ overhead, the time complexity is $O(MN^{0.5} \log D)$.

## More faster!!

Now let's look for something genuinely fast: fast enough that it doesn't involve any flow and gives an $O(M\log M \log D)$ solution!

Going back to the original proof, we know that it is easier to assume that the graph is regular: Every node has degree of $D$. Naive strategy of adding edges, as described in the proof, adds too many edges and is inefficient. But we can improve it by "merging" small-degree nodes: If two nodes on the same side have sum of degrees at most $D$, then we can merge (contract) the two nodes, and any coloring in this new graph still works in the original one.

So, for each side of the bipartition, while there are at least 2 nodes with $deg(v) \le \frac{D}{2}$, find them and contract them. After the end of this procedure, we are left with at most 2 nodes (one for each side) with $deg(v) \le \frac{D}{2}$. Now, even if we apply the naive way of adding dummy nodes and edges, we end up adding at most $O(cM)$ edges where $c \le 3$ in my naive calculation.

Now let's assume the graph is regular. Thanks to regularity, we don't have to find flows anymore but can simply calculate a perfect matching in this regular graph. It doesn't look easy to find this faster than $O(M^{1.5})$, but there is an algorithm to compute a perfect matching of regular graph in $O(M \log M)$, which was discovered in 2010: https://arxiv.org/pdf/0909.3346.pdf

To briefly explain what's going on, the algorithm is actually not very different from finding a bipartite matching with flows. We construct a usual flow graph with source and sinks, and find an augmenting path. There are just two notable difference:

• If an edge is in the matching, rather than reversing it, we just contract its two endpoints. You can see this doesn't change the situation.
• We don't use DFS to find a path: we use a random walk from the source, which means we repeatedly move along an outgoing edge chosen uniformly at random, and stop when the walk reaches the sink.
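The two differences above can be sketched roughly as follows. This is my own simplified reading, not the paper's exact pseudocode: it assumes a simple $d$-regular bipartite graph given as left-side adjacency lists, handles the contraction by resampling whenever the matched edge is drawn, and extracts a simple augmenting path by remembering only the last exit taken from each left vertex. The name `random_walk_perfect_matching` is hypothetical.

```python
import random


def random_walk_perfect_matching(adj, rng=None):
    """adj[u] lists the right-side neighbours of left vertex u.

    Assumes a simple d-regular bipartite graph with n vertices per side.
    Returns match_left with match_left[u] = matched right vertex of u.
    """
    rng = rng or random.Random()
    n = len(adj)
    match_left = [-1] * n
    match_right = [-1] * n
    unmatched = list(range(n))         # unmatched left vertices (source edges)
    last_exit = [-1] * n               # last right vertex chosen from each u

    while unmatched:
        # The source moves to a uniformly random unmatched left vertex.
        pos = rng.randrange(len(unmatched))
        start = unmatched[pos]
        u = start
        while True:
            v = rng.choice(adj[u])
            if v == match_left[u]:
                continue               # matched edge is contracted: resample
            last_exit[u] = v
            if match_right[v] == -1:
                break                  # unmatched right vertex: sink reached
            u = match_right[v]         # jump across the contracted pair

        # Following last exits from the start yields a simple augmenting path.
        u = start
        while True:
            v = last_exit[u]
            previous_partner = match_right[v]
            match_left[u] = v
            match_right[v] = u
            if previous_partner == -1:
                break
            u = previous_partner

        unmatched[pos] = unmatched[-1]  # swap-remove the newly matched vertex
        unmatched.pop()
    return match_left
```

A quick way to try it is a circulant graph, for example `adj = [[(u + k) % 50 for k in range(3)] for u in range(50)]`, which is 3-regular on both sides.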

Intuitively, the second heuristic will run very fast in the initial stages of augmenting path finding. At first the path has length 3, and it stays that way for a while. The random walk is much faster than DFS in such cases, because its time complexity is just the length of the walk. Now, the paper claims the following:

• Lemma 6 (from the paper). If the current matching has $m < n$ edges, then the expected number of steps taken by the random walk is at most $2 + \frac{n}{n - m}$.

Then, we can simply sum this bound over all $0 \le m \le n - 1$ (substituting $i = n - m$), and get $2n + n \sum_{i=1}^{n}\frac{1}{i} = O(n \log n)$ expected steps in total!

Lastly, note that this problem can be solved in $O(M \log M)$ time. Interested readers can find it in the second reference link.

## Challenge

The $O(mn)$ algorithm is very simple to implement, and it usually doesn't find long Kempe chains. I doubt that is true for any fast algorithm I mentioned. Plus, it won't be cache oblivious. Can anyone implement an $o(mn)$ algorithm for bipartite edge coloring that actually runs faster than the $O(mn)$ algorithm?

## Practice problem

I added some practice problems for this topic. Some of them are directly related to my article, and some of them test your creativity in reducing different problems to bipartite edge coloring.

## References

Also, special thanks to 300iq for introducing me to the paper on perfect matching.

By ko_osaga, history, 3 years ago,

From Sunday 14:00 UTC-4 (Eastern Daylight Time), I will stream solving problems on a Chinese OJ. Specifically, I'll try to

• Read these two papers (paper1 paper2) and try to learn them
• Implement them on a Chinese OJ (problem1 problem2). Huge thanks to TLE for making this stream more authentic.

Stream link is https://www.twitch.tv/gs14004. Stream will be in English.

I'm not sure about the exact plan, but I'll stream at least 4 hours.

See you!

By ko_osaga, 3 years ago,

About 4 hours later (2020/01/27 20:30 KST), I will stream solving random OI problems.

Things I might try solving:

The stream will end if I get sleepy.

Stream link is https://www.twitch.tv/gs14004 as usual.

See you!

By ko_osaga, history, 3 years ago,

Since the Polygon tutorial system is currently broken, I will provide the editorial in PDF format instead. Sorry for the inconvenience!

Solution PDF

Problem A was authored by ko_osaga. Code

Problem B was authored by ko_osaga. Code

Problem C was authored by ko_osaga. Code

Problem D was authored by nong. Code

Problem E was authored by ckw1140. Code

Problem F was authored by ko_osaga. Code

Problem G was authored by ko_osaga. Code

By ko_osaga, 3 years ago,

새해 복 많이 받으세요, 코드포스! (Happy new year, Codeforces!)

Welcome to the first Codeforces Round of the new decade, Hello 2020! The round will be held on Jan/04/2020 15:05 MSK.

• Div 1, 2 combined
• 2.5 hours!
• 7 problems!
• Score distribution: 500-1250-1750-2500-2750-4000-4000
• Yes, it is rated!

This round was prepared by ko_osaga, nong, and ckw1140. I am personally very thrilled to deliver my first Codeforces contest as such a memorable one!

More credits for the contest:

UPD: Editorial. Thank you for your participation!

UPD2: Winners:
