### CristianoPenaldo's blog

By CristianoPenaldo, history, 6 months ago,

I was using the Berlekamp-Massey (BM) algo for yesterday's H, but my sol worked too slow. Here is my idea:

(1) Divide $[1, n]$ into scales. Let $S(scale)$ be the scale $scale$, which is $S(scale)$ is $\{x | x \times scale \geq n, x \times scale/2 < n\}$. For example, if $n=7$, scale $1$ is $[7, 7]$, scale $2$ is $[4, 6]$, scale $4$ is $[2, 3]$, scale $8$ is $[1, 1]$. It is guaranteed that $scale$ is a power of $2$ in my sol.

(2)Fetch $O(log scale)$ consecutive items for each scale using the matrix fast power algorithm. The matrix is constructed in step (3) from the last scale.

(3)Using the BM algorithm to build a recurrence of length $O(log\,scale)$ and a recurrence matrix $M \in \mathbb{Z}/p\mathbb{Z}^{O(log\,scale) \times O(log\,scale)}$.

I mean for each scale we fetch some items $I$, get a recurrence using the BM algorithm, and construct a recurrence matrix $M$ using that recurrence, for example $a_r = a_{r+1} + a_{r+2}$, then

The matrix $M$ and the items $I$, and the fast matrix power algorithm, are used to fetching items for the next scale. This is a coarse-to-fine algorithm. The problem is that, for each scale, I need to fetch $O(log\,scale)$ items and each item costs $O(log\,scale^3log\,n)$ using a fast matrix power algorithm (matrix multiplication is $O(log\,scale^3)$, and there are up to $O(n)$ elements for each scale. Therefore the complexity for each scale is $O(log\,scale ^ 4 log\,n)$ and the overall complexity is $\sum\limits_{scale} O(log\,scale ^ 4 log\,n) = O((logn)^6)$ for each test case which is too slow. Any idea to speed up?

Code:

Spoiler
• -3

By CristianoPenaldo, history, 12 months ago,

I am serious, not a troll.

• +17

By CristianoPenaldo, history, 13 months ago,

I admit that AGC061C is too hard for me, and I don't even understand its official editorial. Today I see a Luogu blog with a very genius and clear idea on this problem. I learned a lot from it, and I would like to share it to you now. The writer of this blog is possibly Little09, but I am not quite sure.

Part1: Problem Statement and Constraints

There are $N$ customers named $1$, ..., $N$ visiting a shop. Customer $i$ arrives at time $A_i$ and leaves at time $B_i$ The queue order is first in first out, so $A_i$ and $B_i$ are both increasing. Additionally, all $A_i$ and $B_i$ are pairwise distinct.

At the entrance, there's a list of visitors to put their names in. Each customer will write down their name next in the list exactly once, either when they arrive or when they leave. How many different orders of names can there be in the end? Find the count modulo $998244353$.

Constraints:

$\cdot 1 \leq N \leq 500000$.

$\cdot 1 \leq A_i \leq B_i \leq 2N$.

$\cdot A_i < A_{i+1}\,(1 \leq i \leq N-1)$.

$\cdot B_i < B_{i+1}\,(1 \leq i \leq N-1)$.

$\cdot A_i \neq B_j \,(i \neq j)$.

$\cdot \text{All values in the input are integers}$.

Part2: Idea of the blog

The complexity of the algorithm in the blog is $O(N)$.

Consider two dynamic programming tables named $f$ and $g$ ($dp$ in the original blog). The blog computes $f,\,g$ in the reverse order. Formally,

$f(i)\,(1 \leq i \leq n)$: The number of permutations when we have already considered persons $i,\,i+1,\,...,\,n$.

$g(i)\,(1 \leq i \leq n)$: The number of permutations when we have already considered persons $i,\,i+1,\,...,\,n$, and the person $i$ is forced to sign her (or his) name on $B_i$.

The blog defines $c(i)$ as:

$c(i) := max\{j|A_j < B_i\}$. (1)

Obviously $i \in \{j|A_j < B_i\}$ and $c(i) \geq i$.

First, we consider $f(i)$. If $i$ signs on $A_i$, then no matter when $i+1,\,i+2,\,...\,n$ sign their names, $i$ is always placed in front of them. It is a bijection, given a permutation $p$ of $[i+1,\,i+2\,...\,n]$, just add $i$ in the front of $p$. The number of distinct permutations when $i$ signs on $A_i$ is just $f(i+1)$. If $i$ signs on $B_i$, then $i$ must not be placed the first among $\{i,\,i+1,\,...,\,n\}$, otherwise it is overlapped with a situation where $i$ signs on $A_i$. Now we consider the number of permutations that $i$ is placed the first among $\{i,\,i+1,\,...,\,n\}$. In this case ($i$ is the first), $x \in \{i+1,\,i+2\,...,\,c(i)\}$ should all sign on $B_x$ because $A_x < B_i$ (according to the definition of $c$ in Eq.(1)), and the choices of $x \in \{c(i)+1,\,...,\,n\}$ are not important. The number of the permutations that $i$ signs on $B_i$ and $i$ is placed the first among $\{i,\,i+1,\,...,\,n\}$ is $g(c(i))$, because $c(i)$ must chooses $B_{c(i)}$ and $\{c(i)+1,\,c(i)+2\,...,\,n\}$ can choose freely. Therefore, the situation where $i$ signed on $B_i$ contributes $g(i) - g(c(i))$ to $f(i)$. $f(i)$ depends on $g(i)$:

$f(i) = f(i+1) + g(i) - g(c(i))$. (2)

Second, we consider $g(i)$. To count $g(i)$, an important idea is to divide it into non-overlapping cases. The blog consider

$P_j$ := The set of permutations where $j$ is the least integer among $\{i+1,\,i+2\,...,\,c(i)\}$ such that $j$ chooses $B_j$. In other words, $i$ chooses $B_i$, $\{i+1,\,i+2\,...,\,j-1\}$ all choose $A$, and $j$ chooses $B_j$. Also, we should not forget $P_{c(i)+1}$, because $\{i+1,\,i+2\,...,\,c(i)\}$ might all choose $A$!

(1)Do these $P_j$ (including $P_{c(i)+1}$) overlap? No! Because the permutations in $P_j$ has a unique prefix: $[i+1,\,i+2\,...,\,j-1]$.

(2)Do these $P_j$ cover all situations? Yes, this is obvious.

(3)How to calculate the size of $P_j$, i.e., $|P_j|$? For $j \leq c(i)$, since the prefix is fixed ($[i+1\,i+2\,...,\,j-1]$ mentioned in (1)), there is a bijection from the set $g(j)$ to $P_j$ by adding the fixed prefix to the element in $g(j)$. So the size of $|P_j|$ is just $g(j)$. For $j=c(i)+1$, $|P_j|=f(c(i)+1)$ because all $\{i+1,\,i+2\,...,\,c(i)\}$ have to choose $A$.

Therefore,

$g(i) = \sum\limits_{j=i+1}^{c(i)} g(j) + f(c(i)+1)$ (3)

Although we consider $f$ first, we should calculate $g$ first because $f(i)$ depends on $g(i)$.

Part3: Implementation details

(1) You can use two pointers to calculate $c$.

(2) To make the computation of $g$ faster, you can use the prefix sum trick.

(3) The original blog considers two cases: $c(i)=i$ and $c(i) > i$. They can be unified into equations (2) and (3), i.e., it is not necessary to consider two cases.

(4) In the original blog Little09 calculates from $n$ to $1$. You can also calculate from $1$ to $n$. Symmetrically, you should define $c(i)$ as $min\{j|B_j > A_i\}$, and define $g(i)$ as "The number of permutations when we have already considered persons $1,\,2,\,...,\,i-1$, and the person $i$ is forced to sign her (or his) name on $A_i$". Here is my implementation that calculates from $1$ to $n$.

Part4: What I have learned

(1) The most important idea in counting is dividing into non-overlapping sets whose unions are the whole set. Besides, the counting of each set should be much simpler.

(2) The most important thing of dp is defining states.

(3) Bijections are really important in combinatorics. For example, the Dyck path.

(4) Achieving (1) and (2) requires high IQ and talent. For example, the definition of $f$ in the blog is normal, while the definition of $g$ is genius!

Part5: The last

If you have any questions, please don't hesitate to ask me. It is beneficial for both of us. For you, I might resolve your confusion. For me, I can check whether I make a typo in the blog or not and whether I really understand this blog or not.

• +58

By CristianoPenaldo, history, 14 months ago,

Sorry for this shit blog. I am quite emotional now. I admit I am stupid!!!

Still only solve 4 problems in ABC284:

1. RE in problem A because I select the Python interpreter instead of G++.
2. Stuck at D because of an unknown bug of my Pollard Rho template, and stuck it again because of a buggy binary search template.
3. Stuck at E.
4. Get idea for F, pass 2 minutes after the contest. I use string hash: https://atcoder.jp/contests/abc284/submissions/37848937

Please comfort me or just tell me I am stupid. I admit that I am stupid and behave like a so-called giant baby but I really need comfort, thank you for your help!

• +27

By CristianoPenaldo, history, 15 months ago,

Part1: Introduction

Atcoder Beginner Contest 279 is good for learning generating functions (GFs). GFs could solve two of them (G and Ex). The English version tutorial of 279Ex has already been posted on the MathStackExchange by a kind person. Unfortunately, the English tutorial on 279G is not available. Now I would like to make a second-hand tutorial on 279G based on PCTprobability's idea. I spend a huge amount of time understanding this idea. For contestants at about my level, it is quite difficult to understand
the idea even if it is written in English, let alone it is written in Japanese only (My Japanese is N5 level). I will make the following contributions.

(1)Write the tutorial in English. The original tutorial is in Japanese only.

(2)Fill in the details. I believe you can understand my words.

(3)Offer an accepted implementation.

I have to state that, using generating function is definitely not the best way to solve this problem. It could be solved much simpler by using dynamic programming with monotone deque optimization. However, this problem is also a good chance to learn generating functions (GFs). Generating functions encode the information of sequences in a continuous way. If you do not know anything about GFs, I suggest you read:

(1) How to use GFs to solve recurrence? Link.

(2) How to prove the Vandemonde convolution identity using GFs? Link.

(3) And most relevant to this problem, how to solve partitions using GFs? Link.

The most important notation $[x^k]f(x)$ denotes the coefficient of $x^k$ in function $f(x)$. For example, $[x^2](x^3+2x^2+1)=2$. And $[x^2]\frac{1}{1-2x}=4$, because we can expand $\frac{1}{1-2x}$ to $\sum\limits_{i=0}^\infty (2x)^i$ in the region of convergence $(-\frac{1}{2}, \frac{1}{2})$. So the coefficient of $x^2$ is $4$.

Part2: Problem Statement

The problem says: There is a grid with $1×N$ squares, numbered $1,2,…,N$ from left to right.

Takahashi prepared paints of $C$ colors and painted each square in one of the C colors. Then, there were at most two colors in any consecutive K squares. Formally, for every integer $i$ such that $1≤i≤N−K+1$, there were at most two colors in squares $i,i+1,…,i+K−1$.

In how many ways could Takahashi paint the squares? Since this number can be enormous, find it modulo $998244353$.

$\cdot \text{All inputs are integers.}$

$\cdot 2 \leq K \leq N \leq 10^6$.

$\cdot 1 \leq C \leq 10^9$.

Test Case $1$: $N=K=C=3$. In this input, we have a $1×3$ grid. Among the $27$ ways to paint the squares, there are $6$ ways to paint all squares in different colors, and the remaining $21$ ways are such that there are at most two colors in any consecutive three squares.

Test Case $2$: $N=10, K=5, C=2$: Print $1024$.

Test Case $3$: $N=998, K=244, C=353$: Print $952364159$.

Part3: Idea

(1) What are GFs good at? Gf is good at solving partitions, for example, the Pentagonal number theorem. So, the first step is to compress the colors by Run-Length Encoding (RLE). For example, if the colors are $(1,1,1,2,2,3,2,2)$, then they are uniquely compressed to $((1, 3), (2, 2), (3, 1), (2, 2))$. With RLE, $[1, n]$ is partitioned into $l$ segments with different colors. Let me denote these segments as $S_1, S_2, ..., S_l$. Please note that adjacent segments must be painted with different colors, otherwise you can merge the adjacent segments into one, which violates the definition of RLE. The idea to divide the interval into segments for counting also appears in Pinely Round Problem D.

(2)Consider $2 \leq i \leq l-1$. If $|S_i| \leq K-2$, then $S_{i+1}$ only has one choice: Paint it with the same color as $S_{i-1}$. Otherwise, the last element of $S_{i-1}$, the whole segment $S_{i}$ and the first element of $S_{i+1}$ will form an interval with size $\leq K$ and three colors, violating the rule. If $|S_i| > K-2$, then $S_{i+1}$ has $C-1$ choices. It is only required that the color of $S_{i+1}$ is different from that of $S_{i}$.

(3)I claim that the generating function for segmentation of length $l \geq 2$ is:

$f(x, l) := C(C-1)(\sum\limits_{j=1}^\infty x^j)^2(\sum\limits_{j=1}^{K-2}x^j + \sum\limits_{j=K-1}^\infty (C-1)x^j)^{l-2} \tag{1}$.

The $C$ is because the first segment has $C$ choices.

The $C-1$ is because the second segment always has $C-1$ choices.

The $(\sum\limits_{j=1}^{K-2}x^j + \sum\limits_{j=K-1}^\infty (C-1)x^j)^{l-2}$ contain two parts: $x^j$ encodes the length of segment $i$ ($2 \leq i \leq i-1$). $1$, and $C-1$ encode the transfer contribution between $i \rightarrow i+1$. If $len(S_i) \leq K-2$, then $S_{i+1}$ has only 1 choice (See (2)). Otherwise, $S_{i+1}$ has $C-1$ choices.

Now we have dealt with the length of $S_i (2 \leq i \leq l-1)$ and the transfer contribution between $i$ and $i+1$ ($2 \leq i \leq l-1$). But we still omit two things: The length of head and tail! So we encode each of them with $\sum\limits_{j=1}^\infty x^j$. See the below picture:

$\sum\limits_{j=1}^\infty x^j=\frac{x}{1-x} \tag{2}$.

$\sum\limits_{j=1}^{K-2}x^j + \sum\limits_{j=K-1}^\infty (C-1)x^j = \frac{x-x^{K-1}}{1-x} + (C-1)\frac{x^{K-1}}{1-x} = \frac{x+(C-2)x^{K-1}}{1-x} \tag{3}$.

And put $(1), (2), (3)$ together, $f(x, l)=C(C-1)\frac{x^2}{(1-x)^2}(\frac{x+(C-2)x^{K-1}}{1-x})^{l-2} \tag{4}$.

Here you might get confused: Will sum to infinite, e.g., $\sum\limits_{j=1}^\infty x^j$, lead to overflow (i.e., contain terms whose orders are higher than $x^N$)? The answer is YES, but we are not afraid of it. This answer refers to a core idea of GFs: I only care about $[x^N]f(x, l)$, because the sum of length of segments should be $N$. For the terms higher than $x^N$, I don't fxxking care about it!!! It is quite magic that, including more terms may decrease the computational complexity! If you are curious about it, you might read the English tutorial of 279Ex carefully. That is because summing more terms might lead to closed-form expressions which are easier to compute. However, we should be cautious about the low terms. For example, If we mistake $\sum\limits_{j=1}^\infty x^j$ for $\sum\limits_{j=0}^\infty x^j$, we will get into big trouble as we count the contribution of "empty segments" that should not be counted. So, for GF methods, overflowing parts are not important, but we should be careful about the correctness of the non-overflowing parts, especially the low-order terms.

Now, to take all possible length $l \geq 2$ into consideration, we sum $f(x, l)$ over $l$:

$g(x) := \sum\limits_{l=2}^\infty f(x, l) = \sum\limits_{l=2}^\infty C(C-1)\frac{x^2}{(1-x)^2}(\frac{x+(C-2)x^{K-1}}{1-x})^{l-2} \tag{5}$

And,

$\sum\limits_{l=0}^\infty(\frac{x+(C-2)x^{K-1}}{1-x})^{l} = \frac{1}{1-\frac{x+(C-2)x^{K-1}}{1-x}} = \frac{1-x}{1-2x-(C-2)x^{K-1}} \tag{6}$

Combining (5), (6):

$g(x) = \frac{C(C-1)x^2}{(1-x)(1-2x-(C-2)x^{K-1})} \tag{7}$.

The denominator of $g(x)$ is $(C-2)x^K - (C-2)x^{K-1} + 2x^2 - 3x + 1$. We care about $[x^N]g(x) = [x^{N-2}]\frac{C(C-1)}{(C-2)x^K - (C-2)x^{K-1} + 2x^2 - 3x + 1}$. The thing we care about is "polynomial inversion", which could be computed using FFT/NTT in $O(NlogN)$ time. Here is a useful summation: Operations on Formal Power Series. Formally, if you want to compute $[x^k]\frac{1}{A(x)}$, we need to find a $B(x) (deg(B(x)) \leq k)$ such that $A(x)B(x) \equiv 1 (\mod x^{k+1})$, and $[x^k]\frac{1}{A(x)} = [x^k]B(x)$.

Core code (with explanation):

int main(void){
int n, k, c; cin >> n >> k >> c;
poly<998244353> x, xinv; //define two polynomials
x.a.resize(n-1); //We want to get [x^{N-2}](1/((C-2)x^K - (C-2)x^{K-1} + 2x^2 - 3x + 1)), so the inversion should be set to n-1
if(n-1 > 0) x.a[0] += 1;
if(n-1 > 1) x.a[1] -= 3;
if(n-1 > 2) x.a[2] += 2;
if(n-1 > k-1) x.a[k-1] -= poly<998244353>::mint(c-2);
if(n-1 > k) x.a[k] += poly<998244353>::mint(c-2); //remember to use += instead of =, as k and/or k-1 may equal to 0, 1, 2
xinv = x.inverse(n-1); //Inverse!
cout << (c + ((1ll*c*(c-1))%998244353)*xinv[n-2]())%998244353 << "\n"; //Care about integer overflow! Remember to add c for the case where l==1.
}

• +77

By CristianoPenaldo, history, 15 months ago,

Details will be updated next week. I am a little busy now, so please do not downvote me. If you are interested or confused, you can comment or chat with me.

The idea is from official. This blog is about the implementation.

My submission: submission.

My code here:

#include<bits/stdc++.h>
using namespace std;
#define BIT(x, i) (((x) & (1 << (i))) >> (i))
#define DEBUG 0
#define DEBUG_INTERACTIVE 0

constexpr int C = 11, MAXN=2005; //Since N <= 2000, nim number <= 2000, so we only need to scan 1<<0 to 1<<11.
int n, l, r;
int sg[MAXN]; //sg number for 1-n
int f[MAXN][MAXN];
//f[i][j] If we want to get sg number j from [1, i], where should I cut?
//For example, if sg(i) = 4 and get sg number 3 after cutting [j, k], then f[i][3] = j;

int nim(const vector<int>& v, int* index, int* value_after_change){
//input:
//v: A vector of stones.
//index and value_after_change: If the XOR sum of v is 0, then index remains unchanged. we modify v[*index]
//to *value_after_change to guarantee that the XOR sum of v after modification is 0.
//Output:
//The XOR sum of v (current v)

//Example1:
//vector<int> v = {1, 4, 3};
//int index, value_after_change;
//int value = nim(v, &index, &value_after_change);
//cout << value << " " << index << " " << value_after_change << "\n"; //6 1 2
//value = 1^4^3 = 6;
//change 4->2, 1^2^3=0

//Example2:
//vector<int> v = {1, 2, 3};
//int index, value_after_change;
//int value = nim(v, &index, &value_after_change);
//cout << value << " " << index << " " << value_after_change << "\n"; //0 32763 -1204770170
//1^2^3 = 0. You cannot modify any number, then index and value_after_change are uninitialized random numbers (may not be 32763 -1204770170)

int sz = (int)v.size();
vector<int> ones[C+1];
int res = 0;
for(int b = 0; b <= C; ++b){
for(int i = 0; i < sz; ++i){
if(BIT(v[i], b)){
ones[b].push_back(i);
}
if(!b) res ^= v[i]; //Calculate the XOR sum, also known as the Nim sum. Remember only xor once.
}
}
if(res == 0) return res;
bool flipped = false;
*value_after_change = 0;
int choosed = -1; //FIX
for(int b = C; b >= 0; --b){
if(BIT(res, b)){
if(!flipped){
choosed = b;//FIX
flipped = true;
*index = ones[b][0]; //Never overflows, think out why?
}else{
*value_after_change += ((!BIT(v[*index], b)) << b);
}
}else if(flipped){
(*value_after_change) += (v[*index] & (1 << b));
}
}
for(int i = choosed+1; i <= C; ++i){ //FIX
*value_after_change += (v[*index] & (1<<i)); //FIX
} //FIX
return res;
}

void sgdp(bool debug){ //dynamic programming to calculate sprague-grundy numbers
//The following code is unnecessary because sg is global variable.
//for(int i = 0; i <= l-1; ++i){
//    sg[i] = 0; //Nowhere to fetch, have to lose, sg number is 0.
//}
for(int len = l; len <= n; ++len){
set<int> mex;
for(int i = 1; i <= len-l+1; ++i){
//We fetch [i, i+l-1]
int sgleft = sg[i-1]; //sg[0] is defined
int sgright = sg[len-i-l+1];
int sgall = sgleft ^ sgright; //Grundy theorem!!!
f[len][sgall] = i;
mex.insert(sgall);
}
int t = 0;
while(mex.count(t)) ++t;
sg[len] = t;
}
//DEBUG
if(!debug) return;
for(int len = l; len <= n; ++len){
printf("sg[%d]==%d\n", len, sg[len]);
for(int k = 0; k < sg[len]; ++k){
printf("f[%d][%d]==%d\n", len, k, f[len][k]);
}
printf("\n");
}
}

int main(int argc, char** argv){
if(DEBUG){
//freopen(argc == 1?"test1.txt":argv[1], "r", stdin);
}else if(DEBUG_INTERACTIVE){
; //DO NOTHING
}else{
cin.tie(0) -> sync_with_stdio(0);
}
cin >> n >> l >> r;
//The first case: r is large enough:
//Prefix (length <= l-1), Middle (length==r), Appendix (length <= l-1)
int a = 1, b = 1; //Ready to receive moves from the AI.
if(r + 2 * l - 2 >= n && l + r - 1 <= n){
//Killer move
cout << "First" << endl;
cout << l << " " << r << endl;
cin >> a >> b;
//AI fails.
return 0;
}
//The second case: Mimicking
if(l != r || n % 2 == l % 2){
//According to the tutorial, we can use the first move to separate [1-n] into two symmetric parts.
int tmp = r;
while((n - tmp) % 2) --tmp; //Don't be scary, it is O(1).
//Such tmp is always valid. If l != r, one of {r-1, r} is ok. Otherwise if l==r, r is eligible for tmp.
int interval = (n - tmp)/2;
cout << "First" << endl;
cout << interval+1 << " " << tmp << endl;
//Here we cut the interval [1, n] into two parts: [1, interval], [interval+tmp+1, n]
//interval == n-interval-tmp.
while(a || b){
cin >> a >> b;
if(a == 0 && b == 0){
return 0;
}
if(a <= interval){
//cout << a+interval+tmp << " " << b+interval+tmp << endl; //BUG. The second parameter is the LENGTH, not the ENDING POINT.
cout << a+interval+tmp << " " << b << endl;
}else{
//cout << a-interval-tmp << " " << b-interval-tmp << endl; //BUG. The second parameter is the LENGTH, not the ENDING POINT.
cout << a-interval-tmp << " " << b << endl;
}
}
return 0;
}
//The third case: l == r && n % 2 != l % 2. We cannot mimick. No matter where we choose [x, x+y-1], We cannot separate [1, n]
sgdp(DEBUG || DEBUG_INTERACTIVE);
set<pair<int, int>> intervals = {{1, n}};
bool turn = true, initial = true;
if(sg[n] == 0) {
cout << "Second" << endl;
turn = false;
}else{
cout << "First" << endl;
}
while(a || b){
if(turn){
if(!initial){
//Use {a, a+b-1} to separate intervals
pair<int, int> ab = {a, a+b-1};
auto it = intervals.upper_bound(ab);
if(it==intervals.end() || it->first > ab.first) it = prev(it);//Always valid
int fi = it->first, se = it->second;
if(DEBUG || DEBUG_INTERACTIVE) {
for(auto pa: intervals){
cout << "[Intervals1]" << pa.first << " " << pa.second << "\n";
}
printf("DEBUG1: it=={%d, %d}, ab=={%d, %d}\n", fi, se, ab.first, ab.second);
}
if(ab.first-1 >= it->first){
intervals.insert({it->first, ab.first-1});
}
if(ab.second+1 <= it->second){
intervals.insert({ab.second+1, it->second});
}
intervals.erase({fi, se});
}
int isz = (int)intervals.size();
vector<int> v(isz);
vector<pair<int, int>> vp(intervals.begin(), intervals.end()); //This is bad, but convenient...
if(DEBUG || DEBUG_INTERACTIVE) {
for(int i = 0; i < isz; ++i) {
printf("DEBUG_PLACE1 <SG should NOT be 0>: vp[%d]=={%d, %d}, gr==%d\n", i, vp[i].first, vp[i].second, sg[vp[i].second - vp[i].first + 1]);
}
}
for(int i = 0; i < isz; ++i){
v[i] = sg[vp[i].second - vp[i].first + 1];
}
int index = -1, value_after_change = -1;
nim(v, &index, &value_after_change); //I will never make nim(v, &index, &value_after_change)==0 here!
int cutplace = f[vp[index].second - vp[index].first + 1][value_after_change];

if(DEBUG || DEBUG_INTERACTIVE) {
for(int i = 0; i < isz; ++i) printf("v[%d]==%d, index==%d, value_after_change==%d\n", i, v[i], index, value_after_change);
cout << vp[index].first + cutplace - 1 << " " << l << " AI" << endl;
}
else cout << vp[index].first + cutplace - 1 << " " << l << endl;

//Also remember to separate intervals!
pair<int, int> ab = {vp[index].first + cutplace - 1, vp[index].first + cutplace + l - 2};
auto it = intervals.upper_bound(ab); //Always valid
if(it==intervals.end() || it->first > ab.first) it = prev(it);
int fi = it->first, se = it->second;
if(DEBUG || DEBUG_INTERACTIVE) {
for(auto pa: intervals){
cout << "[Intervals2]" << pa.first << " " << pa.second << "\n";
}
printf("DEBUG2: it=={%d, %d}, ab=={%d, %d}\n", fi, se, ab.first, ab.second);
}
if(ab.first-1 >= it->first){
intervals.insert({it->first, ab.first-1});
}
if(ab.second+1 <= it->second){
intervals.insert({ab.second+1, it->second});
}
intervals.erase({fi, se});
if(DEBUG || DEBUG_INTERACTIVE) {
int i = 0;
for(pair<int, int> pa: intervals) {
printf("DEBUG_PLACE2 <SG should be 0>: i==%d, {%d, %d}, sg==%d\n", i, pa.first, pa.second, sg[pa.second-pa.first+1]);
++i;
}
}
}else{
cin >> a >> b;
}
turn  = !turn;
initial = false;
}
if(a == -1 && b == -1) return 0;
}


Acknowledgment: ABalobanov, Nyaan.

• -3

By CristianoPenaldo, history, 17 months ago,

This is a blog for cp newbies (like me).

For a long time I think the Dijkstra algorithm (dij) only has two usages:

(1) Calculate the distance between vertices when the weights of edges are non-negative.

(2) (Minimax) Given a path $p = x_1x_2...x_n$, define $f(p) := \max_{i=1}^{n-1}d(x_i, x_{i+1})$. Given source vertex $s$ and target vertex $t$, dij is able to calculate $min \{f(p)|p\,\text{is a s-t path}\}$.

However, dij works for a function class, not only the sum/max functions. The sum/max functions are only the two most frequently used members of this function class, but the function class is far larger. Once the function $f$ satisfies several mandatory properties, you could use dij. The word "function class" is like an abstract class in C++:

struct f{
virtual bool induction_property() = 0; //bool: satisfy or not
virtual bool extension_property() = 0; //bool: satisfy or not
virtual bool dp_property() = 0; //bool: satisfy or not
};//dijkstra function.
dij(f);


The graph $G=(V(G),E(G))$. WLOG we assume $G$ is connected. There is a non-empty source set $S \in V(G)$ that needs not be a singleton. A path $p$ is a vector of vertices $\{v_1,v_2,...,v_n\}$ and the function $f$ maps paths to real numbers: $\text{path} \rightarrow \mathbb{R}$. To be brief, $f(\{v_1,v_2,...,v_n\})$ is shorten to $f(v_1,v_2,...,v_n)$. We say that dij works for $f$ if $\forall v \in V(G)$, dij correctly computes $d(v) := \min f(\{p \mid p\,\text{is a S-v path}\})$. $f$ should satisfy three properties that are also sufficient:

(1, induction base) $\forall s \in S,\,f(s)$ should be correctly initialized. Pay attention that $f(s)$ needs not to be $0$.

(2, Extension property) $\forall p$, for every vertex $v$ adjacent to the end of $p$ (p.back()), $f(p \cup v) \geq f(p)$.

(3, dynamic programming (dp) property) If path $p, q$ has the same end (i.e., p.back()==q.back()) and $f(p) \geq f(q)$, then for every vertex $v$ adjacent to p.back(), $f(p \cup v) \geq f(q \cup v)$.

Remarks of the dp property:

First, The dp property guarantees that, if $\{v_1, v_2, ..., v_n\}$ is the shortest path from $v_1$ to $v_n$, then for $\forall 1 \leq k \leq n$, $\{v_1, v_2, ..., v_k\}$ is the shortest $v_1$-$v_k$ path. You can do induction from end to front.

Second, if $f(p) = f(q)$, then $f(p \cup v) = f(q \cup v)$. Because $f(p) \geq f(q)$, $f(p \cup v) \geq f(q \cup v)$. Because $f(q) \geq f(p)$, $f(q \cup v) \geq f(p \cup v)$. Therefore $f(p \cup v) = f(q \cup v)$.

The necessity of the induction base is obvious. Otherwise, the $f$(source vertices) are wrong. For the second property, it is well known that the sum version of dij (shortest path) does not work for edges with negative cost (but the max version works!). For the dp property, let's consider the following example:

In this example $f(p):=\sum\limits_{i=1}^{n-1} i \text{dis}(v_i, v_{i+1})$. $f(ABC) = 1+4*2 = 9$; $f(AC)=10$; $f(ABCD) = 1+4*2 + 5*3=24$, $f(ACD)=10+5*2=20$. Since $f(ABC) < f(AC),\,f(ABCD) > f(ACD)$, $f$ violates the dp property. In this case, dij does not work, because dij uses ABC to slack D, giving a wrong result $24$ instead of the correct answer $20$.

We denote $d(v)$ as the real value of the shortest path (i.e., $\min f(\{p \mid p\,\text{is a S-v path}\}$), and $h(v)$ as the value calculated by dij. We want to prove $\forall v \in V(G),\,h(v) = d(v)$. We briefly review the outline of the dij:

define a distance array d. Initially fill d with INF;
Initialize d(s), h(s) for all source vertices s in S;
define a set T which is initially S;
constexpr int calculation_limit = |V(G)|; //You can set the calculation limit to an arbitrary number you want, default |V(G)|
int i = 0; //frame number

while T != V(G) && i < calculation_limit{
choose a vertex u that is not in T with the lowest h value; //Choice step
for every vertice v not in T and adjacent to u, update h(v) as h(v) = min(h(v), f([path from S to u] + v)). //You can record the shortest path from S to v here; //Slack step
T.insert(u);
i++;
}


In the induction step, the $d$ is correctly computed for every source vertex. In the $i$-th frame, a vertex $u$ is chosen in the choice step. We assume that for every $v$ chosen in the choice step of $0,\,1,\,2...\,i-1$ frames, $h(v) = d(v)$. For the $i$-th frame, I am going to show $h(u) = d(u)$. Consider the shortest path from $S$ to $u$ (we call it $S-u$ path, and it must exist since the graph is finite), and denote $y$ as the last vertex that does not belong to $T$:

In the following proof, $S-x$ means a path from S to $x$. It is not set difference.

First, for all source vertices, i.e., $\forall s \in S$, we have $h(s) = d(s)$. That is property 1, induction base.

(1) $h(u) \leq h(y)$. That is irrelevant to these properties, because dij always choose the vertex $u \notin T$ with the lowest $h$ value, if $h(u) > h(y)$, dij will not choose $u$ in this frame.

(2) $d(y) \leq d(u)$. This uses the extension property. $S-u$ path is shortest, so $f(S-u)=d(u)$. Due to the extension property, $f(S-y) \leq f(S-u)$. Since $d(y) = \min \text{all}\,f(S-y)\,\text{paths}$, we get $d(y) \leq d(u)$.

(3) $h(y) \leq f(S-x-y)$. This uses the dp property. By the induction hypothesis, $y$ is slacked using some path $p$ from S to x. Please be careful, $p$ needs not to be the $S-x$ path in the above picture. But according to the first remark of the dp property, $f(S-x) = f(p) = d(x)$. Therefore, by the second remark of the dp property, $y$ is slacked by $f(p-y) = f(S-x-y)$. When calculating $f(p-y)$, the algorithm actually uses $h(x)$ instead of $f(p)$, by the induction hypothesis, $h(x)$ is correctly computed, equals to $f(p)$.

(4) $d(y) = f(S-x-y)$. This uses the dp property (the first remark).

(5) $h(y) = d(y)$. $h(y)$ gives a feasible path from $S$ to $y$, which is greater or equal than the shortest path $d(y)$, that is $h(y) \geq d(y)$. By (3, 4) $h(y) \leq d(y)$. Therefore $h(y) = d(y)$.

(6) $h(u) = d(u)$. $h(u) \leq h(y) = d(y) \leq d(u) \leq h(u)$. Therefore $h(u) = d(u)$.

Now I offer three examples of elegant usage of the function class.

Example 1:

Leetcode2386: Find the K-Sum of an Array. I failed to solve it during the contest. The problem is: You are given an integer array nums and a positive integer k. You can choose any subsequence of the array and sum all of its elements together. We define the K-Sum of the array as the kth largest subsequence sum that can be obtained (not necessarily distinct). Return the K-Sum of the array. For example:

Input: nums = [2,4,-2], k = 5

Output: 2

Explanation: All the possible subsequence sums that we can obtain are the following sorted in decreasing order: 6, 4, 4, 2, 2, 0, 0, -2. The 5-Sum of the array is 2.

n := len(nums) <= 100000; k <= min(2000, 1 << n)

First, we are calculating the "largest", so we need to change $\leq$ to $\geq$ in the three properties. We first add all nonnegative elements in nums. The sum of nonnegative elements is the source vertex that is calculated correctly (property 1). We denote the sum as $S$. For example, if $nums=[2, 4, -2],\,S = 6$. We set every $nums[i]$ to $-nums[i]$ if $nums[i] < 0$. After this statement, all $nums[i] \geq 0$. Then, we sort nums ascendingly. For example, nums: $[2, 4, -2] \rightarrow [2, 4, 2] \rightarrow [2, 2, 4]$. We define the graph $G$ as follows: The vertex $v \in [0, 2^n-1]$ is a binary string of length $n$. If $v[i]=1$, keep $nums[i]$, otherwise, if $v[i]=0$, discard $nums[i]$. For each vertex the out degree is at most $2$: (1)Set $v[i]$ to $0$. (2) If $i \geq 1$, Set $v[i]$ to $0$ and recover $v[i-1]$ to $1$. Below is the search tree:

First, the graph $G$ is a connected tree. It is left to you as an exercise. Feel free to comment if you disagree with it. For a vertex $v$, let $f(v) := S - \sum\limits_{i=0}^{n-1}(1 - v[i]) * nums[i]$ ($f(v)$ is well-defined, there is only one path from root to $v$). You can verify that $f$ satisfies the extension property (That is why we need to sort nums), and due to the nature of add, it satisfies the dp property. Even when the abstracted graph is super large, If the degree of each vertex (which decides how many adjacent vertices we need to explore, in this problem, 2) and the calculation_limit is small enough (In this problem, $\leq 2000$), the dij is feasible even if the abstracted graph is super large, remember to use a priority queue! Below is the code written by 0x3f:

//Original code address: https://leetcode.cn/problems/find-the-k-sum-of-an-array/solution/zhuan-huan-dui-by-endlesscheng-8yiq/
class Solution {
public:
long long kSum(vector<int> &nums, int k) {
long sum = 0L;
for (int &x : nums)
if (x >= 0) sum += x;
else x = -x;
sort(nums.begin(), nums.end());
priority_queue<pair<long, int>> pq;
pq.emplace(sum, 0);
while (--k) {
auto[sum, i] = pq.top();
pq.pop();
if (i < nums.size()) {
pq.emplace(sum - nums[i], i + 1); //Discard nums[i-1]
if (i) pq.emplace(sum - nums[i] + nums[i - 1], i + 1); //Keeps nums[i-1]
}
}
return pq.top().first;
}
};


Example 2:

ABC267E Erasing Vertices 2. Left as practice.

Example 3

ARC150C Path and Subsequence. This problem is super interesting, editorial: https://atcoder.jp/contests/arc150/editorial/4999. There are two sequences $A$ and $B$, and an undirected connected graph $G=(V(G), E(G))$. The problem asks: For every simple path from vertex $1$ to vertex $N$, $p = (v_1, v_2, ..., v_k);\,v_1=1,\;v_k=N$, is $B$ a (not necessarily contiguous) subsequence of $(A_{v_1}, A_{v_2}, ..., A_{v_k})$? All sequences are 1-indexed.

For every simple path $p$ (not necessarily starting from $1$ and ending at $n$), we define:

A(p) := list(map(lambda x: A_x, p))

$f(p) := \max\{j|[B_1, B_2,...,\,B_j] \text{is a subsequence of}\,A(p)\}$.

For example,

$A = [1, 2, 3, 5, 2], B = [1, 3, 2]$. If $p = [1, 2, 5]$, $A(p) = [A[1], A[2], A[5]] = [1, 2, 2]$, $f(p)=1$, because only the first $1$ elements of $B$ ($[1]$) is a subsequence of $A(p)$. If $p = [1, 2, 3, 4, 5]$, $A(p) = [A[1], A[2], A[3], A[4], A[5]] = [1, 2, 3, 5, 2]$ and $f(p) = 3$, because the first $3$ elements of $B$ ($[1, 3, 2]$) is a subsequence of $A(p)$. Does such $f$ satisfy the three properties? For the first property, we need to correctly initialize $f({1}) := [A[1] = B[1]]$. Second, the extension property obviously holds. Third, for the dp property, if $p_1 = p + [u]$:

$f(p_1)= \begin{cases} f(p)+1, A_u = B_{f(p)+1}\\ f(p), A_u \neq B_{f(p)+1} \end{cases}$

Therefore, if $p$ and $q$ are simple paths with same the same ending, and $f(p) > f(q)$, we must have $f(p + [u]) \geq f(q + [u])$. That is because adding one vertex could increase $f(q)$ at most $1$. If $f(p) = f(q)$, Then $f(p + [u]) = f(q + [u])$. That is because, whether $f(p_1)$ ($f(q_1)$) is $f(p)$ or $f(p)+1$ ($f(q)$ or $f(q)+1$) depends only on $A_u$ and $B_{f(p)+1}$, which are the same for $p$ and $q$. Therefore, the dp property holds.

You can solve this problem using dij in $O(|V(G)| + |E(G)|)$.

--- Update: Fix some typos. I do not intentionally occupy the "Recent actions" column.

• +156

By CristianoPenaldo, history, 17 months ago,

Please note that this blog is a review of this problem. It is not novel enough to be an editorial. This problem may not be difficult for those people who are beyond purple, but very difficult for me.

To be more self-contained, I copy the problem here:

Problem Statement:

You are given strings $S$ and $T$ of length $n$ each, consisting of lowercase English letters.

For a string $X$ and an integer $i$, let $f(X,i)$ be the string obtained by performing on $X$ the following operation $i$ times:

Remove the first character of $X$ and append the same character to the end of $X$.

Find the number of integer pairs $(i,j)$ satisfying $0≤i,j≤n−1$ such that $f(S,i) \leq f(T,j)$ lexicographically.

For example, when $n=3$, $S=$"adb", $T=$"cab", there are 4 pairs: (1)$i=j=0$, "adb" $\leq$ "cab"; (2)$i=2, j=0$, "bad" $\leq$ "cab"; (3)$i=0, j=2$, "adb" $\leq$ "bca"; (4)$i=j=2$, "bad" $\leq$ "bca". You can find more test cases here: https://atcoder.jp/contests/abc272/tasks/abc272_f.

Here the problem statement ends. Now we analyze this problem.

The operation $f$ is a cyclic shift (CS). String concatenation is a powerful tool to handle string CS, eg, for $i=2$, $f(S, 2)$= "bad" is a substring of "adbadb". The string concatenation gets rid of mod operation to find index after CS, since mod operation is quite difficult to handle.

Let $S2=S+S$, $T2=T+T$. Here we want to count the pair $(i, j)$ such that $S2.substr(i, n) \leq T2.substr(j, n)$. But this is $O(n^3)$. So here we need another tool: suffix array. Suffix array sorts the suffix of a string. For example, the suffix array of "baad" is $\{1, 2, 0, 3\}$, because the lexicographic order (from small to big) is: ("aad" $1$, "ad" $2$, "baad" $0$, "d" $3$). Please note that the SA-IS algorithm could compute suffix array efficiently enough in O(Length of String). Here is the example code, you may find a C++ implementation of SA-IS in Ac-library:

#include <bits/stdc++.h>
using namespace std;

template<typename T, T...>
struct myinteger_sequence { };

template<typename T, typename S1 = void, typename S2 = void>
struct helper{
std::string operator()(const T& s){
return std::string(s);
}
};

template<typename T>
struct helper<T, decltype((void)std::to_string(std::declval<T>())), typename std::enable_if<!std::is_same<typename std::decay<T>::type, char>::value, void>::type>{
std::string operator()(const T& s){
return std::to_string(s);
}
};

template<typename T>
struct helper<T, void, typename std::enable_if<std::is_same<typename std::decay<T>::type, char>::value, void>::type>{
std::string operator()(const T& s){
return std::string(1, s);
}
};

template<typename T, typename S1 =void, typename S2 =void>
struct seqhelper{
const static bool seq = false;
};

template<typename T>
struct seqhelper<T, decltype((void)(std::declval<T>().begin())), decltype((void)(std::declval<T>().end()))>{
const static bool seq = !(std::is_same<typename std::decay<T>::type, std::string>::value);
};

template<std::size_t N, std::size_t... I>
struct gen_indices : gen_indices<(N - 1), (N - 1), I...> { };
template<std::size_t... I>
struct gen_indices<0, I...> : myinteger_sequence<std::size_t, I...> { };

template<typename T, typename REDUNDANT = void>
struct converter{
template<typename H>
std::string& to_string_impl(std::string& s, H&& h)
{
using std::to_string;
s += converter<H>().convert(std::forward<H>(h));
return s;
}

template<typename H, typename... T1>
std::string& to_string_impl(std::string& s, H&& h, T1&&... t)
{
using std::to_string;
s += converter<H>().convert(std::forward<H>(h)) + ", ";
}

template<typename... T1, std::size_t... I>
std::string mystring(const std::tuple<T1...>& tup, myinteger_sequence<std::size_t, I...>)
{
std::string result;
to_string_impl(result, std::get<I>(tup)...);
return result;
}

template<typename... S>
std::string mystring(const std::tuple<S...>& tup)
{
return mystring(tup, gen_indices<sizeof...(S)>{});
}

template<typename S=T>
std::string convert(const S& x){
return helper<S>()(x);
}

template<typename... S>
std::string convert(const std::tuple<S...>& tup){
std::string res = std::move(mystring(tup));
res = "{" + res + "}";
return res;
}

template<typename S1, typename S2>
std::string convert(const std::pair<S1, S2>& x){
return "{" + converter<S1>().convert(x.first) + ", " + converter<S2>().convert(x.second) + "}";
}
};

template<typename T>
struct converter<T, typename std::enable_if<seqhelper<T>::seq, void>::type>{
template<typename S=T>
std::string convert(const S& x){
int len = 0;
std::string ans = "{";
for(auto it = x.begin(); it != x.end(); ++it){
ans += move(converter<typename S::value_type>().convert(*it)) + ", ";
++len;
}
if(len == 0) return "{[EMPTY]}";
ans.pop_back(), ans.pop_back();
return ans + "}";
}
};

template<typename T>
std::string luangao(const T& x){
return converter<T>().convert(x);
}

namespace atcoder {

namespace internal {

std::vector<int> sa_naive(const std::vector<int>& s) {
int n = int(s.size());
std::vector<int> sa(n);
std::iota(sa.begin(), sa.end(), 0);
std::sort(sa.begin(), sa.end(), [&](int l, int r) {
if (l == r) return false;
while (l < n && r < n) {
if (s[l] != s[r]) return s[l] < s[r];
l++;
r++;
}
return l == n;
});
return sa;
}

std::vector<int> sa_doubling(const std::vector<int>& s) {
int n = int(s.size());
std::vector<int> sa(n), rnk = s, tmp(n);
std::iota(sa.begin(), sa.end(), 0);
for (int k = 1; k < n; k *= 2) {
auto cmp = [&](int x, int y) {
if (rnk[x] != rnk[y]) return rnk[x] < rnk[y];
int rx = x + k < n ? rnk[x + k] : -1;
int ry = y + k < n ? rnk[y + k] : -1;
return rx < ry;
};
std::sort(sa.begin(), sa.end(), cmp);
tmp[sa[0]] = 0;
for (int i = 1; i < n; i++) {
tmp[sa[i]] = tmp[sa[i - 1]] + (cmp(sa[i - 1], sa[i]) ? 1 : 0);
}
std::swap(tmp, rnk);
}
return sa;
}

// SA-IS, linear-time suffix array construction
// Reference:
// G. Nong, S. Zhang, and W. H. Chan,
// Two Efficient Algorithms for Linear Time Suffix Array Construction
template <int THRESHOLD_NAIVE = 10, int THRESHOLD_DOUBLING = 40>
std::vector<int> sa_is(const std::vector<int>& s, int upper) {
int n = int(s.size());
if (n == 0) return {};
if (n == 1) return {0};
if (n == 2) {
if (s[0] < s[1]) {
return {0, 1};
} else {
return {1, 0};
}
}
if (n < THRESHOLD_NAIVE) {
return sa_naive(s);
}
if (n < THRESHOLD_DOUBLING) {
return sa_doubling(s);
}

std::vector<int> sa(n);
std::vector<bool> ls(n);
for (int i = n - 2; i >= 0; i--) {
ls[i] = (s[i] == s[i + 1]) ? ls[i + 1] : (s[i] < s[i + 1]);
}
std::vector<int> sum_l(upper + 1), sum_s(upper + 1);
for (int i = 0; i < n; i++) {
if (!ls[i]) {
sum_s[s[i]]++;
} else {
sum_l[s[i] + 1]++;
}
}
for (int i = 0; i <= upper; i++) {
sum_s[i] += sum_l[i];
if (i < upper) sum_l[i + 1] += sum_s[i];
}

auto induce = [&](const std::vector<int>& lms) {
std::fill(sa.begin(), sa.end(), -1);
std::vector<int> buf(upper + 1);
std::copy(sum_s.begin(), sum_s.end(), buf.begin());
for (auto d : lms) {
if (d == n) continue;
sa[buf[s[d]]++] = d;
}
std::copy(sum_l.begin(), sum_l.end(), buf.begin());
sa[buf[s[n - 1]]++] = n - 1;
for (int i = 0; i < n; i++) {
int v = sa[i];
if (v >= 1 && !ls[v - 1]) {
sa[buf[s[v - 1]]++] = v - 1;
}
}
std::copy(sum_l.begin(), sum_l.end(), buf.begin());
for (int i = n - 1; i >= 0; i--) {
int v = sa[i];
if (v >= 1 && ls[v - 1]) {
sa[--buf[s[v - 1] + 1]] = v - 1;
}
}
};

std::vector<int> lms_map(n + 1, -1);
int m = 0;
for (int i = 1; i < n; i++) {
if (!ls[i - 1] && ls[i]) {
lms_map[i] = m++;
}
}
std::vector<int> lms;
lms.reserve(m);
for (int i = 1; i < n; i++) {
if (!ls[i - 1] && ls[i]) {
lms.push_back(i);
}
}

induce(lms);

if (m) {
std::vector<int> sorted_lms;
sorted_lms.reserve(m);
for (int v : sa) {
if (lms_map[v] != -1) sorted_lms.push_back(v);
}
std::vector<int> rec_s(m);
int rec_upper = 0;
rec_s[lms_map[sorted_lms[0]]] = 0;
for (int i = 1; i < m; i++) {
int l = sorted_lms[i - 1], r = sorted_lms[i];
int end_l = (lms_map[l] + 1 < m) ? lms[lms_map[l] + 1] : n;
int end_r = (lms_map[r] + 1 < m) ? lms[lms_map[r] + 1] : n;
bool same = true;
if (end_l - l != end_r - r) {
same = false;
} else {
while (l < end_l) {
if (s[l] != s[r]) {
break;
}
l++;
r++;
}
if (l == n || s[l] != s[r]) same = false;
}
if (!same) rec_upper++;
rec_s[lms_map[sorted_lms[i]]] = rec_upper;
}

auto rec_sa =
sa_is<THRESHOLD_NAIVE, THRESHOLD_DOUBLING>(rec_s, rec_upper);

for (int i = 0; i < m; i++) {
sorted_lms[i] = lms[rec_sa[i]];
}
induce(sorted_lms);
}
return sa;
}

}  // namespace internal

std::vector<int> suffix_array(const std::vector<int>& s, int upper) {
assert(0 <= upper);
for (int d : s) {
assert(0 <= d && d <= upper);
}
auto sa = internal::sa_is(s, upper);
return sa;
}

template <class T> std::vector<int> suffix_array(const std::vector<T>& s) {
int n = int(s.size());
std::vector<int> idx(n);
iota(idx.begin(), idx.end(), 0);
sort(idx.begin(), idx.end(), [&](int l, int r) { return s[l] < s[r]; });
std::vector<int> s2(n);
int now = 0;
for (int i = 0; i < n; i++) {
if (i && s[idx[i - 1]] != s[idx[i]]) now++;
s2[idx[i]] = now;
}
return internal::sa_is(s2, now);
}

std::vector<int> suffix_array(const std::string& s) {
int n = int(s.size());
std::vector<int> s2(n);
for (int i = 0; i < n; i++) {
s2[i] = s[i];
}
return internal::sa_is(s2, 255);
}

}  // namespace atcoder

int main(){
cout << luangao(v) << "\n"; //{1, 2, 0, 3}
}


Let $Sf(S, i)$ be the suffix of S from index $i$ ($0$-indexed). For example, $Sf$("abcd", 1) = "bcd". The key idea is making a transform $g$ of $S, T$, together with their indices, such that $S2.substr(i, n) \leq T2.substr(j, n) \leftrightarrow Sf(g(S2, T2), g(i)) < Sf(g(S2, T2), g(j))$, where $g(S2, T2)$ is a large string obtained from $S2, T2$, and $g(i), g(j)$ are the transformed indices (because $i, j$ correspond to other indices different from $i, j$ in the large string $g(S2, T2)$). We need to transform the comparison of substrings to the comparison of suffices, by doing so we can utilize the powerful tool: SA-IS. Please note that there are never two equal suffices of a string because of different lengths.

One failed construction is $g(S2, T2) := T2 + S2$. For example, $T=$"zabf", $T2=$"zabfzabf", $S=$"abfz", $S2=$"abfzabfz", T2+S2="zabfzabfabfzabfz". One can check that, the suffix corresponding to the first "abf" ("abfzabfabfzabfz") < the suffix corresponding to the second "abf" ("abfzabfz").

Let's review the reason for the failure of our aforementioned construction. First, $Sf(T2+S2, i+2n) < Sf(T2+S2, j) \rightarrow S2.substr(i, n) \leq T2.substr(j, n)$ is obviously correct. If $S2.substr(i, n) < T2.substr(j, n)$, then $Sf(T2+S2, i+2n) < Sf(T2+S2, j)$. The problem occurs when $S2.substr(i, n) = T2.substr(j, n)$. We can imagine there are two pointers $p, q$ whose initial places are $i+2n, j$, respectively. Pointer $p, q$ go simultaneously to the end of the string $T2+S2$, each step they travel one character. If $j > i$, then $q$ arrives index $2n-1$ earlier than $p$ arrives index $4n-1$, therefore $q$ would go into the $S$ area and lose control.

Both $S2$ and $T2$ are circular. If $S2.substr(i, n) = T2.substr(j, n)$, then $S2.substr(i, 2n-max(i, j)) = T2.substr(j, 2n-max(i, j))$. For example $T2=$"zabfzabf", $S2=$"abfzabfz", $i=1, j=0, n=4$, then $2n-max(i, j)=7$, "abfzabf"=="abfzabf". When the pointers $p, q$ are in the area of $S2, T2$, respectively, they are "safe". The key problem is pointer $q$ might wander out $T2$. So we need to protect $q$ when $q$ is out $T2$. A key idea is to add protection. We construct $g(T2, S2)$ as $T2$ + string(n-1, 'z') + $S2$. The string(n-1, 'z') is a safe zone. Since $0 \leq i,j \leq n-1$, When $p$ reaches the end of the string, it is guaranteed that $q$ is in $T2$ or the safe zone string(n-1, 'z') and $SF(g(T2, S2), j) > SF(g(T2, S2), i + 3n-1)$. Reason of $3n-1$: The length of $T2$ is $2n$ and the length of the safe zone is $n-1$. $2n + n-1 = 3n-1$.

Wrong answer (during contest) without the "safe zone": WA.

Core code:

int main(){
int n;
string s, t;
cin >> n >> s >> t;
string large = t + t + string(n-1, 'z') + s + s;
vector<int> v = move(atcoder::suffix_array(large)); //O(n)
ll ans = 0, snum = 0;
//You can list S-T suffice like S S T S S T T...
//When you meet T, add the number of S before that T to answer.
for(int i = 0; i < 5*n-1; ++i){
int x = v[i];
if(x >= 3*n-1 && x <= 4*n-2){
++snum;
}
if(x >= 0 && x <= n-1){
ans += snum;
}
}
cout << ans << "\n";
}

• +42