### box's blog

By box, history, 2 years ago,

Hello! This is the editorial (and discussion post i guess) for box 2021, a collection of problems that I set and received positive feedback on and generally just problems that I wanted to share with the broader CP community. The previous installment of this series is box 2020.

The contest is IOI-style with subtasks; the problems are roughly sorted by difficulty but there is no difficulty gradient.

Hint
Solution

Hint
Solution

Hint
Solution

# 103500D - multiverse FINALE

i think this is the best problem out of these five — credit to radoslav11 for the solution idea :D

Hint
Solution

# 103500E - Factors

Partial solution
Tutorial of box 2021
• +95

By box, 3 years ago,

Consider the following problem (general form of original Min25 sieve):

We are given some multiplicative function $f(n)$, described at prime powers by $f(p^k)=g(p,k)$, where $g(p,1)$ is a polynomial of low degree, and we can calculate $g(p,k)$ for any singular $p$ and $k$ in $O(1)$.

How do we calculate the prefix sum of $f(n)$ at the distinct positive integer values of $\lfloor\frac Nk\rfloor$, where $N\le 10^{10}$? Clearly, brute force or regular sieves are not going to cut it here.

Prerequisite: For any integer $N$, for all integers $1\le k\le N$ there are at most $2\sqrt N$ distinct values of $\lfloor\frac Nk\rfloor$. Call these $2\sqrt N$ such values the blocks of $N$. (A note on notation: In the following text, $\frac Nk$ will be used to describe regular division, and $N/k$ will be used to denote floored division.)

Proof

Our workflow is going to be similar to this:

1. We iterate over each nonzero exponent $T$ of $g(n,1)$. We first use the prefix sum of $x^T$ to estimate the contribution of $x^T$ to each prefix sum.
2. Then, we sieve away composite values to ensure that our new prefix sum only counts prime $n$.
3. After summing the result of the step 3 sieve over all $T$, we have obtained the prefix sum of $f(n)[n\in\mathbb P]$, where $\mathbb P$ is the set of primes.
4. Then, we sieve back composite values to restore a correct answer.

The first step is done trivially by an interpolation of your choice.

The second step asks us to evaluate the following function at the blocks of $N$:

$S(n)=\sum\limits_{i=1}^n[i\in\mathbb P]i^T$

Let $p_1,p_2,\dots$ be the sequence of primes, $p_0=1$, and $\mathbb P(i)$ be the minimum prime factor of $i$.

Consider the following, modified, version of $S$, which "simulates" sieving away primes:

$S(n,j)=\sum\limits_{i=1}^n[i\in\mathbb P\lor\mathbb P(i)>p_j]i^T$

As $j$ tends to infinity, each $n$ satisfies $S(n)=S(n,j)$. In particular, when $p_j^2>n$, we have $S(n)=S(n,j)$, because if $x$ is composite its minimum prime factor must not be greater than $\sqrt x$! This means that we don't need to check all primes up to $N$, but instead only up to $\sqrt N$. Now, we have the edge case:

$S(n,0)=\sum\limits_{i=2}^ni^T$

The problem becomes: how do we convert $S(n,j-1)$ evaluated at the blocks of $N$ to $S(n,j)$ evaluated at the blocks of $N$? We effectively need to sieve away the composites with a minimum prime factor of exactly $p_j$.

In fact, the integers with a minimum prime factor of exactly $p_j$ are part of $p_j^TS(n/p_j,j-1)$, because this fixes at least one power of $p_j$, but unfortunately this includes primes less than $p_j$. So we simply subtract away them, and get what we need to sieve away: $p_j^T(S(n/p_j,j-1)-S(p_{j-1},j-1))$.

This gives us an recurrence:

$S(n,j)=S(n,j-1)-p_j^T(S(n/p_j,j-1)-S(p_{j-1},j-1))$

Note that $S(p_{j-1},j-1)$ is simply the sum of the $T$-th powers of the first $j-1$ primes, so this can be precalculated. We replace this with $W_{j-1}$:

$S(n,j)=S(n,j-1)-p_j^T(S(n/p_j,j-1)-W_{j-1})$

It is now apparent that the values of $S(n,j)$ at blocks of $N$ only depend on values of $S(n,j-1)$ at blocks of $N$; hence we maintain $S(n,j)$ at blocks of $N$ as we slowly increase $j$ until $p_j^2>N$.

What is the time complexity of this?

Observe that for some $j$, we only need to update values of $n$ where $p_j^2<n$, because at other positions, $S(n,j-1)$ is already a correct estimate of $S(n)$. (Without this optimization, the time complexity becomes $O(N/\log N)$.)

This means that the time complexity of this part is proportional to the sum of the number of primes less than $n$ for each $n$ in the blocks of $N$. Here it suffices to consider only the $n>\sqrt N$, as brute force works with the other half.

We have:

$\sum\limits_{k=1}^{\sqrt N}\pi(\sqrt{N/k})=\sum\limits_{k=1}^{\sqrt N}O(\frac{\sqrt{\frac Nk}}{\log\frac Nk})$
$=O(\frac{\int_{1}^{\sqrt N}\sqrt{\frac Nk}dk}{\log N})=O(\frac{N^{3/4}}{\log N})$

Now, we move on to the fourth step.

To sieve composite values back, we consider "reversing" the process we did in the second step. This suggests that we still use a function sieving based on the minimum prime factor. Define

$R(n,j)=\sum\limits_{i=2}^n[i\in\mathbb P\lor\mathbb P(i)\ge p_j]f(n)$

Our base case is now

$R(n,\pi(\sqrt n)+1)=S(n)$

Our problem becomes to transition from $j+1$ to $j$; that is, to sieve back in integers with a minimum prime factor of $p_j$.

The composite number in question, $x$, could fall into two categories.

1. $x$ may have prime factors that are not $p_j$; in other words, $x=p_j^cy$ for some $1\le c$ and $p_j<\mathbb P(y)$.
2. $x$ is a non-prime power of $p_j$; in other words, $x=p_j^c$ for some $1<c$. Note that, if $c=1$, then $x$ is not composite.

We count the contribution of all $c$ where $p_j^c\le n$. For the second type, we may count by evaluating $g(p_j,c)$ with brute force, so we care about deriving the value of the first type.

We must select $y$ such that $1\le y\le n/p_j^c$ and $\mathbb P(y)>p_j$. Like our calculation for the second step, this quantity is $R(n/p_j^c,j+1)-\text{included primes}=R(n/p_j^c,j+1)-S(p_j)$. If we aren't careful in our enumeration of $c$, it might be that $p_j>n/p_j^c$, so we take $\max(0,R(n/p_j^c)-S(p_j))$.

Remember that $f$ is multiplicative, so to account for the power $p_j^c$, we just multiply this by $g(p_j,c)$.

To summarize what we are sieving in:

$\sum\limits_{c=1,p_m^c\le n}g(p_j,c)\max(0,R(n/p_j^c)-S(p_j))+\sum\limits_{c=2,p_m^c\le n}g(p_j,c)$

Observe that for the first summation, $c$ contributes if and only if $p_m^{c+1}\le n$, so we can rewrite the summation and remove the maximum operand — to get a full recurrence:

$R(n,j)=R(n,j+1)+\sum\limits_{c=1,p_m^{c+1}\le n}g(p_j,c)(R(n/p_j^c)-S(p_j))+g(p_j,c+1)$

Iterating $j$ from $\pi(\sqrt n)+1$ to $0$, our answers are at $R(n,0)$.

The analysis of the time complexity of this part is practically the same as that of the first part.

## Example: SPOJ ASSIEVE

(Yes, in fact this entire blog post was to advertise this problem of mine.)

First let's weaken the problem a bit: How do we calculate

$g(n,k,r)=\sum\limits_{i=1}^ni^k[L(i)\equiv r\pmod 4]$

where $L(i)$ is the integer log function, equal to the sum of the prime divisors of $i$, counting repetition?

At first it is not obvious at all how this is a multiplicative function. However, the Min25 sieve is not bound to functions in the integers. It works perfectly fine for polynomials too!

Consider the multiplicative function $\mathfrak L(i)$, defined by $\mathfrak L(p^c)=p^{ck}x^{(pc)\bmod 4}$, operating on $\mathbb Z/p\mathbb Z[x]/(x^4-1)$. This happens to be the cyclical convolution ring of polynomials of degree $4$ — in other words, convolution here is defined so that $x^{i+j}$ overflows to $x^{(i+j)\bmod 4}$. This is exactly what we need — addition modulo $4$!

Unfortunately, this function does not seem to satisfy an important requirement of the sieve: $p^kx^{p\bmod 4}$ is not a polynomial, nor a fully multiplicative function. A first instinct would be to do FFT/NTT — but this does not help either.

The solution would be to completely circumvent the first step. Instead, we take the true meaning of the function $p^kx^{p\mod 4}$. To find the prefix sum of this at $n$, what we really want is for each $z=0,1,2,3$, the sum of the $k$-th powers of all primes less than $n$, where said prime modulo $4$ is equivalent to $z$.

Surprisingly, this is doable.

To formalize, we need, at the blocks of $N$:

$z=0,1,2,3:\sum\limits_{i=1}^{n}[i\in\mathbb P\land i\equiv z\pmod 4]i^k$

Observe that for $z=0$ the answer is always $0$, and for $z=2$ the answer is $[2\le n]2^k$. Consider how to sieve away the answers for $z=1$ and $z=3$ in parallel.

Let us recall the recurrence for the second step of the sieve:

$S(n,j)=S(n,j-1)-p_j^T(S(n/p_j,j-1)-W_{j-1})$

In order to incorporate $z$, we need to augment this into a polynomial, with nonzero coefficients of $x^1$ and $x^3$. Let the current prime in question be $p=p_j$, and let the previous prime be $p'=p_{j-1}$. We want

$[x^{z|z\in\{1,3\}}]S(n,j)=\sum\limits_{i=1}^n[i\in\mathbb P\lor\mathbb P(i)>p_j][i\equiv z\pmod 4]i^k$

Remember that from $p'$ to $p$, we sieve away the composites with a minimum prime factor of $p$, so these composites can all be expressed in the form $pd$. Now, we can do casework on $p$ modulo $4$. In the following, $p,pd,d$ are all expressed modulo $4$.

1. If $p=1$, then:
1. For $pd=1$, we must have $d=1$. Hence we seive away $p^k[x^1](S(n/p_j,j-1)-W_{j-1})$ from $[x^1]S(n,j)$.
2. For $pd=3$, we must have $d=3$. Hence we seive away $p^k[x^3](S(n/p_j,j-1)-W_{j-1})$ from $[x^3]S(n,j)$.
2. If $p=3$, then:
1. For $pd=1$, we must have $d=3$. Hence we seive away $p^k[x^3](S(n/p_j,j-1)-W_{j-1})$ from $[x^1]S(n,j)$.
2. For $pd=3$, we must have $d=1$. Hence we seive away $p^k[x^1](S(n/p_j,j-1)-W_{j-1})$ from $[x^3]S(n,j)$.

To calculate $S(n,0)$, use any polynomial interpolation algorithm.

This recurrence now allows us to successfully complete the second step of sieving. The fourth step is virtually the same, except we operate in $\mathbb Z/p\mathbb Z[x]/(x^4-1)$.

Notice that the given function at primes has not changed in the original problem. We only need to modify the multiplication operation in the fourth step to complete our solution.

I hope you enjoyed this blog! If you have any interesting problems that can be solved with this sieve, let me know.

Have a nice day :)

• +161

By box, history, 3 years ago,

Consider the following constructive problem: Given prime $p$ and integer $k$, we need to construct operations $\oplus$ and $\otimes$ over the integers $0$ through $p^k-1$, such that for any $0\le x,y,z<p^k$, we have

1. $0\oplus x=x\oplus 0=x$ (additive identity)
2. $\exists r:a\oplus r=r\oplus a=0$ (additive inverse)
3. $1\otimes x=x\otimes 1=x$ (multiplicative identity)
4. $\exists r:a\otimes r=r\otimes a=1$ (multiplicative inverse)
5. $x\oplus y=y\oplus x$ (commutativity of addition)
6. $x\otimes y=y\otimes x$ (commutativity of multiplication)
7. $(x\oplus y)\oplus z=x\oplus(y\oplus z)$ (associativity of addition)
8. $(x\otimes y)\otimes z=x\otimes(y\otimes z)$ (associativity of multiplication)
9. $(x\oplus y)\otimes z=x\otimes z+y\otimes z$ (distributivity)

In other words, construct a field of size $p^k$.

The field $\mathbb Z_p$ are the integers modulo some prime $p$. How can we use this to construct a field of size $p^k$, where $k$ is some integer greater than $2$?

Consider some polynomial of degree $k$, $F(x)=a_0+a_1x+a_2x^2+\dots+a_{k-1}x^{k-1}+x^k$, such that $F(x)$ is irreducible modulo $p$, or that it has no nontrivial factors. What would happen if we introduce a new element $I$, that is defined to be a root of $F(x)$, much like how mathematicians introduced $i$ as a root of $x^2+1$?

By the definition of $I$, we must have that

$I^k+a_{k-1}I^{k-1}+\dots+a_2I^2+a_1I+a_0=0$

or equivalently

$I^k=-a_{k-1}I^{k-1}-\dots-a_2I^2-a_1I-a_0$

Hence, we can create a new field where elements are of the form $v_0+v_1I+v_2I^2+\dots+v_{k-1}I^{k-1}$, where $0\le v_i<p$. Addition is done term-wise, and multiplication is equivalent to convolution. Note that if after a multiplication there $I^k,I^{k+1},\dots,I^{2k-2}$ terms are nonzero, we can reduce them using $I^k=(P-a_{k-1})I^{k-1}+\dots+(P-a_2)I^2+(P-a_1)I+(P-a_0)$.

The number of elements in this field is obviously $p^k$, but how do we verify that it indeed is a field, and that for reducible $F(x)$ the resulting elements do not form a field?

Observe, that we can treat all elements of this field as polynomials in the ring $Z_p[x]$ with highest nonzero term of degree at most $k-1$ during operations, keeping in mind to reduce after a multiplication. But we can do better — during multiplication, because $I^k$ becomes $(P-a_{k-1})I^{k-1}+\dots+(P-a_2)I^2+(P-a_1)I+(P-a_0)$, the polynomials are taken modulo $x^k-\left((P-a_{k-1})I^{k-1}+\dots+(P-a_2)I^2+(P-a_1)I+(P-a_0)\right)$, or equivalently, $F(x)$! Now all that's left is to prove that $Z_p[x]/F(x)$, or all elements of $Z_p[x]$ modulo $F(x)$, forms a field.

Associativity, commutativity, additive identities ($0$), multiplicative identities ($1$), additive inverses, and distributivity are all readily obvious. What's left is to prove that every nonzero element of $Z_p[x]/F(x)$ has an inverse. In $A(x)B(x)\equiv 1\pmod{F(x)}$, $B(x)$ exists if and only if $A(x)$ is coprime to $F(x)$, using an extension of the extended Euclidean algorithm to polynomials. Because $F(x)$ is irreducible, it has no factors other than $1$ less than itself, so every nonzero element of $Z_p[x]/F(x)$ is coprime to $F(x)$ — note that this doesn't hold when $F(x)$ isn't irreducible!

Hence, we have proven that $Z_p[x]/F(x)$ is a field, and equivalently, the field we have constructed out of elements of the form $v_0+v_1I+v_2I^2+\dots+v_{k-1}I^{k-1}$ is a field.

I saw this problem on multiple sources, one of which is chapter 10 of Introductory Combinatorics, but I felt like I made some semi-interesting insights that I was proud of and wanted to share it with Codeforces :)

Have a nice day!

• +147

By box, history, 3 years ago,

My last blog was a bit too esoteric

This blog assumes that you can do a basic convolution (e.g. FFT). After I saw that many people did not know how to do Chirp Z-Transform with arbitrary exponents fast from my blog 9 weeks ago I decided to shitpost write a short tutorial on it.

Let's say we have a polynomial $P(x)$, such that

\begin{align}P(x)=\sum_{i=0}^{n-1}a_ix^i\end{align}

Now, let's say we have $c$ and $m$, and we want to find $P(c^0),P(c^1),P(c^2),\dots,P(c^m)$. Let $b_j=P(c^j)$. Consider the implications of having $x$ as such a form:

\begin{align}b_j=\sum_{i=0}^{n-1}a_ic^{ij}\end{align}

Tinkering around with different expressions for $ij$, one finds that $ij=\frac{(i+j)^2}{2}-\frac{i^2}{2}-\frac{j^2}{2}$. This means that

\begin{align}b_j=\sum_{i=0}^{n-1}a_ic^{\frac{(i+j)^2}{2}-\frac{i^2}{2}-\frac{j^2}{2}}\end{align}
\begin{align}b_jc^{\frac{j^2}{2}}=\sum_{i=0}^{n-1}(a_ic^{-\frac{i^2}{2}})c^{\frac{(i+j)^2}{2}}\end{align}

Hence we can find $b_j$ from the difference-convolution of $a_ic^{-\frac{i^2}{2}}$ and $c^{\frac{i^2}{2}}$. However, in many cases — especially when working under a modulus — we can't find the $c^{\frac{i^2}{2}}$ as $i$ may be odd. We use a workaround: $ij=\binom{i+j}{2}-\binom{i}{2}-\binom{j}{2}$. Proof:

$ij=\binom{i+j}{2}-\binom{i}{2}-\binom{j}{2}$
$ij=\frac{(i+j)(i+j-1)}{2}-\frac{(i)(i-1)}{2}-\frac{(j)(j-1)}{2}$
$2ij=(i+j)(i+j-1)-(i)(i-1)-(j)(j-1)$
$2ij=(i)(i+j-1)+(j)(i+j-1)-(i)(i-1)-(j)(j-1)$
$2ij=(i)((i+j-1)-(i-1))+(j)((i+j-1)-(j-1))$
$2ij=(i)(j)+(j)(i)$

Modifying our formula a bit, we get

\begin{align}b_jc^{\binom j2}=\sum_{i=0}^{n-1}(a_ic^{-\binom i2})c^{\binom{i+j}2}\end{align}

As for implmentation details, notice that

\begin{align}b_jc^{\binom j2}=\sum_{i=0}^{n-1}(a_{(n-(n-i))}c^{-\binom{n-(n-i)}2})c^{\binom{i+j}2}\end{align}

Define $C_i=a_{n-i}c^{-\binom{n-i}2}$; $D_i=c^{\binom i2}$. We hence have

$b_jx^{\binom j2}=\sum_{i=0}^{n-1}C_{n-i}D_{i+j}$
$b_j=x^{-\binom j2}(C*D)_{n+j}$

(the second through definition of convolution)

You can test your implementations here, mod 998244353 and here, mod 10^9+7, although note that the second one is both intense in precision and constant factor.

My code for the former

This method can be used to cheese 1184A3 - Heidi Learns Hashing (Hard) and 1054H - Epic Convolution, and is also a core point in 901E - Cyclic Cipher.

• +146

By box, history, 3 years ago,

# A brief introduction into the applications of square root decomposition

Square root decomposition is the process of separating a structure of size $O(N)$ into $O(\sqrt{N})$ "blocks" of size $O(\sqrt{N})$ each, in a way that aids manipulation of the entire structure. Square root decomposition is extremely versatile. Some of its most well-known use cases are:

• answering queries on a static array, with methods such as Mo's algorithm or block precomputation
• "lazy" brute force modifications, where it is easy to query an entire block, but tag pushing isn't obvious
• separating objects based on a threshold, when there exists both an $O(x)$ algorithm and an $O(n/x)$ algorithm

In this tutorial we will walk through multiple types of square root decomposition.

## Answering queries on a static array, offline (Mo's algorithm)

Consider a problem where we are asked to find the answer for certain intervals $[l,r]$. We can't quickly compute the answer for an arbitrary interval, but we know how to transition to $[l,r\pm1]$ and $[l\pm1,r]$ fast given some information remaining from $[l,r]$ answer calculation. The number of transitions we need to do to get from $[l_1,r_1]$ to $[l_2,r_2]$ is $|l_1-l_2|+|r_1-r_2|$.

If there are only two intervals we need to answer such transitioning wouldn't help us. However, if there are many intervals, choosing a good transitioning route will drastically reduce the total time needed. Finding the best transition route quick is allegedly NP-hard, so we will focus on estimating a "good enough" route.

Consider the following scheme for 2D Manhattan TSP: We do square root decomposition on the $x$ coordinate, separating the grid into vertical blocks, $n$ cells tall and $B$ cells wide. There are $n/B$ such blocks.

For an arbitrary block, call the number of points we need to visit in it $C$. To get to all of them, we can visit them in order from top to bottom: we spend at most $O(BC)$ transitions moving left and right, and $O(n)$ transitions going down.

If for each block we need $O(BC+n)$ transitions, in total we will need $O(\sum(BC+n))$ = $O(Bq+n^2/B)$ transitions. Selecting a block size of $B=O(n/\sqrt{q})$ gives us a total transition count of $O(qn/\sqrt{q}+n\sqrt{q})$ = $O(n\sqrt{q})$.

Implementation of sorting these transitions can be done as follows:

int B = ???;
struct query {
int l, r, id;
const bool operator<(const query& o) const {
if(l/B == o.l/B) {
// they are in the same block, sort going down
return r < o.r;
} else {
// they are not in the same block, sort by block number
return l < o.l;
}
}
}


Observe here that between each block, we need to run from the bottom of the "grid" all the way up. We can prevent this by using a zig-zag pattern, getting a bit of a constant optimization:

int B = ???;
struct query {
int l, r, id;
const bool operator<(const query& o) const {
if(l/B == o.l/B) {
if((l/B) % 2 == 0) return r < o.r;
else return r > o.r;
} else {
return l < o.l;
}
}
}


In the following we will look at some examples.

### Range distinct query (SPOJ DQUERY)

Statement: Given a static array and queries of the form $[l,r]$, count the number of different values that occur between $[l,r]$.

The naive transition is intuitive: we maintain a table $v$ where $v[x]$ is the number of times $x$ occured in the interval that we are maintaining. When adding an element $x$ such that $v[x]=0$, increment the answer; when deleting an element $x$ such that $v[x]=1$, decrement the answer.

However, this needs many random accesses to this table, which is very cache-unfriendly, leading to a massive constant. Can we do better?

Consider arrays $pre[i]$ and $nxt[i]$, which equal the last location of the element $i$ and the next location of the element $i$, respectively (equal to some infinity if this element doesn't exist.) We can use these arrays to quickly check whether or not some element occurs in an range.

• Case 1: $[l,r]\rightarrow[l,r+1]$. We add 1 if and only if $r+1$ has not occured in $[l,r]$, or equivalently, $pre[r+1] < l$.
• Case 2: $[l,r]\rightarrow[l,r-1]$. We subtract 1 iff $r$ has not occured in $[l,r-1]$.
• Case 3: $[l,r]\rightarrow[l-1,r]$. We add 1 iff $l-1$ has not occurned in $[l,r]$, or equivalently, $r < nxt[l-1]$.
• Case 4: $[l,r]\rightarrow[l+1,r]$. We subtract 1 iff $l$ has not occured in $[l+1,r]$.

This leads to a much quicker solution, which runs plausibly fast even for $n=10^6$ even if we directly implement Mo's.

### Range inversion query (naive version)

Statement: Given a static array and queries of the form $[l,r]$, count the number of pairs $(i,j)$ such that $a[i]>a[j]$. $O(n\log n\sqrt n)$.

Every time we add or erase an element, we consider how much this element contributes to the answer. If we maintain a Fenwick tree corresponding to the current interval, we can quickly find out how many inversions involve a certain value. Transition is $O(\log n)$; hence overall it is $O(n\log n\sqrt n)$.

### Advanced: Sweepline Mo / Offline Again

This is 617E - XOR and Favorite Number, except there are multiple favorite numbers.

Statement: Given a static array, a static set $S$ of size $C$ and queries of the form $[l,r]$, count the number of pairs $(i,j)$ such that $a[i]\text{ xor }a[j]\in S$. $O(nC+n\sqrt n)$ time; $O(n)$ memory. Assume $q=O(n)$.

Notice that $a[i]\text{ xor }a[j]\in S$ is equivalent to $\exists v\in S:a[i]\text{ xor }v=a[j]$. We can consider each $v$ separately and run $C$ instances of Mo's, but that would be $O(nC\sqrt n)$. Consider the transitions: we add an element $j$ and want to know the amount of $i$ such that $a[i]\text{ xor }a[j]\in S$ and $i\in[l,r]$.

Call the transition delta $f(l,r,p)$ where $p$ is the element we are adding. Removing can be done by subtracting $f(l,r-1,p)$ or $f(l+1,r,p)$ depending on the direction. Sweepline Mo can be used when this transition function is differentiable: i.e. it satisfies $f(l,r,p)=f(1,r,p)-f(1,l-1,p)$.

The main idea is to first run a "dry" Mo and find all $f(1,x,p)$ that we need to calculate. Then, we do a sweep-line on $x$, updating the data structure accordindly, and calculating the $f(1,x,p)$ that involve this $x$. We generally needs the "calculation" part to be $O(1)$, like normal Mo, but we make more wiggle room for the update part.

However, if we store all $(x,p)$, we get $O(n\sqrt n)$ memory and a ridiculous constant factor. Notice that because of the Mo transition order, we can separate all $p$ into several types, given a fixed $x$:

1. $p=x$
2. $p=x+1$
3. $p\in[A,B]$

For the first and second type, we calculate $ansPrev[i]$ and $ansNext[i]$ and don't update anything, leaving these arrays for the second "wet" Mo to use.

For the third type, we find all $(p,A,B)$ that result from the dry Mo. There are at most $4(M-1)$ such intervals because the movement of any end of the Mo interval creates exactly one interval. Then, on the corresponding $x$, we iterate through the interval and add the transition value we found to the respective query.

Example implementation:

    sort(qs, qs + m);
cl = 1, cr = 0;
for(int i = 0; i < m; i++) {
if (cr < qs[i].r) {
iv[cl - 1].push_back({cr + 1, qs[i].r, i, -1});
cr = qs[i].r;
}
if (qs[i].l < cl) {
iv[cr].push_back({qs[i].l, cl - 1, i, 1});
cl = qs[i].l;
}
if (qs[i].r < cr) {
iv[cl - 1].push_back({qs[i].r + 1, cr, i, 1});
cr = qs[i].r;
}
if (cl<qs[i].l) {
iv[cr].push_back({cl, qs[i].l - 1, i, -1});
cl = qs[i].l;
}
}
for(int i = 1; i <= n; i++) {
ansPrev[i] = ansPrev[i-1] + cgc[ar[i]];
add(ar[i]);
for (auto [l, r, i, c] : iv[i]) {
ll g = 0;
for(int p = l; p <= r; p++)
g += cgc[ar[p]];
ans[i] += c * g;
}
ansThere[i] = ansThere[i - 1] + cgc[ar[i]];
}
for(int i = 0; i < m; i++) {
curans += ans[i];
fans[qs[i].i] = curans + ansPrev[qs[i].r] + ansThere[qs[i].l-1];
}
for(int i = 0; i < m; i++) {
print(fans[i]);
pc('\n');
}


Special implementation details:

• Here add just updates the cgc array, which is a population count of the things in the current prefix xor the things in the set.
• ivs is the intervals on each x, along with the query and contribution coefficient.
• If we do a prefix sum on ansPrev and ansThere, we don't even need to use a while loop in the second Mo.

### Range inversion query, again

$O(n\sqrt n)$

Consider sweepline mo. Sweepline mo in general is a method used to reduce the number of modifications of the Mo data structure to $O(n)$, not affecting the query count of $O(n\sqrt n)$. This means modifications have to be at most $O(\sqrt n)$ and queries have to be at most $O(1)$.

For this problem we need to support point increment/decrement in $O(\sqrt n)$ and prefix sum query in $O(1)$. Flip it around to get an equivalent problem: suffix increment/decrement in $O(\sqrt n)$ and point query in $O(1)$.

Separate this auxillary array into blocks of size $O(\sqrt n)$. For each block, maintain a value $lazy$. When we do increment/decrement a suffix, for blocks that intersect with this suffix yet are not completely covered, we change the elements with brute force. Otherwise, we modify the $lazy$ tag. To query a position, we add the (brute force) value with its corresponding $lazy$ tag.

For people who think there is not much of a difference between $O(n\sqrt n)$ and $O(n\log n\sqrt n)$: When $n=100000$, the former runs in $300ms$ while the latter runs in $3000ms$. Normally time limits are $1000ms$.

## Answering queries on a static array, online

When the data structure for Mo's is small enough, you can use first do block decomposition into size $B$ and then calculate the data structure for all intervals of blocks, creating $O((N/B)^2)$ structures.

Then, assuming the transition time for this data structure is still $O(T)$, querying an arbitrary interval can be done in $O(TB)$.

Normally intialization of each structure is $O(B)$ and transition is $O(1)$, so it would be best to choose a block size of $O(\sqrt n)$ to get $O(n\sqrt n)$ precalculation and $O(\sqrt n)$ query time. Here we look at some examples:

### 522D - Closest Equals

Statement: Given a static array and queries of the form $[l, r]$, for each query calculate $\min(|i-j|:a[i]=a[j]\wedge i,j\in[l,r])$. $n\le 500000$

Doing Mo's for this problem looks a bit intractable because although insertion is easy using $pre$ and $nxt$ arrays, it seems impossible to delete values and "roll back" the $\min$ value. Hence we consider block decomposition, which would only require insertion.

Decompose this array into contiguous blocks of size $O(B)$; for each contiguous interval of blocks compute the answer, for a total precomputation complexity of $O((N/B)N)$.

Note that if we want the minimum distance the only possible $j$ values when we fix an $i$ that couuld possibly contribute to the answer are $pre[i]$ and $nxt[i]$. As such, when extending a precomputed block, we calculate if $pre[i]$ and $nxt[i]$ are in the interval we need to answer, and if so we try to update the minimum, for a query complexity of $O(B)$.

Total time complexity is $O(n^2/B+qB)$, selecting $B=O(\sqrt n)$ is best, getting a total time complexity of $O((n+q)\sqrt n)$ with tiny constant. Note that input and outout for this problem is a massive portion of the total running time.

### 617E - XOR and Favorite Number online version

Statement: Given a static array, a number $k$ and queries of the form $[l,r]$, for each query calculate the number of $(i,j)$ such that $a[i]\text{ xor }a[j]=k$.

$a[i]\text{ xor }a[j]=k \iff a[i]\text{ xor }k=a[j]$; we precalculate answers for each block and now count the contribution of new elements to the block interval. We want to know how many elements with value $a[i]\text{ xor }k$ there are in the current interval.

First of all the current interval is block chunk + at most $O(B)$ elements, so we can directly maintain a population array on these $O(B)$ elements. For the block chunk, notice that the count "# of $x$ in blocks from $l$ to $r$" is differentiable. Precalculate "# of $x$ in blocks from $1$ to $r$", and then we're done.

But not quite; this method takes $O(B\max A)$ memory for the chunk count array. Notice that there are at most $O(n)$ possible values for $a[i]$ and $a[i]\text{ xor }k$; hash these values and we get $O(nB+(n/B)^2)$ memory, $O((n+q)B+(n/B)^2)$. Square root decomposition gives $O((n+q)\sqrt n)$.

### Range distinct query revisited

Statement: Range distinct query but online

Obviously direct precomputation and utilizing $pre[i]$ and $nxt[i]$ arrays lead to an $O((n+q)\sqrt n)$ running time. But can we do better?

Notice that the bottleneck is in the precomputation $O((n/B)n)$, because for each block we need to compute the answers for all intervals that have this block as its first block. Yet we only store $O((n/B)(n/B))$ values. Obviously time complexity will be lower bounded by this if we use decomposition methods. Here we will reach this lower bound.

Consider the definition of the answer for an arbitrary interval $[l,r]$:

$A_{[l,r]}=(\sum_i[pre[i]+1\le l\wedge i\le r])-(\sum_i[pre[i]+1\le l\wedge i\le l-1])$

Let's put the points $(pre[i]+1, i)$ onto the 2D plane. Define a function $p(x,y)$ as the number of points such that its x-coordinate is smaller than $x$ and its y-coordinate is smaller than $y$, or a 2D prefix sum. Our answer is now

$A_{[l,r]}=p(l,r)-p(l,l-1)$

but $p(x,y)$ takes $O(n^2)$ time to calculate. Consider transforming the $A$ expression to incorporate decomposition into blocks of size $B$:

$A_{[l,r]}=(\sum_i[pre[i]/B+1\le l\wedge i/B\le r])-(\sum_i[pre[i]/B+1\le l\wedge i/B\le l-1])$

Now this is a prefix sum on the points $(pre[i]/B+1,i/B)$ where division is rounded down. This can obiously be computed in $O((n/B)^2)$.

Consider the optimal choice of $B$. The time complexity is

$O(\frac{n^2}{B^2}+qB)$

Selecting $B=O(\sqrt[3]{n})$ gives us a total time complexity of

$O((n+q)\sqrt[3]{n})$

This runs faster than the currently best known method of persistent segment trees because of its tiny constant for $n=10^6$. Its memory usage is also much, much smaller.

## More compilicated examples

### Optimizing brute force: 911G - Mass Change Queries

Statement: Given an array and operations of the form $l,r,x,y$, set $a[i]=y$ for all $i$ such that $i\in[l,r]$ and $a[i]=x$. Here $1\le a[i],x,y\le10^5$.

We observe that we can apply these operations to the entire array pretty quickly by storing the locations each value occur in something simlar to a vector and do small-to-large merging.

Do block decomposition on this array into blocks of size $B$. For blocks that are completely contained by an operation, do small-to-larger merging; otherwise, reconstruct this block and modify with brute force.

I don't know how to prove the true time complexity rigorously, but we can upper bound it. If there is no brute force changes, for each block it will take a total of $O(B\log B)$ time, because whenever an element is moved, the size of the vector that it is in at least doubles. Total time would thus be $O(NB\log B)$. If brute force changes on a block absolutely breaks amortized time, moving would take $O((N+Q)B\log B)$ time, for a total of $O((N+Q)B\log B+QN/B)$. Assume $Q=O(N)$ to simplify; $O(NB\log B+N^2/B)$. Selecting $B=\sqrt{\frac{N}{\log N}}$ gives

$O(N\sqrt{\frac{N}{\log N}}\log \sqrt{\frac{N}{\log N}}+\frac{N^2}{\frac{\sqrt{N}}{\sqrt{\log N}}}) \le O(N\sqrt{N\log N})$

Note here that the $\log$ is inside the square root sign.

### Value distance query

Statement: Given a static array and queries of the form $x, y$, find $\min(|i-j|:a[i]=x,a[j]=y)$.

There are at most $O(\sqrt n)$ values that occur more than $O(\sqrt n)$ times. Consider a pair $x,y$ such that at least one of $x,y$ occurs more than $O(\sqrt n)$ times. We try to precompute answers for all such $x, y$ as there are at most $O(n\sqrt n)$ pairs.

For all $x$ where $x$ occurs more than $O(\sqrt n)$ times, we sweep the array using $x$, once forwards and once backwards. We keep track of the last location where $x$ occured in the direction that we are sweeping, and then we try to update the answer for $x$ and $a[i]$ (i is the index of the sweep pointer.)

As well as that, for each value keep a sorted vector of the indices where said value occurs.

When we answer queries, if we already precomputed the answer, we directly use it; otherwise, we run two-pointers on the corresponding vectors that we stored. The total time complexity is $O((n+q)\sqrt n)$.

### [Ynoi2008] rplexq

Statement: Given a rooted tree and queries of the form $(l,r,x)$, for each query find the number of $i,j$ such that $l\le i<j\le r$ and the LCA of $i$ and $j$ is $x$.

First of all, we answer these queries offline and put the $[l,r]$ values on the corresponding node. We do dsu on tree, maintaining an implicit segtree of the indices of nodes that belong to some subtree with small-to-large merging.

If we don't even care about the intervals, the amount of pairs that have LCA at node $x$ is equal to

$\binom{sz_x}2-\sum\binom{sz_j}2$

where $j$ is the children of $x$.

When a node has a pretty small amount of children, we can answer the queries with brute force, as prefix sum queries on an implicit segtree has a small enough constant. But when a node has a lot of children AND has a lot of queries on it, this is much too slow.

First off we do a special check for the heavy child, so as to guarantee that the amount of nodes we parse are on the order of $O(n\log n)$. We then take the light children and condense the indices. Now, we can run Mo's on these indices, where what we store is $\sum\binom{colors}2$. Due to the ridiculously high initialization constant for Mo's it might be better to run brute force when queries or children count is small enough.

Note that the complexity for Mo's with length $n$ queries $m$ is $O(n\sqrt m)$. Because of

$\sum n_i\sqrt m_i\le(\sum n_i)\sqrt{\max m}$

the total time complexity is upper-bounded by $O(n\log n\sqrt m)$. ODT claims that the time complexity can be $O(n\sqrt{m\log n})$ to $O(n\sqrt m)$, but I do not know how to prove this.

If you have other interesting square root decomposition problems, please share them in the comments below and I'll include them in the next edit!

• +272

By box, history, 3 years ago,

Currently the best known complexities for querying the number of inversions in a range of a static array is $O(n+m)$ space $O(n\sqrt{m})$ time for offline; $O(n\sqrt{m})$ space same time for online. Is there any proof of these lower bounds (a proof that it's impossible to solve range inversion query in linear times polylog time), or can range inversion queries theoretically be done with lower time complexity?

source for current time complexities

• +5

By box, history, 3 years ago,

1054H - Эпическая свёртка

When I first saw this problem, I did not understand the editorial at all, so I decided to come up with my own solution. After noticing the extremely small modulo, I tried using it to my advantage.

First of all, $i^2$ and $j^3$ are purely random functions. They are really just a generic function that maps integers to numbers in the range $[0,MOD-2]$. Here we can take them modulo $MOD-1$ because of Fermat's little theorem and how the given mod, $490019$, is prime.

Call the function for array $a$ as $f$, and the function for array $b$ as $g$. Notice that if we "relabel" the arrays $a$ and $b$ into arrays $a'$ and $b'$ respectively such that $a'$ and $b'$ satisfy the following equalities:

\begin{align}a'_i=\sum_{f(j)=i}a_j,b'_i=\sum_{g(j)=i}b_j\end{align}

then the answer that we seek effectively becomes

\begin{align}\sum_{i=0}^{MOD-2}\sum_{j=0}^{MOD-2}a'_ib'_jc^{ij}\end{align}

Now, if we factor out $a'_i$, this becomes

\begin{align}\sum_{i=0}^{MOD-2}a'_i\sum_{j=0}^{MOD-2}b'_jc^{ij}\end{align}

Notice how the internal part is effectively evaluating the generating function of $b'$ at point $c^i$. Since $b'$ is a finite sequence of length $MOD-1$ evaluating it quickly at a points of the form $c^i$ is doable with Chirp-Z transform. The answer is now reduced to

\begin{align}\sum_{i=0}^{MOD-2}a'_iB'(c^i)\end{align}

which is computable in $O(\text{mod}\log\text{mod})$ time.

• +87