### Spheniscine's blog

By Spheniscine, history, 8 months ago, Spoiler

Spoiler

Spoiler

Spoiler

Spoiler

Spoiler

Spoiler

### Ex – Hakata

Spoiler dp,
By Spheniscine, history, 20 months ago, The math renderer was changed/updated recently but it looks much worse than the previous version.

First off, it takes much longer to fully render; before it does it looks like this, which is small and hard to read: Then when it does fully render, it's quite blurry: By Spheniscine, history, 20 months ago, Spoiler

Spoiler

Spoiler

Spoiler

Spoiler

### F – GCD or MIN

Spoiler

By Spheniscine, history, 23 months ago, Forgive the quirkiness of the title; I am not aware of a name for this concept, therefore I just made a few up.

This blogpost is about a particular way to construct what I call a "pseudorandom permuter". It is distinct from a regular pseudorandom number generator (PRNG) in that it is constructed such that the sequence will never repeat until all $M = 2^k$ integers in its machine-integer state space (32- or 64-bit) has been used. In other words, it's a way to produce a permutation of $[0, M)$ that's (hopefully) statistically indistinguishable from a truly random permutation.

The generator is made up of two parts, a "discrete Weyl sequence", and an "avalanching perfect hash function".

A "discrete Weyl sequence" is defined by a parameterized function $W_{s, \gamma}: [0, M) \rightarrow [0, M)$ where the $i$-th member of the sequence is $W_{s, \gamma}(i) = (s + \gamma \cdot i) \mod M$. The two parameters can be chosen using a standard random number generator, with the caveat that $\gamma$ must be odd (this can be easily ensured by a | 1 instruction). This ensures it is coprime with the machine integer size $M$.

The advantage of the discrete Weyl sequence is that as it is parameterized, it is hard to predict by an adversarial test (whether by the problemsetter or by the "hacking" mechanic). The disadvantage however, is that it is extremely regular, and this regularity may be exploited by adversarial tests.

In comes the second part. An "avalanching perfect hash function" is a function that is "perfect" in the sense that it maps $[0, M) \rightarrow [0, M)$ without collisions, i.e. $a \neq b \Leftrightarrow f(a) \neq f(b)$, and exhibits the avalanche property, which is that flipping one bit in the input should flip roughly half the bits of the output without a regularly predictable pattern. Note that our discrete Weyl sequence fits the definition of a perfect hash function, however it doesn't exhibit the avalanche property.

Constructing a good APHF is more difficult, however since it doesn't have to be parameterized, we can simply use a fixed function. One option is to use the "multiply-xorshift" construction from the "splitmix64" generator:

function splitmix64_aphf(z: uint64): uint64 {
z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9;
z = (z ^ (z >> 27)) * 0x94d049bb133111eb;
return z ^ (z >> 31);
}


where ^ denotes the binary xor operator, and >> the unsigned right-shift operator. I also use another option for 32-bit integers, taken from this offsite blogpost by Chris Wellons:

function int32_aphf(z: uint32): uint32 {
z = (z ^ (z >> 16)) * 0x7feb352d;
z = (z ^ (z >> 15)) * 0x846ca68b;
return z ^ (z >> 16);
}


We thus define our final hash function $f_{s, \gamma}(x) = A(W_{s, \gamma}(x))$, where $A$ is our chosen APHF. This combines the advantages of both parts; easy parametrization ("seedability") and good avalanche properties that make it hard to predict, as long as the adversary doesn't have access to the parameters or the generated values. (However, it is possible for an adversary with access to the generated values to reverse the function to find the parameters and thus predict the function; thus this construction is unsuitable for cryptographic purposes.)

Note that the actual splitmix64 generator is actually a construction of this type with a fixed $\gamma$ value. Hence my alternate name, "Splitmixes", as this construction allows easy generation of an unpredictable hash function with similar properties.

## Applications

This generator should not be used like a regular random-number generator, as its "no-collision until period" property will fail statistical tests that test the "birthday paradox" properties of a PRNG, i.e. generating a large number of values and checking if the number of collisions conforms to the statistically expected value.

However, it is this very property that makes it more useful than regular PRNGs for, e.g.,

• hash maps with integer keys (use $f_{s, \gamma}(x)$ for the hash-key of $x$)
• priority values in treaps (simply have a counter variable and use $f_{s, \gamma}(0), f_{s, \gamma}(1),$ etc, and you're guaranteed never to produce the same priority unless you generate more than 4-billion-plus values)

By Spheniscine, history, 2 years ago, Spoiler

Spoiler

Spoiler

Spoiler

Spoiler

### F – Random Max

Spoiler

By Spheniscine, history, 2 years ago, Spoiler

Spoiler

Spoiler

Spoiler

Spoiler

### F – Heights and Pairs

Spoiler

By Spheniscine, history, 2 years ago, Spoiler

Spoiler

Spoiler

Spoiler

Spoiler

### F – Contrast

Spoiler

By Spheniscine, history, 3 years ago, This is an extension of my article proposing the "ultimate" NTT. Here I will present a simplified and optimized variant of the Barrett reduction algorithm on that page. As it relaxes certain bounds of the standard parameters of the algorithm, I will also present a proof of the correctness of the variant, at least for this specific case.

First, the algorithm:

function mulMod(a: Long, b: Long): Long {
xh = multiplyHighUnsigned(a, b) // high word of product
xl = a * b // low word of product

g = multiplyHighUnsigned( (xh << 2) | (xl >>> 62) , BARR_R)

t = xl - g * MOD - MOD
t += (t >> 63) & MOD
return t
}


Notice that the middle routine has been greatly simplified; instead of computing the high and middle words of $xr$, we simply take the top 64 bits of $x$, in effect computing $\displaystyle \left\lfloor \left\lfloor \frac{x}{2^{62}} \right\rfloor \frac{r}{2^{64}} \right\rfloor$ in place of $\displaystyle \left\lfloor \frac {xr}{4^{63}} \right\rfloor$. Experimentally, this seems to work (and indeed, this variant is how Barrett reduction is normally implemented for multi-word cases), however a naive adjustment of the bounds of the Barrett reduction algorithm would relax the pre-normalization bound of $t$ to $[-m, 2m)$. This is bad news, as there is a possibility of overflowing into the sign bit, thus forcing the use of a modulus one bit shorter (e.g. $4611686018326724609$, $g = 3$) so that we could easily correct the result with two normalization rounds.

So the following is a proof of correctness for this relaxed algorithm, at least for the very specific use case in the described NTT:

On the Project Nayuki article on Barrett reduction (note that I will use $n$ instead of $m$ for the modulus in accordance to the notation on that page), one important lemma is the inequality

$\displaystyle\frac{x}{n} - 1 < \frac{xr}{4^k} ≤ \frac{x}{n} \implies \frac{x}{n} - 2 < \left\lfloor \frac{x}{n} - 1 \right\rfloor ≤ \left\lfloor \frac{xr}{4^k} \right\rfloor ≤ \frac{x}{n}$

We shall show that this inequality holds even if $\dfrac{xr}{4^k}$ is replaced by $\dfrac{(x-\delta) r}{4^k}$, with $0 ≤ \delta < 2^{62}$, the bits that have been omitted.

First we need to obtain a tighter bound on $r$. Given that $r$, $k$, and $n$ are fixed in our algorithm, we can verify that $\dfrac {4^k} n - r < 2^{-9}$. Let this value be $\varepsilon$. We thus obtain $\displaystyle \frac{4^k}{n} - \varepsilon < r < \frac{4^k}{n}$

Multiply by $x-\delta \geq 0$: $\displaystyle (x-\delta)\left(\frac{4^k}{n} - \varepsilon\right) ≤ (x-\delta)r ≤ (x-\delta)\frac{4^k}{n}$

Given that $\delta$ is nonnegative we can relax the right bound to $x\dfrac{4^k}{n}$.

Divide by $4^k$: $\displaystyle (x-\delta)\left(\frac{1}{n} - \frac \varepsilon {4^k}\right) ≤ \frac {(x-\delta)r}{4^k} ≤ \frac{x}{n}$

Recompose the leftmost expression: $\displaystyle \frac x n - \left(\frac {\varepsilon x} {4^k} + \frac \delta n\right) + \frac{\delta\varepsilon} {4^k} ≤ \frac {(x-\delta)r}{4^k} ≤ \frac{x}{n}$

$\displaystyle\frac{\delta\varepsilon} {4^k} \geq 0$, so relax the bound: $\displaystyle \frac x n - \left(\frac {\varepsilon x} {4^k} + \frac \delta n\right) ≤ \frac {(x-\delta)r}{4^k} ≤ \frac{x}{n}$

$x < n^2 < 4^k \implies \dfrac{x}{4^k} < 1$, so further relax the bound: $\displaystyle \frac x n - \left({\varepsilon} + \frac \delta n\right) < \frac {(x-\delta)r}{4^k} ≤ \frac{x}{n}$

$\delta < 2^{62}$ while $n$ is very slightly below $2^{63}$, so it can be verified that $\dfrac \delta n$ must be $< \dfrac34$.

$\dfrac34 + \varepsilon < 1$, therefore $\displaystyle \frac x n - 1 < \frac {(x-\delta)r}{4^k} ≤ \frac{x}{n}$

We can thus follow the rest of the proof in the Nayuki article to prove that our $t$ is in $[-n, n)$.

#### Samples (Kotlin, so YMMV on benchmarks for other languages)

74853038 for 993E - Nikita and Order Statistics, 328 ms faster

74852214 for 986D - Perfect Encoding, 452 ms faster

There are other applications of this proof besides NTT. For example, I found a safe prime $9223372036854771239$, which is a good property for a rolling-hash modulus to have, as any number in $[2, m-2]$ will have multiplicative order at least $\dfrac{m-1}{2}$. It seems that $\varepsilon$ and $\dfrac\delta n - \dfrac 1 2$ will both be rather small for moduli so close below a power of 2. fft, ntt,
By Spheniscine, history, 3 years ago, Prerequisites: you need to be familiar with both modular arithmetic and Fast Fourier Transform / number theoretic transform. The latter can be a rather advanced topic, but I personally found this article helpful. Note that there are some relatively easy optimizations/improvements that can be done, like precalculating the modular inverse of $n$, the "bit-reversed-indices" (can be done in $O(n)$ with DP), as well as the powers of $\omega_n$. Also useful is modifying it so that the input array length can be any power of two $\leq n$; some problems require multiplying polynomials of many different lengths, and you'd rather the runtime be $O(n \log n)$ over the sums of lengths, rather than (number of polynomials * maximum capacity).

Also helpful is knowing how to use NTT to solve problems modulo $998244353$, like 1096G - Lucky Tickets and 1251F - Red-White Fence. Note that for some problems it's easier to think of FFT not as multiplying polynomials, but of finding multiset $C$ as the pairwise sums of multisets $A$ and $B$, in the form of arrays $A[i] =$ number of instances of $i$ in multiset $A$. This is equivalent to multiplying polynomials of the form $\sum _{i=0} ^n A[i]x^i$.

Note that $\omega_n$ can be easily found via the formula $g ^ {(m-1) / n} \ \text{ mod } m$, provided that:

1. $m$ is prime
2. $g$ is any primitive root modulo $m$. It is easiest to find this before hand and then hardcode it in the submission. You can either implement the algorithm yourself or use Wolfram Alpha to find it via the query PrimitiveRoot[m]. (Spoiler alert, $g = 3$ works well for $998244353$)
3. $n$ divides $m-1$ evenly. As $n$ is typically rounded up to the nearest power of 2 for basic FFT implementations, this is easiest when $m$ is of the form ${a \cdot 2^{k} + 1}$ where $2^k \geq n$. This is why $998244353$ is a commonly-appearing modulus; it's $119 \cdot 2^{23} + 1$. Note that this modulus also appears in many problems that don't require FFT/NTT; this is a deliberate "crying-wolf" strategy employed by puzzle writers, so that you can't recognize immediately that a problem requires FFT/NTT via the given modulus.

Now onto the main topic, the "ultimate" NTT.

Rationale: There are a few problems like 993E - Nikita and Order Statistics that require FFT, however the results aren't output with any modulus, and indeed may exceed the range of a 32-bit integer type. There are several usual solutions for these types of problems:

1. Do NTT with two moduli and restore the result via Chinese Remainder Theorem. This has several prominent disadvantages:
1. Slow, as the NTT routine has to be done twice.
2. Complicated setup, as several suitable moduli must be found, and their primitive roots calculated
3. Restoring the result with CRT requires either brute force or multiplications modulo $pq$, which may overflow even 64-bit integer types.
2. Do FFT with complex numbers and floating point types. Disadvantages are:
1. Could be slow due to heavy floating-point arithmetic. Additionally, JVM-based languages (Java, Kotlin, Scala) suffer complications here, as representing complex numbers with object-based tuples adds a significant overhead.
2. Limited precision due to rounding error. Typically the problems are constructed such that it won't be a problem if care is taken in the implementation, but won't it be nice to just not to have to worry about it?

To solve these problems, I propose the "ultimate" NTT solution — just use one huge modulus. The one I use is $m = 9223372036737335297 = 549755813881 \cdot 2^{24} + 1, g = 3$. This is just over a hundred million less than $2^{63} - 1$, the maximum value of a signed 64-bit integer.

However, this obviously raises the issue of how to safely do modular arithmetic with such huge integers. Addition is complicated by possible overflow into the sign bit, thus the usual if(x >= m) x -= m won't work. Instead, first normalize $x$ into the range $[-m, m)$; this is easily done with subtracting $m$ before any addition operation. Then do x += (x >> 63) & m. This has the effect of adding $m$ to $x$ if and only if $x$ is negative.

The elephant in the room however is multiplication. The usual method requires computing a 126-bit product, then doing a modulo operation over a 63-bit integer; this could be slow and error-prone to implement. C++ users could rejoice, as Codeforces recently implemented support for 128-bit integers via this update. However, before you celebrate too early, there are still issues: this may not be available on other platforms, and I can't imagine that straight 128-bit modulo 64-bit is exactly the fastest operation in the world, so the following might still be helpful even to C++ users.

A rather simple-to-code general-case method is to "double-and-add" similar to modular exponentiation, however it is too slow for the purposes of this article; FFT implementation speed is heavily bound by the speed of multiplications.

My favored technique for this problem is called Barrett reduction. The explanation is (adapted from this website)

• Choose integer $k$ such that $2^k > m$. $k = 63$ works for our modulus.
• Precompute $\displaystyle r = \left\lfloor \frac {4^k} m \right\rfloor$. For our modulus this is $9223372036972216319$, which just overflows a signed 64-bit integer. You can either store it as unsigned, or the as the signed 64-bit representation $-9223372036737335297$ (yes, this just happens to be $-m$, which is quite a neat coincidence)
• Multiply the two integers. Note that we need all 126 bits of this product. The low bits can be easily obtained via straight multiplication, however the high bits need some bit-shifting arithmetic tricks to obtain. Adapt the following Java code, taken from here for your preferred language:
multiplyHighUnsigned code
• Let the product be $x$. Then calculate $\displaystyle t = x - \left\lfloor \frac{xr}{4^k} \right\rfloor m - m$. This requires some adaptation of grade-school arithmetic; pseudocode in spoiler below. $t$ is guaranteed to be in the range $[-m, m)$, so the t += (t >> 63) & m trick should work to normalize it. In the following pseudocode, BARR_R $= r$ and MOD $= m$. Also, ^ represents bitwise xor, not exponentiation.
mulMod pseudocode

Update: I have further optimized the Barrett reduction step, however it requires a new proof of correctness, so I'm keeping the old variant on this page. For info on the optimized version, read this article.

And that's it for this blogpost. This should be sufficient to solve most integer-based FFT problems in competitive programming. Note that if you need negative numbers in your polynomials, you can input them modulo $m$, then assume any integer in the output that exceeds $m/2$ is negative; this works as long as no number in the output has an absolute value greater than $m/2$, yet another advantage of using such a huge modulus.

Note that for arbitrary polynomials of length $n$ with input values not exceeding $a$, the maximum possible value in the product polynomial is around $a^2n$.

#### Samples

74641269 for 986D - Perfect Encoding — this is particularly interesting, because this huge modulus allows 6 digits per word in the custom big-integer implementation instead of the recommended 3, cutting down the FFT size (as well as the size of other arithmetic operations) by half. With this tight a time limit, I'm not sure this problem is solvable in Kotlin otherwise.

There are a small minority of problems that may ask to perform FFT with an inconvenient modulus like $10^9 + 7$, or an arbitrarily given one in the input (do any exist on Codeforces?). $a^2n$ in this case could overflow even this large modulus, but it can be handled with the "multiplication with arbitrary modulus" method listed in the CP-algorithms article. (I heard this is called MTT; also known as "FFT-mod") Even then, the large modulus this article covers might be helpful, as it reduces the number of decompositions you must make.

#### More addendum (December 2021)

It seems this method is significantly faster when run on a 64-bit system:

130750094 (1808 ms, Rust ran on 32-bit before an update circa November, IIRC)

140255354 (576 ms, after 64-bit update)

It even beats out my implementations using floats (140229918, 717 ms) and CRT/Garner's algorithm with two 32-bit moduli (140230689, 842 ms)

Unfortunately Java and Kotlin, at this time of writing, still seems to run on a VM that's effectively 32-bit.

There is also another interesting prime modulus, as noted in this offsite blogpost: $18446744069414584321$ (hex: 0xffff_ffff_0000_0001, $g = 7$). Due to its bitpattern, you can use some adds, subtracts, and shifts to do the reduction under this modulus, instead of the multiplies needed by Barrett reduction.

However my implementation of it ended up slightly slower (140308499, 639 ms). It seems multiplies are just that good on 64-bit systems. It might still be worth considering for its slightly higher range, huge capacity (can theoretically support FFTs up to $2^{32}$ in length, though you'd likely MLE first). It might also be friendlier to 32-bit systems (needs testing). This modulus, as noted in the blog post, also has other interesting properties; chiefly that $2$ is a $192^{\text{nd}}$ root of unity; this allows some roots of unity to be expressed in powers of two, which allows using shifts instead of multiplication in some cases. However I've not figured out how to efficiently use this fact (it might be more helpful for mixed-radix variants of FFT, which I've not learned how to implement).

Either modulus can also be used to emulate convolutions in other arbitrary moduli, such as $10^9 + 7$ (https://judge.yosupo.jp/submission/70563) and $2^{64}$ (https://judge.yosupo.jp/submission/70564), by splitting each coefficient into $k$ "place values" for a particular base (I use $2^{16}$ in these examples), then assigning $2k - 1$ slots for each coefficient (the extra slots allow the convolution to "spill over"), essentially emulating multivariate convolution with a single FFT. The speed overhead of this method, however, is quite significant; my implementation using CRT/Garner's (https://judge.yosupo.jp/submission/70678) is much faster in this case. fft, ntt,
By Spheniscine, history, 3 years ago, I have noticed some of my older blogs and comments have some of their $\TeX$ mathematical expressions incorrectly rendered in their own line instead of in the line of text, which can be really distracting to read.

Examples: https://codeforces.com/blog/entry/72593 ($\mathbb Z /p \mathbb Z$, and the large numbers in the spoilers)

https://codeforces.com/blog/entry/73211#comment-575285 This comment had it especially bad

I'm not sure what the cause of this is, and there is no discernible pattern as to which expressions and comments are affected.

Upd: It also affects others' comments too, like: https://codeforces.com/blog/entry/73211?#comment-575152 bug,
By Spheniscine, history, 3 years ago, Advent of Code is a website that releases programming puzzles every December from the 1st to the 25th. The puzzles have two parts, and the second part isn't revealed until you solve the first part.

This year, there have been many questions regarding part 2 of day 22, as it involves modular arithmetic in a way that hasn't been seen in previous puzzles there. I have been asked for a tutorial for it on Reddit, but I'm posting it here due to better support for mathematical notation.

Part 2 summary

First off, if you are unfamiliar with modular arithmetic, I encourage you to read my other blog post, Modular Arithmetic for Beginners. The most important thing to understand from there is how the four basic operations (addition, subtraction, multiplication, division) can be redefined to work in modular ($\mathbb Z / p \mathbb Z$ field) arithmetic. I also recommend you try some of the puzzles linked (This one is appropriately Christmas-themed too). The terminology and notations I'll use will also be similar.

It is possible to solve this without explicitly invoking modular arithmetic (see this post for an example) but they basically amount to rediscovering certain properties of modular arithmetic anyway, as such that's the "language" I'll use for this tutorial.

The first thing to notice is that each of the given instructions amount to a transformation on the position of each card. Let $m$ represent the number of cards in the deck.

• "deal into new stack" moves cards from position $x$ to position $m - x - 1$. We can write this as $f(x) = m - x - 1$.
• "cut $n$" moves cards from position $x$ to position $x - n\ \text{ mod } m$ (note how indices "wrap around" to the other side), Thus $f(x) = x - n\ \text{ mod } m$. Note this also works for the version with negative $n$.
• "deal with increment $n$" moves cards from position $x$ to position $n \cdot x\ \text{ mod } m$. Thus, $f(x) = n \cdot x\ \text{ mod } m$

The next thing to notice is that each transformation can be rewritten in the form of $f(x) = ax + b\ \text{ mod } m$. This is called a linear congruential function, and forms the basis for linear congruential generators, a simple type of pseudorandom number generator.

• "deal into new stack": $f(x) = -x - 1\ \text{ mod } m$, so $a = -1, b = -1$
• "cut $n$": $f(x) = x - n\ \text{ mod } m$, so $a = 1, b = -n$
• "deal with increment $n$": $f(x) = n \cdot x\ \text{ mod } m$, so $a = n, b = 0$

The next step is to see what happens when you compose two arbitrary linear congruential functions $f(x) = ax + b\ \text{ mod } m$ and $g(x) = cx + d\ \text{ mod } m$. So what do you get if you evaluate $g(f(x))$? By substitution:

$g(f(x)) = c(ax + b) + d\ \text{ mod } m$

As I established in Modular Arithmetic for Beginners, algebra works with modular residues similarly to real numbers, so let's expand:

$g(f(x)) = acx + bc + d\ \text{ mod } m$

An alternate notation for $g(f(x))$ is $f\ ;g(x)$ ("first apply $f$, then apply $g$"). Further more, we can abstract all linear congruential functions (LCF) as its own data type, consisting of a tuple $(a, b)$. Thus we can implement a compose operation between two LCFs:

$(a, b)\ ; (c, d) = (ac \text{ mod } m, bc + d\ \text{ mod } m)$

We store all LCF coefficients modulo $m$, to avoid them growing too big and slowing down our program or taking all available memory. Also note the possible pitfall: when multiplying two residue values, it's possible to overflow even the 64-bit data type, as our modulus for part 2 exceeds $2^{32}$. If you use a language with an arbitrary-size number type (e.g. Java's BigInteger), that'd do. If you don't, you might consider either importing a third-party library, or you can implement "multiplication by doubling" via a process similar to "exponentiation by squaring" described on the Modular Arithmetic for Beginners page. It's not the most efficient way to get around this problem, but for our purposes it'd work fine.

With these methods, you can translate all the steps in your puzzle input into LCFs, then compose them successively ($f_1\ ; f_2\ ; f_3\ ;...$) into a single LCF. Let's call it $F$. Recall that we've stored it as a tuple $(a, b)$ representing the equation $F(x) = ax + b\ \text{ mod } m$.

$F$ now represents a single shuffle as directed by your input. You can test your implementation by re-solving part 1 with it. (note that you have to re-compose it with a different modulus for each part.)

However, we require over a hundred trillion shuffles (we'll call this number $k$), so we can't just compose $F$ into itself naively. There are two approaches to solving this problem:

Method 1:

Compose $F$ into itself $k$ times algebraically. Do a few rounds by hand on paper and you'll notice a pattern emerge:

$F^k(x) = a^k x + (a^{k-1} + a^{k-2} + ... + a^1 + 1) b\ \text{ mod } m$. The more-formal notation would be $\displaystyle F^k(x) = a^k x + \sum_{i=0}^{k-1} ba^i\ \text{ mod } m$.

The summation forms a geometric series. According to that Wikipedia page, we can thus transform the expression to: $\displaystyle F^k(x) = a^k x + \frac{ b(1 - a^k) } { 1 - a } \ \text{ mod } m$

Exponentiation by squaring is required to perform the exponentiations, and modular multiplicative inverse to perform the division.

Method 2:

Notice that composing LCFs are associative, i.e. $(f\ ; g)\ ; h = f\ ; (g\ ; h)$.

Proof: $h(g(f(x))) = h(f\ ; g(x)) = g\ ; h(f(x)) = f\ ; g\ ; h(x)$

Note that this doesn't uniquely apply to LCFs; composition of any "pure functions" (i.e. always returns the same output for a given input) is associative. This is computationally useful for any function that can be composed in a similar manner to what we do here with LCFs before invoking them with actual parameters.

The "exponentiation by squaring" method works on all associative operations, so it can be adapted to the compose operation as well. We can thus obtain $F^k$, $F$ composed into itself $k$ times.

For example, we could adapt the pseudocode for exponentiation by squaring (from Modular Arithmetic for Beginners) like this:

function pow_compose(f: lcf, k: int64) → lcf:
g := lcf(1, 0)
while k > 0:
if k is odd:
g := compose(g, f)
k := ⌊k/2⌋
f := compose(f, f)
return g


Notice how multiplication is replaced with the compose function we used to compose the steps in the input, and the initial multiplicative identity of 1 is replaced with the "identity LCF" $f(x) = x \text{ mod } m$, i.e. $(1, 0)$

Continued:

Now for the last step. Notice that we're not asked for where card $x$ ends up, but rather what card is in position $x$. To do that, we need to invert the function $F^k$. Let $F^k(x) = Ax + B\ \text{ mod } m$.

We invert it by substitution: $x = A \cdot F^{-k}(x) + B\ \text{ mod } m$, therefore $F^{-k}(x) = \dfrac {x - B} A\ \text{ mod } m$. Again this requires an application of modular multiplicative inverse.

Plug in $2020$ for $x$, and we should finally get our solution. Whew.

Addendum: matrix representation of LCFs By Spheniscine, history, 3 years ago, ## Introduction

If you're new to the world of competitive programming, you may have noticed that some tasks, typically combinatorial and probability tasks, have this funny habit of asking you to calculate a huge number, then tell you that "because this number can be huge, please output it modulo $10^9 + 7$". Like, it's not enough that they ask you to calculate a number they know will overflow basic integer data types, but now you need to apply the modulo operation after that? Even worse are those that say you need to calculate a fraction $\frac pq$ and ask you to output $r$ where $r \cdot q \equiv p \pmod m$... not only do you have to calculate a fraction with huge numbers, how in the world are you going to find $r$?

Actually, the modulo is there to make the calculation easier, not harder. This may sound counterintuitive, but once you know how modular arithmetic works, you'll see why too. Soon you'll be solving these problems like second nature.

## Terminology and notation

For convenience, I will define the notation $n \text{ mod } m$ (for integers $n$ and $m$) to mean $n - \left\lfloor \dfrac nm \right\rfloor \cdot m$, where $\lfloor x \rfloor$ is the largest integer that doesn't exceed $x$. (This should always produce an integer between $0$ and $m-1$ inclusive.) This may or may not correspond to the expression n % m in your programming language (% is often called the "modulo operator" but in some instances, it's more correct to call it the "remainder operator"). If -8 % 7 == 6, you're fine, but if it is -1, you'll need to adjust it by adding $m$ to any negative results. If you use Java/Kotlin, the standard library function Math.floorMod(n, m) does what we need.

Also for convenience, I will also define the $\text{ mod }$ operator to have lower precedence than addition or subtraction, thus $ax + b\ \text{ mod } m \Rightarrow (ax + b) \text{ mod } m$. This probably does not correspond with the precedence of the % operator.

The value $m$ after the modulo operator is known as the modulus. The result of the expression $n \text{ mod } m$ is known as $n$'s residue modulo $m$.

You may also sometimes see the notation $expr_1 \equiv expr_2 \pmod m$. This is read as "$expr_1$ is congruent to $expr_2$ modulo $m$", and is shorthand for $expr_1 \text{ mod } m = expr_2 \text{ mod } m$.

## "Basic" arithmetic

First off, some important identities about the modulo operator:

$(a \text{ mod } m) + (b \text{ mod } m)\ \text{ mod } m = a + b\ \text{ mod } m$

$(a \text{ mod } m) - (b \text{ mod } m)\ \text{ mod } m = a - b\ \text{ mod } m$

$(a \text{ mod } m) \cdot (b \text{ mod } m)\ \text{ mod } m = a \cdot b\ \text{ mod } m$

These identities have the very important consequence in that you generally don't need to ever store the "true" values of the large numbers you're working with, only their residues $\text{ mod } m$. You can then add, subtract, and multiply with them as much as you need for your problem, taking the modulo as often as needed to avoid integer overflow. You may even decide to wrap them into their own object class with overloaded operators if your language supports them, though you may have to be careful of any object allocation overhead. If you use Kotlin like I do, consider using the inline class feature.

But what about division and fractions? That's slightly more complicated, and requires a concept called the "modular multiplicative inverse". The modular multiplicative inverse of a number $a$ is the number $a^{-1}$ such that $a \cdot a^{-1}\ \text{ mod } m = 1$. You may notice that this is similar to the concept of a reciprocal, but here we don't want a fraction; we want an integer, specifically an integer between $0$ and $m-1$ inclusive.

But how do you actually find such a number? Bruteforcing all numbers to a prime number close to a billion will usually cause you to exceed the time limit. There are two faster ways to calculate the inverse: the extended GCD algorithm, and Fermat's little theorem. Though the extended GCD algorithm is more versatile and sometimes slightly faster, the Fermat's little theorem method is more popular, simply because it's almost "free" once you implement exponentiation, which is also often a useful operation in itself, so that's what we'll cover here.

Fermat's little theorem says that as long as the modulus $m$ is a prime number ($10^9 + 7$ is prime, and so is $998\ 244\ 353$, another common modulus in these problems), then $a^m \text{ mod } m = a \text{ mod } m$. Working backwards, $a^{m-1} \text{ mod } m = 1 = a \cdot a^{m-2}\ \text{ mod } m$, therefore the number we need is $a^{m-2} \text{ mod } m$.

Note that this only works for $a \text{ mod } m \neq 0$, because there is no number $x$ such that $0 \cdot x\ \text{ mod } m = 1$. In other words, you still can't divide by $0$, sorry.

Multiplying $m-2$ times would still take too long; therefore a trick known as exponentiation by squaring is needed. It's based on the observation that for positive integer $n$, that if $n$ is odd, $x^n=x( x^{2})^{\frac{n - 1}{2}}$, while if $n$ is even, $x^n=(x^{2})^{\frac{n}{2}}$. It can be implemented recursively by the following pseudocode:

function pow_mod(x, n, m):
if n = 0 then return 1
t := pow_mod(x, ⌊n/2⌋, m)
if n is even:
return t · t  mod m
else:
return t · t · x  mod m


Or iteratively as follows:

function pow_mod(x, n, m):
y := 1
while n > 0:
if n is odd:
y := y · x  mod m
n := ⌊n/2⌋
x := x · x  mod m
return y


Now that you know how to compute the modular multiplicative inverse (to refresh, $a^{-1} = a^{m-2} \text{ mod } m$ when $m$ is prime), you can now define the division operator:

$a / b\ \text{ mod } m = a \cdot b^{-1}\ \text{ mod } m$

This also extends the $\text{ mod }$ operator to rational numbers (i.e. fractions), as long as the denominator is coprime to $m$. (Thus the reason for choosing a fairly large prime; that way puzzle writers can avoid denominators with $m$ as a factor). The four basic operations, as well as exponentiation, will still work on them as usual. Again, you generally never need to store the fractions as their "true" values, only their residues modulo $m$.

Congratulations! You have now mastered $\mathbb Z / p \mathbb Z$ field arithmetic! A "field" is just a fancy term from abstract algebra theory for a set with the four basic operators (addition, subtraction, multiplication, division) defined in a way that works just like you've learned in high-school for the rational and real numbers (however division by zero is still undefined), and $\mathbb Z / p \mathbb Z$ is just a fancy term meaning the set of integers from $0$ to $p - 1$ treated as residues modulo $p$.

This also means that algebra works much like the way you learned in high school. How to solve $3 = 4x + 5\ \text{ mod } 10^9 + 7$? Simply pretend that $x$ is a real number and get $x = -1/2\ \text{ mod } 10^9 + 7 = 500000003$. (Technically, all $x$ whose residue is $500000003$, including rationals, will satisfy the equation.)

You can also now take advantage of combinatoric identities, like $\displaystyle \binom{n}{k} = \frac{n!}{k! (n-k)!}$. The factorials can be too big to store in their true form, but you can store their modular residues instead, then use modular multiplicative inverse to do the "division".

There are only a few things you need to be careful of, like:

• divisions through modular multiplicative inverse would be slower than the other operations ($O (\log m)$ instead of $O(1)$), so you may want to cache/memoize the inverses you use frequently in your program.

• comparisons (once you represent a number by its modulo residue, comparisons are generally meaningless, as your $1$ might "really" be $m + 1$, $10^{100} m + 1$, $-5m + 1$, or even $\dfrac {1} {m+1}$)

• exponentiation (when evaluating $x^n \text{ mod } m$, you can't store $n$ as $n \text{ mod } m$. If $n$ turns out to be really huge, you need to calculate it modulo $\varphi(m)$ instead, where $\varphi$ stands for Euler's totient function. If $m$ is prime, $\varphi(m) = m - 1$. Note that this new modulus will then usually not be prime, thus "division" in it will not be reliable (you can still use the extended GCD algorithm, but only for numbers coprime to the new modulus), but you can still use the other three operators. In abstract algebra theory, $\mathbb Z / n \mathbb Z$ is a "ring" rather than a "field" when $n$ isn't prime due to this loss). Do be careful about the special case $0^0$, which should typically be defined as 1, while $0^{\varphi(m)}$ would still be $0$.

## Puzzles

Here are some simpler puzzles that require a modulo answer:

1281C - Cut and Paste

1279D - Santa's Bot

1178C - Tiles

1248C - Ivan the Fool and the Probability Theory

935D - Fafa and Ancient Alphabet

300C - Beautiful Numbers By Spheniscine, history, 3 years ago, I pretty much exclusively use Kotlin for competitive programming, mostly because it's the language I'm currently most comfortable with. Here are some scattered notes and tidbits about my experience which I think might be useful to others; if you have any tips/suggestions, feel free to let me know.

### Primer

• Kotlin has an official primer for competitive programming. However, the IO code suggested there is only so-so; it's definitely better than Scanner, but you definitely can save a lot of runtime in heavy input problems by using the classic Java combination of BufferedReader + StringTokenizer
My current IO template

### Useful features

• A lot less boilerplate than Java. Members are public by default. Type inference means a lot less "Pokémon speak". Variables and functions can be declared straight in the top-level of the file. (basically the equivalent of static functions). Fields have implicit getters and setters that can easily be overridden when necessary.

• PHP-like string templates, e.g. "$id$cost"

• Extension functions – syntactic sugar for static functions; gives more natural "afterthought" syntax, as well as allowing direct access to public members of the receiver

• data classes – basically custom tuples. Allows convenient destructuring declarations too.

• Has access to the data structures in the Java standard library (TreeMap, HashMap, PriorityQueue etc.), and also can use BigInteger and BigDecimal if needed

• Functional idioms for collection manipulation – map, fold, filter, etc.

• Sequences – lazy sequence generators, potentially infinite, has standard collection manipulation functions too. Using the sequence { ... } block function allows building a sequence using a scoped yield(value) function, reminiscent of Python

• inline classes – allows the creation of a new type that wraps over a base type, but that is represented by an underlying type at runtime. Especially useful for "modulo $10^9 + 7$" problems, as I keep code for a ModInt class that overloads the arithmetic operators appropriately, but is represented as a plain int in JVM runtime. Keep in mind that they are experimental as of Kotlin 1.3, but that's fine for CP in my opinion

• unsigned integer types in the standard library that use the inline class feature. Not used very often, but handy if needed

• inline functions – tells the compiler to inline the function to call sites. Useful for higher-order functions (JVM won't need to create a new object for the lambda) as well as small functions that are called very often; basically, anything you might use a macro for in C++, you probably want to use an inline fun or inline val for

• tailrec fun – tail recursion optimization

• run block function – great way to write code that needs to shortcut (e.g. return@run "NO") without having to write a new function and pass every relevant argument

• functions in functions – functions can be defined within e.g. the main function, so again, no having to pass lots of arguments or global variables. Keep in mind that these are represented as objects during runtime. It's too bad they can't be inline as of yet

### Potential pitfalls

• Generic wrappers for JVM primitive types can cause TLE for some problems. Use primitive arrays (IntArray etc.) whenever possible to avoid this, but see next point

• Inherits the hack-prone quicksort from Java for primitive arrays. Easiest solution is to use generic arrays or lists instead, but due to the performance benefit of primitive arrays, I've took the trouble to write code that shuffles them randomly before sorting.

• For Kotlin versions < 1.3.30, there is a bug that will throw an exception when using .asJavaRandom() on instances of kotlin.random.Random, including Kotlin's default instance. Either use Java's own Random class, or steal this wrapper:

class JavaRandom(val kt: Random) : java.util.Random(0) {
override fun next(bits: Int): Int = kt.nextBits(bits)
override fun setSeed(seed: Long) {}
} 