This editorial corresponds to 2018 Chinese Multi-University Training, BeihangU Contest (stage 5), which was held on Aug 6th, 2018. Moreover, this problem set was also used as Jingzhe Tang Contest 1 in Petrozavodsk Winter Camp on Jan 30th, 2019.
This post is now complete; in it I try to elaborate on notes, solutions and, in places, how the test data was generated. Alternatively, you can refer to an old published material, though I think the old English version did not explain some things clearly.
102114A - Always Online
This problem asks you to calculate the $$$s$$$-$$$t$$$ min cut between every pair of vertices on a weighted cactus graph with $$$n$$$ vertices, denoted by $$$\mathrm{flow}(s, t)$$$. You need to report $$$\sum_{s < t}{(s \oplus t \oplus \mathrm{flow}(s, t))}$$$.
$$$n \leq 10^5$$$, $$$\sum{n} \leq 10^6$$$, weights are $$$\leq 10^9$$$.
Try to find some features of this graph.
Solution: By contradiction, we can prove that in an undirected graph, each edge belongs to at most one cycle if and only if there are at most two edge-disjoint paths between any two vertices. In addition, by the max-flow min-cut theorem, the maximum flow problem and the minimum cut problem are equivalent, so we only need to consider how to cut some edges with the lowest total capacity so that the source vertex $$$s$$$ is disconnected from the sink vertex $$$t$$$.
Let's take a look at the $$$s$$$-$$$t$$$ cut problem. If at least two edges of one cycle are cut, we can show by adjustment that at most two of the edges in this cycle should be cut, since there are at most two edge-disjoint paths between $$$s$$$ and $$$t$$$. In addition, if there are several cycles each of which has some edges cut, it can be shown by adjustment and contradiction that, under an optimal strategy, at most one cycle contains cut edges.
One observation is that if at least one edge of a cycle is cut in an optimal solution, then exactly two edges of that cycle are cut, and one of them is the edge with the smallest capacity. Hence, we can remove the edge with the smallest capacity in each cycle and add its capacity to the other edges of the cycle, which does not change the value of the maximum flow between any pair of vertices.
The reduced graph forms a tree. If we add its edges from scratch in descending order of capacity, then $$$\mathrm{flow}(s, t) = w$$$ for every $$$s \in S$$$, $$$t \in T$$$, where $$$S$$$ and $$$T$$$ are the two components that first become connected after joining the edges with capacities no less than $$$w$$$.
We can maintain these components with a disjoint set union, counting in each component the number of vertex labels that have a $$$1$$$ in each fixed bit position; then the answer is easy to compute by enumerating bits. Time complexity is $$$\mathcal{O}(m \log n)$$$.
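As a concrete sketch of this counting step (my own illustration, not the reference code; `solve` and the 20-bit limit are assumptions for a small demo), the cactus is taken to be already contracted into a tree, and tree edges are merged in descending order of capacity with a DSU that tracks per-bit counts of vertex labels:

```python
# Sketch: the cactus's cycles are assumed already contracted into a tree
# (smallest edge of each cycle removed, its weight added to the rest).
# Merging tree edges in descending order of capacity: when an edge of
# weight w first connects components S and T, flow(s, t) = w for all
# s in S, t in T.

BITS = 20  # enough for this tiny demo; real data needs more bits

def solve(n, tree_edges):
    """tree_edges: list of (u, v, w) on vertices 0..n-1."""
    parent = list(range(n))
    size = [1] * n
    # ones[x][b] = how many vertex labels in x's component have bit b set
    ones = [[(v >> b) & 1 for b in range(BITS)] for v in range(n)]

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    total = 0
    for u, v, w in sorted(tree_edges, key=lambda e: -e[2]):
        ru, rv = find(u), find(v)
        pairs = size[ru] * size[rv]
        for b in range(BITS):
            a1, b1 = ones[ru][b], ones[rv][b]
            # pairs (s, t) across the two components with bit b of s ^ t set
            diff = a1 * (size[rv] - b1) + (size[ru] - a1) * b1
            if (w >> b) & 1:
                diff = pairs - diff  # xor-ing with w flips this bit
            total += diff << b
        parent[rv] = ru
        size[ru] += size[rv]
        for b in range(BITS):
            ones[ru][b] += ones[rv][b]
    return total
```

When components of sizes $$$a$$$ and $$$b$$$ merge at weight $$$w$$$, all $$$a \cdot b$$$ pairs get flow $$$w$$$, and the per-bit counts give the XOR contribution without enumerating the pairs.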
Note: $$$\mathrm{flow}(s, t)$$$ can be as large as $$$2 \times 10^9$$$, which means the answer may exceed $$$2^{63}$$$. To reach the peak value, we can simply construct a cycle whose edges all have the same weight $$$10^9$$$.
When enumerating bits, we can treat the lowest $$$(\log n)$$$ bits and other bits differently, which explains why the time complexity is $$$\mathcal{O}(m \log n)$$$ rather than $$$\mathcal{O}(m \log w)$$$.
102114B - Beautiful Now
Given an integer $$$n$$$, you are asked to swap digits of $$$n$$$ in exactly $$$k$$$ turns. In each turn, you swap two digits (possibly the same position, which leaves $$$n$$$ unchanged), and after each turn $$$n$$$ must not have a leading zero. Calculate the maximal and the minimal values you can obtain in the end.
$$$100$$$ tests, $$$n, k \leq 10^9$$$.
This problem is yet another problem related to swapping. Can you solve it simply and elegantly?
Solution 1: For each permutation of positions, the minimum number of swaps needed to realize it equals the number of positions minus the number of disjoint cycles in the permutation.
We can solve the problem by enumerating all permutations of the digit positions. For each permutation, we only need to check whether it can be reached within $$$k$$$ swaps; since swapping a position with itself is permitted, surplus turns can be wasted.
Time complexity is expected to be $$$\mathcal{O}(m!)$$$, where $$$m = \left \lfloor \log_{10} n \right \rfloor + 1$$$. In order to achieve that, we can maintain some information about chains and cycles so that we can modify and retrieve efficiently during backtracking.
Solution 2: For each permutation of positions, the minimum number of swaps needed to realize it equals the number of positions minus the number of disjoint cycles in the permutation.
We can precalculate the minimum number of steps to obtain a permutation from the original one, and then enumerate permutations and update the answer. Time complexity is $$$\mathcal{O}\left(\sum_{i = 1}^{9}{(i \cdot i!)} + T m!\right)$$$, where $$$m = \left \lfloor \log_{10} n \right \rfloor + 1$$$.
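To make the idea concrete, here is a small brute force in the same spirit (a sketch of mine, fine only for a few digits; `extremes` is a made-up name). It enumerates permutations of positions and keeps those whose minimum swap count — positions minus cycles — is at most $$$k$$$; surplus turns are burned by swapping a position with itself:

```python
from itertools import permutations

def extremes(n, k):
    """Brute force over arrangements of the digits of n reachable in at
    most k swaps, skipping results with a leading zero."""
    s = str(n)
    m = len(s)
    best = []
    for perm in permutations(range(m)):
        if m > 1 and s[perm[0]] == '0':
            continue  # the resulting number must not have a leading zero
        # minimum swaps realizing this permutation = positions - cycles
        seen, cycles = [False] * m, 0
        for i in range(m):
            if not seen[i]:
                cycles += 1
                j = i
                while not seen[j]:
                    seen[j] = True
                    j = perm[j]
        if m - cycles <= k:
            best.append(int(''.join(s[i] for i in perm)))
    return min(best), max(best)
```

For example, with one swap, $$$123$$$ can become at most $$$321$$$ and at least stay $$$123$$$ (the turn is burned on a self-swap).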
Note: Some solutions cannot handle the case $$$n = 10^9$$$ (the only ten-digit value in range), but that case can be solved by hand.
Many greedy solutions can be hacked even by random tests, so... the dataset is almost randomized. :P
Wait, wait, wait... Doesn't it look like a notorious coincidence with this problem? What? That problem has an incredible data range... ($$$n < 10^{1000}$$$) Is it really solvable??? Oh, I can't believe it!!! >_<
102114C - Call It What You Want
This problem asks you to factorize the polynomial $$$(x^n - 1)$$$ over the integers. Besides, the statement contains some mathematical formulas you may need to apply.
$$$n \leq 10^5$$$, $$$\sum{n} \leq 5 \times 10^6$$$.
Maybe you just need some observation to solve it.
Solution: As we know, the Möbius inversion formula can be expressed as
$$$f(n) = \sum_{d | n}{g(d)} \Leftrightarrow g(n) = \sum_{d | n}{\mu(n / d) f(d)}\text{,}$$$where $$$\mu$$$ is the Möbius function and the sums extend to all positive divisors $$$d$$$ of $$$n$$$.
Similarly, we can obtain the following formula.
$$$f(n) = \prod_{d | n}{g(d)} \Leftrightarrow g(n) = \prod_{d | n}{f(d)^{\mu(n / d)}}$$$If we apply this formula, the problem can be solved in $$$\mathcal{O}(2^{\omega(n)} n)$$$, where $$$\omega(n)$$$ means the number of distinct prime factors of $$$n$$$. To prove that, you need to know what the Möbius function is, and observe that $$$f(n)$$$ in this problem is a polynomial with only two non-zero terms. In addition, $$$\sum_{d | n}{\varphi(d)} = n$$$.
By the way, $$$\omega(n) \leq 6$$$ when $$$n \leq 10^5$$$, because $$$2 \times 3 \times 5 \times 7 \times 11 \times 13 \times 17 = 510510$$$.
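The multiplicative inversion above translates directly into code. This sketch of mine works with plain integer polynomials and ignores the in-place two-term multiply/divide trick that gives $$$\mathcal{O}(2^{\omega(n)} n)$$$; it just computes $$$\Phi_n(x) = \prod_{d \mid n}{(x^d - 1)^{\mu(n / d)}}$$$:

```python
def mobius(n):
    """Moebius function by trial division."""
    res, d = 1, 2
    while d * d <= n:
        if n % d == 0:
            n //= d
            if n % d == 0:
                return 0  # squared prime factor
            res = -res
        d += 1
    if n > 1:
        res = -res
    return res

def poly_mul(a, b):
    c = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            c[i + j] += x * y
    return c

def poly_div(a, b):
    """Exact division of integer polynomials (coefficients low to high)."""
    a = a[:]
    q = [0] * (len(a) - len(b) + 1)
    for i in range(len(q) - 1, -1, -1):
        q[i] = a[i + len(b) - 1] // b[-1]
        for j, y in enumerate(b):
            a[i + j] -= q[i] * y
    return q

def cyclotomic(n):
    """Phi_n(x) = prod_{d | n} (x^d - 1)^{mu(n/d)}."""
    num, den = [1], [1]
    for d in range(1, n + 1):
        if n % d == 0:
            mu = mobius(n // d)
            f = [-1] + [0] * (d - 1) + [1]  # x^d - 1
            if mu == 1:
                num = poly_mul(num, f)
            elif mu == -1:
                den = poly_mul(den, f)
    return poly_div(num, den)
```

For instance, $$$\Phi_6(x) = x^2 - x + 1$$$ and $$$\Phi_{12}(x) = x^4 - x^2 + 1$$$.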
Note — Q: Why does the Fast Fourier Transform get a verdict of TLE? It seems $$$\log n \sim 2^{\omega(n)}$$$ when $$$n \leq 10^5$$$.
A: FFT carries a large constant factor. Also, the above solution requires no floating-point operations.
102114D - Daylight
Given an unweighted tree with $$$n$$$ vertices, you need to determine the union of two sets and report its size for each of $$$m$$$ queries, where each set consists of the vertices whose distance to a given vertex is no more than a fixed value, and this value is the same for both sets. However, queries are encrypted, so you need to handle them one by one (online).
$$$10$$$ large tests, $$$n, m \leq 10^5$$$.
Emmm... A typical data structure problem, right?
Certainly I've found it on CodeChef. What? Why TLE???
Solution: The union is difficult to compute but the intersection is easy, so let's apply $$$|S_u \cup S_v| = |S_u| + |S_v| - |S_u \cap S_v|$$$, where $$$S_x$$$ is the set of vertices whose distance to $$$x$$$ is no more than $$$w$$$.
Note that the tree does not change, so you can precompute its centroid decomposition in order to calculate $$$|S_u|$$$ online.
Since the restricted distances are equal, if $$$S_u$$$ and $$$S_v$$$ have a non-empty intersection, their intersection equals the set of vertices whose distance to the midpoint of the path between $$$u$$$ and $$$v$$$ is no more than $$$\left(w - \frac{dist(u, v)}{2}\right)$$$, which can be counted similarly. To avoid heavy case analysis, you can add extra vertices at the midpoints of edges.
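The midpoint claim can be sanity-checked by brute force on random trees. In this sketch (my own; `check` is a made-up name), every edge is subdivided once so the midpoint of any path between original vertices is a vertex, and distances are measured in half-edge units:

```python
import random
from collections import deque

def bfs_dist(adj, src):
    dist = {src: 0}
    q = deque([src])
    while q:
        x = q.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                q.append(y)
    return dist

def check(n, trials=100, seed=0):
    """Verify |S_u ∪ S_v| = |S_u| + |S_v| - |ball around midpoint| on a
    random tree with every edge subdivided by a midpoint vertex."""
    rng = random.Random(seed)
    edges = [(i, rng.randrange(i)) for i in range(1, n)]
    adj = {v: [] for v in range(n + len(edges))}
    for idx, (a, b) in enumerate(edges):
        mid = n + idx                     # midpoint vertex of this edge
        adj[a].append(mid); adj[mid].append(a)
        adj[b].append(mid); adj[mid].append(b)
    dist = {v: bfs_dist(adj, v) for v in adj}  # all-pairs; tiny n only
    real = range(n)                       # the original vertices
    for _ in range(trials):
        u, v = rng.randrange(n), rng.randrange(n)
        w = rng.randrange(0, 2 * n)       # radius in original edge units
        R = 2 * w                         # same radius in half-edge units
        Su = {x for x in real if dist[u][x] <= R}
        Sv = {x for x in real if dist[v][x] <= R}
        D = dist[u][v]                    # even: u, v are original vertices
        if R >= D // 2:
            c = next(x for x in adj
                     if dist[u][x] == D // 2 and dist[v][x] == D // 2)
            inter = sum(1 for x in real if dist[c][x] <= R - D // 2)
        else:
            inter = 0                     # balls too far apart to meet
        if len(Su | Sv) != len(Su) + len(Sv) - inter:
            return False
    return True
```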
Time complexity and space complexity are both $$$\mathcal{O}(n \log n)$$$. However, some solutions may lose out to cache-unfriendly memory access; one way to ease that is to handle the cases directly instead of subdividing edges.
102114E - Everything Has Changed
A geometry problem to make sure you have checked in (i.e., solved at least one problem) in this contest. Read the statement for more details.
Solution: Note that the cut regions do not intersect, which implies the different cut regions do not affect each other, so we can reduce the problem to calculating angles of triangles. The law of cosines is helpful.
Anyway, you can paste your geometry template to solve it, but that seems like overkill.
Note: Though the data range is small, we should be careful when calling functions like acos(x), where $$$x$$$ should lie in $$$[-1, 1]$$$; otherwise NaN will teach you a lesson.
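A minimal illustration of both points — the law of cosines and the clamped acos (a sketch; the function names are mine):

```python
import math

def safe_acos(x):
    # Rounding error in a cosine obtained from the law of cosines can push
    # it slightly outside [-1, 1], where acos is undefined (NaN in C,
    # ValueError in Python); clamp first.
    return math.acos(max(-1.0, min(1.0, x)))

def angle(a, b, c):
    """Angle opposite side c in a triangle with side lengths a, b, c,
    via the law of cosines: c^2 = a^2 + b^2 - 2ab*cos(C)."""
    return safe_acos((a * a + b * b - c * c) / (2 * a * b))
```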
102114F - Fireflies
Consider all the lattice points in a box $$$\lbrace (x_1, x_2, \ldots, x_n) \mid 1 \leq x_i \leq p_i \rbrace$$$. Find a maximum subset containing no two points $$$(x_1, x_2, \ldots, x_n)$$$, $$$(y_1, y_2, \ldots, y_n)$$$ with $$$x_i \leq y_i$$$ for all $$$i$$$. Report its size modulo $$$(10^9 + 7)$$$.
$$$n \leq 32$$$, $$$p_i \leq 10^9$$$.
How fast can you solve it?
Solution: The statement is phrased a bit differently from the above; however, the two formulations are equivalent by Dilworth's theorem.
We may be familiar with the case $$$p_i = 2$$$ for all $$$i$$$, which is equivalent to finding a maximum subset of the power set of some set $$$S$$$ such that none of the chosen sets is a strict subset of another. The conclusion, known as Sperner's theorem, is that the size of the maximum subset is $$${|S| \choose \left \lfloor |S| / 2 \right \rfloor}$$$.
A generalization of Sperner's theorem shows that the answer to this problem is the number of points satisfying $$$\sum_{i}{x_i} = M$$$, where $$$M = \left \lfloor \frac{1}{2} \sum_{i}{(p_i + 1)} \right \rfloor$$$. The proof is similar, so we omit it.
Then, as with many counting problems of the form $$$\sum_{i}{x_i} = M~(1 \leq x_i \leq p_i)$$$, we can use the inclusion-exclusion principle to get the answer
$$$\sum_{I \subseteq J}{(-1)^{|I|}{M - 1 - \sum_{i \in I}{p_i} \choose n - 1}}\text{,}$$$where $$$J = \lbrace 1, 2, \ldots, n \rbrace$$$ and $$${a \choose b} = 0$$$ whenever $$$a < b$$$.
The remaining part is straightforward. We can regard $$${M - 1 - \sum_{i \in I}{p_i} \choose n - 1}$$$ as a low-degree polynomial in $$$\sum_{i \in I}{p_i}$$$ and use the meet-in-the-middle approach to split $$$I$$$ into two halves almost evenly. Time complexity is $$$\mathcal{O}(2^{n / 2} n^2)$$$.
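Before the meet-in-the-middle optimization, the inclusion-exclusion formula itself can be checked with an exponential-time sketch (mine, usable only for small $$$n$$$; the guard implements the convention that the binomial vanishes for a negative or too-small top):

```python
from math import comb

def count_antichain_layer(p):
    """Number of lattice points with sum x_i = M and 1 <= x_i <= p_i,
    by inclusion-exclusion over which upper bounds are exceeded.
    Exponential in n -- meet-in-the-middle is what makes n = 32 feasible."""
    n = len(p)
    M = sum(pi + 1 for pi in p) // 2
    total = 0
    for mask in range(1 << n):
        s = sum(p[i] for i in range(n) if mask >> i & 1)
        top = M - 1 - s
        if top >= n - 1:  # otherwise the binomial coefficient is zero
            sign = -1 if bin(mask).count('1') % 2 else 1
            total += sign * comb(top, n - 1)
    return total
```

For $$$p = (2, 2, 2, 2)$$$ this reproduces Sperner's $$$\binom{4}{2} = 6$$$.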
Here is an old problem with smaller data range. If anyone can tell me who the author is, I will add the credit soon.
102114G - Glad You Came
There are $$$m$$$ operations over an array $$$a_1, a_2, \ldots, a_n$$$, each operation of which is to update $$$a_i$$$ $$$(l \leq i \leq r)$$$ by $$$\max(a_i, v)$$$. You need to determine the array after all the operations.
$$$n \leq 10^5$$$, $$$\sum{n} \leq 10^6$$$, $$$m \leq 5 \times 10^6$$$, $$$\sum{m} \leq 5 \times 10^7$$$. Besides, $$$l, r, v$$$ are randomly selected.
Solution 1: The updates need not be answered online, so we can solve the problem offline. We build a sparse table to postpone updates, like lazy propagation on a segment tree.
For each operation $$$(l, r, v)$$$, let $$$d$$$ be $$$\left \lfloor \log_2(r - l + 1) \right \rfloor$$$, and replace this operation by two operations $$$(l, l + 2^d - 1, v)$$$ and $$$(r - 2^d + 1, r, v)$$$. Then each operation can be recorded in the sparse table. Conveniently, when two operations cover the same interval, we can keep only the one with the higher value. After that, we split each recorded operation into two half-length ones, in decreasing order of length, repeating until the intervals have length $$$1$$$, at which point we have the answer.
Time complexity is $$$\mathcal{O}(m + n \log n)$$$ and space complexity is $$$\mathcal{O}(n \log n)$$$.
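A condensed sketch of solution 1 (my own version, assuming the array starts as all zeros and using 0-based inclusive intervals):

```python
def offline_chmax(n, ops):
    """Record each chmax(l, r, v) at the two power-of-two intervals
    covering [l, r] in a sparse table, then push the recorded values down
    level by level, like lazy propagation."""
    LOG = max(1, n.bit_length())
    table = [[0] * n for _ in range(LOG)]
    for l, r, v in ops:                      # 0-based, inclusive
        d = (r - l + 1).bit_length() - 1     # largest 2^d <= length
        table[d][l] = max(table[d][l], v)
        table[d][r - (1 << d) + 1] = max(table[d][r - (1 << d) + 1], v)
    for d in range(LOG - 1, 0, -1):
        half = 1 << (d - 1)
        for i in range(n - (1 << d) + 1):
            if table[d][i]:                  # split into two halves
                table[d - 1][i] = max(table[d - 1][i], table[d][i])
                table[d - 1][i + half] = max(table[d - 1][i + half],
                                             table[d][i])
    return table[0]
```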
Solution 2: We can apply radix sort carefully, in a cache-friendly way, so that we can enumerate operations in decreasing order of value. Then a trick ensuring each position is hit only about once may lead to success.
However, due to cache-unfriendly behaviors, only one team got accepted with this. We do not recommend it because it requires optimization techniques.
Solution 3: Due to the randomness of the operations, several approaches can pass.
For example, if we assume the number of useful operations is about $$$\min(m, 2 n)$$$, we can use a linear-time selection to find the operations with the largest $$$\min(m, 2 n)$$$ values, and then do whatever you like to solve the rest.
Another example uses the trick mentioned in Segment tree beats, created by Syloviaely. Amazingly, during the contest a tremendous number of Chinese teams passed with this approach, even though many of them didn't comprehend its amortized complexity analysis.
102114H - Hills And Valleys
Given an array of length $$$n$$$ consisting of digits $$$0, 1, \ldots, 9$$$, your task is to reverse exactly one interval so as to make the longest non-decreasing subsequence of the whole array as long as possible. You are also required to report the reversed interval.
$$$n \leq 10^5$$$, $$$\sum{n} \leq 2 \times 10^5$$$.
Solution: You can write a typical DP to solve it, but notice that the more complex your solution is, the more tiresome reconstructing the interval will be.
One elegant way is to enumerate the value range $$$[x, y]$$$ that gets reversed. In doing so, we only need to find the longest subsequence of the form $$$0^* 1^* \ldots x^* y^* (y - 1)^* \ldots x^* y^* (y + 1)^* \ldots 9^*$$$, where $$$k^*$$$ represents any non-negative number of occurrences of the digit $$$k$$$. This also makes reconstruction easy to implement.
This idea has been noticed in the tutorial for 933A - A Twisty Movement, created by visitWorld. Time complexity is $$$\mathcal{O}(D^3 n)$$$, where $$$D = 10$$$ is the maximal number of distinct values.
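The pattern-matching DP can be sketched as follows (my code, returning only the length rather than the interval; for each pair $$$(x, y)$$$ the pattern is unrolled into a list of states, and a prefix-maximum DP runs over the array):

```python
def longest_after_reverse(a):
    """Try every value range [x, y] to reverse; for each, find the longest
    subsequence matching 0^* 1^* ... x^* y^* (y-1)^* ... x^* y^* (y+1)^* ... 9^*."""
    best = 0
    for x in range(10):
        for y in range(x, 10):
            # digit expected by each state of the pattern, in order
            states = (list(range(x + 1))            # 0 .. x
                      + list(range(y, x - 1, -1))   # y, y-1, ..., x
                      + list(range(y, 10)))         # y, y+1, ..., 9
            dp = [0] * len(states)
            for v in a:
                pref = 0
                new = dp[:]
                for j, sv in enumerate(states):
                    pref = max(pref, dp[j])         # best over states <= j
                    if sv == v:
                        new[j] = max(new[j], pref + 1)
                dp = new
            best = max(best, max(dp))
    return best
```

Choosing $$$x = y$$$ degenerates into a plain non-decreasing pattern, so a length-1 reversal is covered automatically.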
102114I - Innocence
Count the number of solutions for the equation $$$x_1 \oplus x_2 \oplus \cdots \oplus x_N = K$$$, where $$$x_i$$$ can be any integer in $$$[L, R]$$$. There are $$$Q$$$ queries with the same $$$N, L, R$$$ but different $$$K$$$.
$$$100$$$ large tests, $$$N \leq 10^9$$$, $$$0 \leq L, R, K < 2^{30}$$$, $$$Q \leq 100$$$.
Solution: Let's solve an easier version first. In this version, we are given $$$N$$$, $$$\lbrace R_i \rbrace$$$, $$$K$$$ and have to choose $$$N$$$ integers $$$x_1, x_2, \ldots, x_N$$$ such that $$$x_i \in [0, R_i]$$$ for all $$$i$$$ and the bitwise XOR of them equals $$$K$$$. Similar problems can be found at Crystals — POI 2006 and Stone Game — HackerRank.
In the case $$$x_i \in [0, 2^m - 1]$$$ for all $$$i$$$, the problem reduces to XOR equations on bits. When $$$(N - 1)$$$ of the variables are fixed, the remaining one is determined, so if any solution exists, the number of solutions is always $$$2^{m (N - 1)}$$$, no matter what $$$K$$$ is.
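A quick sanity check of the $$$2^{m(N-1)}$$$ claim by exhaustive enumeration (a throwaway sketch of mine):

```python
from functools import reduce
from itertools import product

def count_xor(N, m, K):
    """Brute force: choose each x_i in [0, 2^m - 1]; count the tuples
    whose bitwise XOR equals K."""
    return sum(1 for xs in product(range(1 << m), repeat=N)
               if reduce(lambda a, b: a ^ b, xs, 0) == K)
```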
If we apply digit DP over bits, we can classify all situations into groups by the highest bit at which some variable $$$x_j$$$ first goes strictly below its bound $$$R_j$$$. From that bit down, the lower bits of $$$x_j$$$ can be arbitrary, so we can leave them undetermined at first and fix them from the XOR equation after the other variables are enumerated. It is not difficult to design a digit DP to count. Time complexity for this part is $$$\mathcal{O}(N \log R)$$$.
Now let's turn back to the harder version, where $$$N$$$ is larger and $$$x_i$$$ also has a lower bound. First, we apply the inclusion-exclusion principle to eliminate the lower bound; in other words, we reduce the problem to many states in which either $$$x_i \in [0, R]$$$ or $$$x_i \in [0, L - 1]$$$ holds for each $$$i$$$. Then, in the digit DP we record the sign produced by inclusion-exclusion, and boost the identical DP transitions by fast exponentiation of matrices. Time complexity is $$$\mathcal{O}((8^3 \log N + Q) \log R)$$$.
Oops... It seems I've missed an important part — why matrix multiplication can reduce $$$2^N$$$ states into $$$2$$$ states. On each bit, we keep the higher bits equal to the upper bounds, use the DP to ensure at least one variable $$$x_j$$$ exists and to fix the XOR sum on this bit, and then determine the lower bits by the XOR equation. The only difference among states comes from which upper bounds were chosen when we applied the inclusion-exclusion principle. Fortunately, the bitwise XOR sums of the bounds differ exactly when the parities of the numbers of bounds equal to $$$R$$$ differ, so we can record just this parity — which coincides with the inclusion-exclusion sign — to reduce the states.
Note: Based on the fact observed in the case $$$x_i \in [0, 2^m - 1]$$$, many solutions can get accepted.
For example, let $$$dp(i, j)$$$ be the number of sequences $$$x_1, x_2, \ldots, x_i$$$ whose XOR sum is $$$j$$$. We can observe that for each $$$i$$$, the values of $$$dp(i, j)$$$ split into several intervals with respect to $$$j$$$ such that the values within each interval are equal. A solution maintaining these intervals can get accepted, as the number of intervals appears to be $$$\mathcal{O}(\log R)$$$, independent of $$$N$$$.
Tester quailty: If we exploit this property more deeply, we can get the exact formula and solve the problem in time complexity $$$\mathcal{O}((10 \log N + Q) \log R)$$$.
102114J - Just So You Know
A non-empty substring $$$B$$$ is selected from a given string $$$A$$$ of length $$$n$$$ with equal probability among all $$$\frac{n (n + 1)}{2}$$$ occurrences of substrings. You are given the string $$$A$$$ and asked to determine what the substring $$$B$$$ is. More precisely, you need to guess what it looks like, not where it is in the string.
You can claim conjectures, and the jury will tell you whether each one is true or false. You need to find a strategy minimizing the expected number of claims, and report that expectation as an irreducible fraction.
$$$n \leq 10^6$$$, $$$\sum{n} \leq 10^7$$$, character set size $$$\Sigma \leq 100$$$.
Solution: A simpler version can be found at GuessTheSubstring — TCO 2011 Semifinal 1 500, created by misof. The harder version increases the upper bound of $$$n$$$ and requires exact arithmetic.
Before introducing our linear solution, we have to confess that it is too hard to create a dataset against solutions with time complexity $$$\mathcal{O}(n \log n)$$$. Why are judging machines so fast? If anyone has ideas to improve the dataset, please share them in the comments.
The process of guessing essentially constructs a decision tree, which is equivalent to building the Huffman code for all substrings of $$$A$$$. What makes things difficult is that there can be $$$\mathcal{O}(n^2)$$$ distinct substrings, so we first need to reduce the number of initial states. Note that we only care about each substring's number of occurrences, not the substring itself, and this number is at most $$$n$$$; so if we categorize substrings by frequency, we can simulate the Huffman tree's construction quickly.
There are several approaches to count the frequencies of all substrings in linear time. One approach is to build the suffix array with the SA-IS algorithm and then build the compressed suffix tree from it; counting on the suffix tree is easy. However, due to the character set size, a suffix automaton with space complexity $$$\mathcal{O}(n \Sigma)$$$ may fail.
After counting, let's calculate the cost of building the Huffman tree in linear time. In this process, we repeatedly merge the two states with the lowest frequencies into a new state until only one state remains. Note that the number of occurrences of the new state equals the sum of those of the two old ones.
Denote by $$$c_i$$$ the number of states that occur $$$i$$$ times $$$(i = 1, 2, \ldots, n)$$$. For states with at most $$$n$$$ occurrences, we can batch consecutive identical actions, for example merging up to $$$\left \lfloor \frac{c_i}{2} \right \rfloor$$$ pairs of states that occur $$$i$$$ times at once. For states with more than $$$n$$$ occurrences, their count is less than $$$\frac{n}{2}$$$, since $$$\sum_{i}{(i \cdot c_i)}$$$ always equals $$$\frac{n (n + 1)}{2}$$$. We can maintain and merge these states with a first-in-first-out queue, so this part also finishes in linear time.
The above explanation also shows that you don't have to split into cases by occurrence count: you can simply use the trick of one queue, or two queues, to implement it. By the way, it seems this idea appeared in 700D - Huffman Coding on Segment, created by GlebsHP and Endagorion; however, few people were able to take advantage of it.
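The two-queue construction can be sketched like this (my illustration; it takes an already-sorted list of frequencies, so the merged-node queue stays sorted automatically and every step pops a global minimum in $$$\mathcal{O}(1)$$$):

```python
from collections import deque

def huffman_cost(freqs):
    """Total merge cost of building a Huffman tree, via the classic
    two-queue linear-time construction: leaves in one sorted queue,
    merged nodes in another; merged sums are produced in non-decreasing
    order, so both queues stay sorted and the two global minima are
    always at the fronts."""
    leaves = deque(sorted(freqs))
    merged = deque()
    total = 0

    def pop_min():
        if merged and (not leaves or merged[0] <= leaves[0]):
            return merged.popleft()
        return leaves.popleft()

    while len(leaves) + len(merged) > 1:
        s = pop_min() + pop_min()
        total += s
        merged.append(s)
    return total
```

This is the part that the problem additionally batches by frequency class, since here $$$n$$$ distinct frequencies would still be too many.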
Note: When talking about suffix data structures, we usually refer to the suffix array, the suffix tree and the suffix automaton.
Most commonly, we build the suffix array by prefix doubling with radix sort in time $$$\mathcal{O}(n \log n)$$$, and build the suffix automaton or suffix tree in time $$$\mathcal{O}(n C)$$$ or $$$\mathcal{O}(n \log \Sigma)$$$, where $$$C$$$ is the constant factor caused by cache misses, since the space complexity, $$$\mathcal{O}(n \Sigma)$$$, is rather large. Usually we don't care about $$$C$$$, because $$$\Sigma$$$ is fairly small, like the size of the classical Latin alphabet, $$$26$$$.
Sometimes a problem demands more efficiency, and we have to build some of these structures in linear time. In that case, we can build the suffix array linearly and transform between any two of these structures. The transformations are easy: for example, the suffix array together with the LCP array helps construct the suffix tree, and the suffix links of the suffix automaton form the suffix tree of the reversed string. The linear construction of the suffix array can be DC3 or SA-IS.
102114K - Kaleidoscope
Count the number of non-isomorphic colored rhombic hexecontahedra such that each face is colored with one of $$$n$$$ given colors and the $$$i$$$-th color occurs on at least $$$c_i$$$ faces. Report the number modulo $$$p$$$.
Two states are isomorphic if and only if one can transform into another by 3D space rotation.
In geometry, a rhombic hexecontahedron is a nonconvex polyhedron with $$$60$$$ golden rhombic faces and icosahedral symmetry.
$$$100$$$ large tests, $$$n \leq 60$$$, $$$1 \leq p < 2^{30}$$$.
Solution: This is a classic application of the Pólya enumeration theorem, similar to Dodecahedron — Petrozavodsk Winter Camp 2008, Ural SU Contest. Neither is hard once the theorem is applied.
By applying it, we need only focus on the number of ways to color the orbits while meeting the coloring conditions. For a rotation whose orbits each consist of $$$d$$$ faces that must share one color, we can count $$$dp(i, j)$$$, the number of ways such that the first $$$i$$$ colors have been used to color $$$j$$$ faces without violating any condition.
There are only $$$4$$$ types of equivalence classes, so the time complexity is $$$\mathcal{O}(4 F^2 n)$$$, where $$$F = 60$$$ is the number of faces.
By the way, the Pólya enumeration theorem requires a division. In order to perform it, and perform it only once, modulo $$$p$$$, you can calculate everything else modulo $$$|G| \cdot p$$$, where $$$|G| = 60$$$ is the order of the rotation group, and then divide by $$$|G|$$$ directly. During this process, be careful about $$$64$$$-bit integer overflow.
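The divide-once trick works for any Burnside/Pólya computation with an arbitrary modulus. As a stand-in example (necklaces under the cyclic group rather than the icosahedral group of this problem; `necklaces` is a made-up name), the sum of fixed colorings is accumulated modulo $$$|G| \cdot p$$$ and then divided exactly by $$$|G|$$$:

```python
from math import gcd

def necklaces(n, k, p):
    """Burnside count of k-colorings of an n-cycle up to rotation, modulo
    an arbitrary p: the group sum is always divisible by |G| = n, and
    working modulo n * p preserves that divisibility, so the final
    division is exact."""
    mod = n * p
    total = 0
    for r in range(n):
        # a rotation by r fixes k^gcd(r, n) colorings
        total = (total + pow(k, gcd(r, n), mod)) % mod
    return (total // n) % p
```

The same idea applies verbatim with $$$|G| = 60$$$ and the four rotation types of the icosahedral group.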
Note: The orders of the polyhedral rotation groups are quite small. For example, the chiral tetrahedral group of order $$$12$$$ is the rotation group of the regular tetrahedron, the chiral octahedral group of order $$$24$$$ is the rotation group of the cube and the octahedron, and the chiral icosahedral group of order $$$60$$$ is the rotation group of the icosahedron and the dodecahedron. Whenever we solve a problem concerning isomorphism under 3D rotation, we can try to apply these groups.
102114L - Lost In The Echo
Calculate the $$$n$$$-th term of the sequence A140606 modulo $$$(10^9 + 7)$$$.
This term is the number of inequivalent expressions containing $$$n$$$ distinct variables, each occurring exactly once, where only the binary operators $$$+$$$, $$$-$$$, $$$*$$$, $$$/$$$ (and parentheses) are permitted; unary minus and plus are forbidden. Two expressions are equivalent if and only if they can be simplified to the same rational expression.
$$$n = 1, 2, \ldots, 6 \times 10^4$$$.
Solution: Let's first take a look at an easier version where only the binary operators $$$+$$$, $$$-$$$ are permitted. In this case, we can eliminate parentheses by expanding them. Then we can regard each operator as the sign of the variable after it, with the sign of the first variable always positive, which lets us ignore the order of the variables. Consequently, the number of different expressions is $$$(2^n - 1)$$$: each sign is either positive or negative, and at least one sign must be positive.
Based on the above version, we can get a similar conclusion in the case where only $$$*$$$, $$$/$$$ are permitted. But when at least three binary operators are permitted, things become more complex. Now let's discuss the case where $$$+$$$, $$$*$$$, $$$/$$$ (and parentheses) are permitted, because to solve the original problem we have to solve this task first.
In this case, parentheses cannot simply be expanded, but we can split the expression into priority levels: in each level, either only $$$+$$$ is permitted or only $$$*$$$, $$$/$$$ are permitted. An expression at level $$$k$$$ is either a single variable or several expressions of level $$$(k - 1)$$$ joined by the operators of level $$$k$$$.
We split expressions this way so that we can ignore the order of an expression's children; also, operators at an ancestor level cannot affect descendant levels. Hence, we can design a somewhat slow DP to count. Let $$$f(n)$$$ be the number of expressions with $$$n$$$ variables whose children, if any, are joined by $$$+$$$, and let $$$g(n)$$$ be the analogue for $$$*$$$, $$$/$$$. We have
$$$\begin{cases} f(1) = g(1) = 1 \\ f(n) = \sum_{x_1 + x_2 + \ldots + x_k = n, x_i > 0, k \geq 2}{\frac{1}{k!} {n \choose x_1, x_2, \ldots, x_k} g(x_1) g(x_2) \ldots g(x_k)}, & n \geq 2 \\ g(n) = \sum_{x_1 + x_2 + \ldots + x_k = n, x_i > 0, k \geq 2}{\frac{2^k - 1}{k!} {n \choose x_1, x_2, \ldots, x_k} f(x_1) f(x_2) \ldots f(x_k)}, & n \geq 2 \end{cases}\text{,}$$$where $$${n \choose x_1, x_2, \ldots, x_k}$$$ means $$$\frac{n!}{x_1! x_2! \ldots x_k!}$$$.
The transition enumerates $$$k$$$ unordered children together with their numbers of variables and relabels the $$$n$$$ variables; it is easy to understand but not easy to implement efficiently, so let's look at the exponential generating functions.
Define that $$$F(x) = \sum_{n \geq 1}{\frac{f(n) x^n}{n!}}$$$, $$$G(x) = \sum_{n \geq 1}{\frac{g(n) x^n}{n!}}$$$. We have
$$$\begin{cases} F(x) = x + \sum_{k \geq 2}{\frac{G^k(x)}{k!}} & = x + \exp(G(x)) - G(x) - 1 \\ G(x) = x + \sum_{k \geq 2}{\frac{(2^k - 1) F^k(x)}{k!}} & = x + \exp(2 F(x)) - \exp(F(x)) - F(x) \end{cases}\text{,}$$$where $$$\exp(A) = \sum_{k \geq 0}{\frac{A^k}{k!}}$$$.
We can calculate $$$\exp(A)$$$ based on the fact that $$$B = \exp(A) \Leftrightarrow B' = A' \exp(A) = A' B$$$. With that, we can design a DP maintaining the first $$$n$$$ terms of $$$F(x)$$$, $$$G(x)$$$, $$$\exp(F(x))$$$, $$$\exp(2 F(x))$$$, $$$\exp(G(x))$$$ in time $$$\mathcal{O}(n^2)$$$. But if you are familiar with divide-and-conquer optimization with fast convolution over modular integers, you can speed the process up to $$$\mathcal{O}(n \log^2 n)$$$. The optimization is mentioned in the last paragraph of the tutorial for 438E - The Child and Binary Tree, created by vfleaking. By the way, I tried to apply a Newton-iteration-like approach, but I failed. If anyone comes up with a better solution, please share it in the comments.
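The $$$\mathcal{O}(n^2)$$$ maintenance of $$$\exp$$$ via $$$B' = A' B$$$ looks like this (my sketch, modulo $$$10^9 + 7$$$): comparing coefficients of $$$x^i$$$ in $$$B' = A' B$$$ gives $$$(i + 1)\, b_{i + 1} = \sum_{j}{(j + 1)\, a_{j + 1} b_{i - j}}$$$.

```python
MOD = 10 ** 9 + 7

def series_exp(a, n):
    """First n coefficients of exp(A(x)) mod MOD, where a[0] == 0, from
    the recurrence (i+1) b_{i+1} = sum_j (j+1) a_{j+1} b_{i-j}."""
    assert a[0] == 0
    b = [0] * n
    b[0] = 1
    inv = [0, 1]                       # modular inverses 1..n, linear sieve
    for i in range(2, n + 1):
        inv.append((MOD - MOD // i) * inv[MOD % i] % MOD)
    for i in range(n - 1):
        s = 0
        for j in range(i + 1):
            if j + 1 < len(a):
                s = (s + (j + 1) * a[j + 1] % MOD * b[i - j]) % MOD
        b[i + 1] = s * inv[i + 1] % MOD
    return b
```

For $$$A(x) = x$$$ this reproduces the coefficients $$$\frac{1}{i!}$$$ of $$$e^x$$$.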
Finally, let's focus on the original problem. A minus operator at one level can interact with operators at descendant levels, leading to duplicate counting; for example, $$$a + b / (c - d)$$$ and $$$a - b / (d - c)$$$ are equivalent, but the counting approach above counts them twice. It is not difficult to observe that if we regard an expression and its opposite (i.e., the expression such that their sum is zero) as equivalent, the counting works. The transition becomes
$$$\begin{cases} f(1) = g(1) = 1 \\ f(n) = \sum_{x_1 + x_2 + \ldots + x_k = n, x_i > 0, k \geq 2}{\frac{2^{k - 1}}{k!} {n \choose x_1, x_2, \ldots, x_k} g(x_1) g(x_2) \ldots g(x_k)}, & n \geq 2 \\ g(n) = \sum_{x_1 + x_2 + \ldots + x_k = n, x_i > 0, k \geq 2}{\frac{2^k - 1}{k!} {n \choose x_1, x_2, \ldots, x_k} f(x_1) f(x_2) \ldots f(x_k)}, & n \geq 2 \end{cases}\text{.}$$$But there is still a corner case: an expression's opposite may be invalid. In that case, the minus operator must not occur in the expression at all, and counting that has already been discussed. Therefore, the answer is twice the number of expressions counted up to sign, minus the number of expressions avoiding the minus operator.
Note: During the contest, I ran a little experiment. Before it started, I uploaded the first $$$300$$$ terms to OEIS, at a time when no formula for this sequence was known there. Then, while randomly checking teams' submissions, I found about ten teams who had submitted the table of these terms.
I don't know what to say, but only hope the sport of programming competition won't be banned at some time in the future.
Which problem do you prefer or hate most? Feel free to share your thoughts in comments! ^_^