Hello everyone!
While I am solving this problem 1967C - Fenwick Tree, I read Elegia matrix solution and finally came up with my own solution that is based on his approach, using the matrix of linear transformation. Throughout the solution is tons of "heavy-math" knowledges and assumptions that is considered to be hard to see. Motivated enough, I have my heart set on writing an entire blog here with an ambition to prove every details in my solution.
Let's first quote the problem here
Problem Statement
Let $$$\operatorname{lowbit}(x)$$$ denote the value of the lowest binary bit of $$$x$$$, e.g. $$$\operatorname{lowbit}(12)=4, \operatorname{lowbit}(8)=8.$$$
For an array $$$a$$$ of length $$$n$$$, if an array $$$s$$$ of length $$$n$$$ satisfies $$$s_k=\left(\sum\limits_{i=k-\operatorname{lowbit}(k)+1}^{k}a_i\right)\bmod 998\,244\,353$$$ for all $$$k$$$, then $$$s$$$ is called the Fenwick Tree of $$$a$$$. Let's denote it as $$$s=f(a).$$$
For a positive integer $$$k$$$ and an array $$$a$$$, $$$f_k(a)$$$ is defined as follows:
$$$ f^k(a)= \begin{cases} f(a)&\textrm{if }k=1\\ f(f^{k-1}(a))&\textrm{otherwise.}\\ \end{cases} $$$You are given an array $$$b$$$ of length $$$n$$$ and a positive integer $$$k$$$. Find an array $$$a$$$ that satisfies $$$0\leq a_i <998244353$$$ and $$$f_k(a)=b$$$. It can be proved that an answer always exists. If there are multiple possible answers, you may print any of them.
Solution
The definition of $$$s_k$$$ makes us think of the linear map $$$T: \mathbb{R_n}\rightarrow \mathbb{R_n}$$$ which map a vetor array $$$a$$$ to the Fenwick Tree of $$$a$$$, in other words $$$T(a)=f(a)$$$. Note that $$$R_n$$$ here is the set of all matrices with size $$$n\times 1$$$ and if $$$a \in \mathbb{R_n}$$$ then
$$$ a = \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix}. $$$From the definition of $$$s_k$$$ we can see that the matrix $$$T$$$ of the linear map is
$$$ T = (t_{ij})_{1\leq i, j \leq n} $$$and satisfy
$$$ t_{ij}= \begin{cases} 1&\textrm{if } i-\operatorname{lowbit}(i)+1\leq j\leq i\\ 0&\textrm{otherwise.}\\ \end{cases},\forall 1\leq i, j \leq n $$$ ExampleFor example, with $$$n=10$$$, we have
$$$ T = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 1 & 1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 \end{pmatrix}. $$$ Because $$$T$$$ is a lower triangular matrix, we have $$$\det T = 1$$$ for all $$$n\geq 1$$$ and the characteristicpolynomial of $$$T-I$$$ is
$$$ \det(T-I-\lambda I) = (-\lambda)^n. $$$So all the eigenvalues of $$$T-I$$$ are $$$0$$$, $$$T-I$$$ is nilpotent and according to the Cayley-Hamilton theorem, we have
$$$ (T-I)^n=0. $$$However, all of these above observations are just not enough to solve this problem, as will be seen later on. Indeed, we need to find the smallest integer $$$m$$$ such that
$$$ (T-I)^m=0. $$$We claim that $$$m = \lfloor\log_2(n) \rfloor +1$$$.
Warning: the solution requires many advanced mathematical techniques and ... long, but it's the best way I can think of.
Proof of m (using block matrix and induction)Denote $$$T_n$$$ to be the linear transformation matrix with size $$$n$$$. First, we will prove that $$$m = \lfloor\log_2(n) \rfloor +1$$$ for all $$$n=2^t (t\geq 1)$$$. We will prove this by induction on $$$t$$$.
- With $$$t=1$$$ or $$$n=2$$$, we can see that
$$$ T_2 - I_2 = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}. $$$And
$$$ (T_2 - I_2)^2 = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}. $$$So $$$k=2=\lfloor\log_2(2) \rfloor +1$$$
- Suppose we have $$$t=m$$$ is true with $$$m\geq 1$$$, set $$$n = 2^m$$$. We will prove that $$$t=m+1$$$ is also true, or $$$2n=2^{m+1}$$$ is true.
By induction, because $$$\lfloor\log_2(2^{m}) \rfloor +1 = m + 1$$$, so we have
$$$ (T_n-I_n)^m \neq O_n \text{ and } (T_n-I_n)^{m+1} = O_n. $$$Here $$$O_n$$$ denotes the $$$n\times n$$$ matrix with all entries equal zero. We need to prove that
$$$ (T_{2n}-I_{2n})^{m+1} \neq O_{2n} \text{ and } (T_{2n}-I_{2n})^{m+2} = O_{2n}. $$$Observer from small cases, we can see that with $$$n$$$ of the form $$$2^t$$$ then
$$$ T_{2n} = \begin{pmatrix} T_n & O_n \\ B_n & T_n \end{pmatrix} $$$With
$$$ B_n = \begin{pmatrix} 0 & 0 & \ldots & 0 \\ 0 & 0 & \ldots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 1 & 1 & \ldots & 1 \\ \end{pmatrix} \in \mathbb{M_{n\times n}(\mathbb{R})}. $$$For the sake of brevity, I will exclude the proof of $$$B_n, T_{2n}$$$ having the aforementioned forms. Since $$$T_n-I$$$ has the $$$n$$$-th column with all entries being zero, we have $$$(T_n-I_n)B_n = O_n$$$. So, performing block matrix multiplication, we have
$$$ (T_{2n} - I_{2n})^{2} = \begin{pmatrix} (T_n-I_n)^{2} & O_n \\ B_n(T_n-I_n) +(T_n-I_n)B_n & (T_n - I_n)^{2} \end{pmatrix} = \begin{pmatrix} (T_n-I_n)^{2} & O_n \\ B_n(T_n-I_n) & (T_n - I_n)^{2} \end{pmatrix}. $$$Generally we have
$$$ (T_{2n} - I_{2n})^{m+1} = \begin{pmatrix} (T_n-I_n)^{m+1} & O_n \\ B_n(T_n-I_n)^m & (T_n - I_n)^{m+1} \end{pmatrix} = \begin{pmatrix} O_n & O_n \\ B_n(T_n-I_n)^m & O_n \end{pmatrix} $$$To prove $$$B_n(T_n-I_n)^m \neq O_n$$$, observe that since $$$T_n-I_n$$$ matrix have all nonegative entries, so does $$$(T_n-I_n)^{m}$$$. On the other hand if there exists matrix $$$X$$$ such that $$$B_nX = O_n$$$ then $$$X$$$ must have the sum of every column equal zero.
Since $$$(T_n-I_n)^{m}$$$ have only nonegative entries and $$$(T_n-I_n)^{m}\neq O_n$$$ then it must have at least one column with positive sum. Hence, $$$B_n(T_n-I_n)^m \neq O_n$$$. So $$$(T_{2n} - I_{2n})^{m+1}\neq O_{2n}$$$. Finally
$$$ (T_{2n} - I_{2n})^{m+2} = \begin{pmatrix} (T_n-I_n)^{m+2} & O_n \\ B_n(T_n-I_n)^{m+1} & (T_n - I_n)^{m+2} \end{pmatrix} = \begin{pmatrix} O_n & O_n \\ O_n & O_n \end{pmatrix} = O_{2n}. $$$Induction complete. So far, we have prove $$$m = \lfloor\log_2(n) \rfloor +1$$$ for all $$$n = 2^t$$$. The remaining part is easy, we will once again use induction with $$$2^{t-1}\leq n < 2^t$$$, we then have $$$m = t$$$.
The case $$$n<2^1$$$ is trivial
Suppose $$$n< 2^t$$$ is true for $$$t\geq 1$$$, we will prove $$$2^{t}\leq n<2^{t+1}$$$ with $$$m=t+1$$$ is also true. For any $$$0 \leq k <2^t$$$, set $$$n=2^t$$$, and using the induction result with $$$k \leq 2^t$$$ we have
$$$ (T_k-I_k)^{t} = O_k. $$$On the other hand, since $$$n=2^t$$$, we have
$$$ (T_n-I_n)^{t} \neq O_n \text{ and } (T_n-I_n)^{t+1} = O_n. $$$ $$$ T_{n+k} = \begin{pmatrix} T_n & O_{n\times k} \\ O_{k\times n} & T_k \\ \end{pmatrix} $$$Thus
$$$ (T_{n+k}- I_{n+k})^{t} = \begin{pmatrix} (T_n-I_n)^{t} & O_{n\times k} \\ O_{k\times n} & (T_k-I_k)^{t} \\ \end{pmatrix} = \begin{pmatrix} (T_n-I_n)^{t} & O_{n\times k} \\ O_{k\times n} & O_k \\ \end{pmatrix} \neq O_{n+k}. $$$And
$$$ (T_{n+k}- I_{n+k})^{t+1} = \begin{pmatrix} (T_n-I_n)^{t+1} & O_{n\times k} \\ O_{k\times n} & (T_k-I_k)^{t+1} \\ \end{pmatrix} = \begin{pmatrix} O_n & O_{n\times k} \\ O_{k\times n} & O_m \\ \end{pmatrix} = O_{k+n}. $$$Induction complete. So for all $$$n\geq 1$$$ we have $$$m=\lfloor\log_2(n) \rfloor +1$$$, as desired.
After having proven the bound for $$$m$$$, since the restriction on $$$n$$$ of the problem is $$$1\leq n \leq 2\cdot 10^5$$$, we then have
$$$ m = \lfloor\log_2(n) \rfloor +1\leq \lfloor\log_2(2\cdot 10^5) \rfloor +1 = 19. $$$It means for all $$$1\leq n \leq 2\cdot 10^5$$$, it's ensured that
$$$ (T_n-I_n)^{19}=O_n. $$$Now, from the definition of $$$T_n$$$, we know that if $$$f_k(a) = b$$$ then $$$(T_n)^k\cdot a = b$$$ or $$$a = (T_n^{-1})^k \cdot b$$$. Thus, to find $$$a$$$, we only need to compute $$$(T_n^{-1})^k \cdot b$$$ efficiently.
Achieving this requires a bit of Polynomial Modular Arithmetic skills and of course, we will use modulo $$$(x-1)^{19}$$$ here. Let's assume that for a polynomial $$$Q(x)$$$ with $$$\deg Q\leq 18$$$
$$$ x^{-k}\equiv Q(x) \mod (x-1)^{19}. $$$If and only if
$$$ 1 \equiv x^kQ(x) \mod (x-1)^{19}. $$$Suppose $$$Q(x) = c_{18}x^{18}+\ldots + c_1x+c_0$$$.Then it can be easily seen that
$$$ (T_n)^{-k} = c_{18}T_n^{18}+\ldots +c_1T_n + c_0I_n. $$$The only problem is to know the formula of $$$c_i$$$ that is dependent of $$$k$$$ as well and I got stuck while doing this. Then I came up with another fresh idea. Instead of writing $$$T_n^{-k}$$$ as a linear combination of $$$(I_n, T_n, \ldots, T_n^{18})$$$, why not writing it as a linear combination of $$$(I_n, T_n - I_n, \ldots, (T_n-I_n)^{18})$$$. We have one secret weapon that can compute the linear combination's coeffcients directly. That's the well-known Taylor Series. Using the Taylor series for function $$$x^{-k}$$$ at $$$x_0 = 1$$$, we deduce that
$$$ \begin{aligned} x^{-k}&= 1 + -\displaystyle\frac{k}{1!}(x-1) + \frac{k(k+1)}{2!}(x-1)^2+\ldots\\ &\equiv 1 + -\frac{k}{1!}(x-1) + \ldots +(-1)^{18}\frac{k(k+1)\ldots(k+19-2)}{18!}(x-1)^{18} \mod (x-1)^{19} \end{aligned} $$$Thus
$$$ T_n^{-k} = I_n + -\displaystyle\frac{k}{1!}(T_n-I_n) + \ldots +(-1)^{18}\frac{k(k+1)\ldots(k+19-2)}{18!}(T_n-I_n)^{18}. $$$Please note that we can use Taylor Series for $$$x_0=1$$$ here and not $$$x_0=0$$$ because $$$x^{-k}$$$ is undefined at $$$x=0$$$. Now that we have the explicit formula of $$$c_i$$$, we will still have to compute it modulo $$$998244353$$$ because if $$$k$$$ get pretty large, the last coeffient can be up to $$$\displaystyle\frac{(10^9)^{18}}{18!}$$$. One more question arises: How can we handle the number $$$1, 2,\ldots, 18$$$ under the denominator of these coefficents? Since $$$998244353$$$ is indeed a prime number, then we can precalculate the inversion of them.
// inverse[i] is the number x that x*i equiv 1 mod 998244353
vector<ll> inverse = { 0 , 1, 499122177, 332748118, 748683265, 598946612, 166374059, 855638017, 873463809, 443664157, 299473306, 272248460, 582309206, 460728163, 926941185, 865145106, 935854081, 939524097, 720954255 };
Finally we can multiply $$$T_n-I_n$$$ with any vector $$$b$$$ in $$$O(n\log n)$$$ instead of $$$O(n^2)$$$ in a standard multiplication of a $$$n\times n$$$ matrix with a $$$n\times 1$$$ one. Generally, we can multiply $$$(T_n-I_n)^k$$$ with any vector $$$b$$$ in $$$O(kn\log n)$$$.
Fast proof of thisObserve that $$$T_n-I_n$$$ contains many zeros and only a few ones, so the number of operations is not very large. Indeed, we can count directly that the number of operations need is the number of ones in $$$T_n - I_n$$$ and is
$$$ \displaystyle\sum_{i=1}^n \operatorname{lowbit}(i) - n = \left\lfloor\frac{n}{2}\right\rfloor+ 2\left\lfloor\frac{n}{4}\right\rfloor + 4\left\lfloor\frac{n}{8}\right\rfloor+\ldots = \sum_{i=1}^{+\infty}2^{i-1}\left\lfloor\frac{n}{2^i}\right\rfloor \approx n\log n. $$$ Hence
$$$ T_n^{-k}b = b + -\displaystyle\frac{k}{1!}(T_n-I_n)b + \ldots +(-1)^{18}\frac{k(k+1)\ldots(k+19-2)}{18!}(T_n-I_n)^{18}b. $$$Can be done in $$$O((1+2+\ldots +18)n\log n)=O(n\log n)$$$. And we're done. Here's my code implementation.
Code#include <iostream>
#include <vector>
#include <cmath>
#include <iomanip>
#include <algorithm>
#include <string>
using namespace std;
using ll = long long;
using ull = unsigned long long;
using ld = long double;
#define modulo 998244353
void init();
void solve();
inline ll lowbit(ll i)
{
return i & (-i);
}
vector<ll> T_I(vector<ll> a)
{
ll n = a.size();
vector<ll> b(n, 0);
for (ll i = 0; i < n; i++)
{
for (ll j = i + 1 - lowbit(i + 1) + 1; j <= i; j++)
{
b[i] = (b[i] + a[j - 1]) % modulo;
}
}
return b;
}
int main()
{
ios_base::sync_with_stdio(false);
cin.tie(NULL); cout.tie(NULL);
ull t = 1;
cin >> t; //number of test cases
init();
while (t--)
{
solve();
}
return 0;
}
void init()
{
/* Write your init code here */
}
void solve()
{
ll n, k;
cin >> n >> k;
vector<ll> b;
for (ll i = 0; i < n; i++)
{
ll x;
cin >> x;
b.push_back(x);
}
// inverse[i] is the number x that x*i equiv 1 mod 998244353
vector<ll> inverse = { 0, 1, 499122177, 332748118, 748683265, 598946612, 166374059, 855638017,
873463809, 443664157, 299473306, 272248460, 582309206, 460728163, 926941185,
865145106, 935854081, 939524097, 720954255 };
vector<ll> c(19);
c[0] = 1;
for (ll i = 1; i < 19; i++)
{
c[i] = (((- c[i - 1] * (k + i - 1)) % modulo)* inverse[i]) % modulo;
}
vector<ll> a(n, 0);
vector<ll> d(b);
for (ll i = 0; i < 19; i++)
{
for (ll j = 0; j < n; j++)
{
a[j] = (a[j] + (c[i] * d[j]) % modulo) % modulo;
}
d = T_I(d);
}
for (ll i : a)
{
if (i < 0)
{
cout << (i % modulo) + modulo << " ";
}
else
{
cout << i % modulo << " ";
}
}
cout << "\n";
}
It's not the end, I am still writing and updating this topic.
who cares
I do.
me too
Interesting to read though, nice effort.