Nice trick involving sparse matrix exponentiation (kind-of)

№	Пользователь	Рейтинг
1	ecnerwala	3648
2	Benq	3580
3	orzdevinwang	3570
4	cnnfls_csy	3569
5	Geothermal	3568
6	tourist	3565
7	maroonrk	3530
8	Radewoosh	3520
9	Um_nik	3481
10	jiangly	3467

№	Пользователь	Вклад
1	maomao90	174
2	adamant	164
2	awoo	164
4	TheScrasse	160
5	nor	159
6	maroonrk	156
7	-is-this-fft-	150
7	SecondThread	150
9	orz	146
10	pajenegod	145

Recently, I’ve figured out a way to improve matrix exponentiation-like algorithms for sparse matrices. I’m pretty sure that some of you might find this being elementary linear algebra, but I’m pretty proud of my discovery, so I’m going to write a blog post about it. I've marked some TODOs and ideas in square brackets, in case you want to help me enhance this.

Prerequisites

Cayley-Hamilton theorem, Berlekamp-Massey algorithm, fast linear recurrence solvers.

Terminology

$$$M$$$ is an $$$n \times n$$$ “sparse” matrix, with $$$m$$$ entries different than $$$0$$$.
$$$k$$$ is the exponent.

Complexities will be written in terms of $$$n, m, k$$$.

Cayley-Hamilton theorem

One of the most astonishing theorems in my opinion, what the theorem says is that for any $$$n \times n$$$ matrix, there exists some coefficients $$$c_i$$$ such that.

$$$M^n = \sum_{i = 1}^n c_i M^{n - i}$$$

$$$c$$$ is often called the characteristic polynomial of $$$M$$$.

This theorem essentially says that the powers of any $$$n \times n$$$ matrix follow a linear recurrence of order at most $$$n$$$ (to understand why that is, take the above equation and multiply both sides with $$$M^x$$$ for any $$$x \geq 0$$$).

Computing quadratic forms $$$\sum{M^k_{i, j} a_i b_j}$$$

Doing some little algebra, it is easy to see that $$$\sum{Q_{i, j} a_i b_j}$$$ can be written equivalently as $$$a^T Q b$$$, for any matrix $$$Q$$$. This is called a quadratic form.

Many problems involving matrix exponentiation in which you have to output one number essentially boil down to these quadratic forms. For example, computing the $$$k$$$-th fibonacci number essentially asks us to compute $$$a^T M^k b$$$, where $$$a = b = [1, 0]^T$$$, and $$$M$$$ is the fibonacci recurrence matrix.

Note that, due to the Cayley-Hamilton theorem, after doing some algebra it can be seen that $$$s_k = a^T M^k b$$$ follows a linear recurrence with the recurrence coefficients equal to $$$c$$$. Therefore, if we compute $$$s_1, …, s_{2n}$$$, we can use Berlekamp-Massey algorithm to deduce $$$c$$$, and then use a fast linear recurrence algorithm to compute $$$X^k mod c$$$ (polynomial modulo) in $$$O(n^2 \log{k})$$$ or $$$O(n \log{n} \log{k})$$$, and then find out $$$s_k$$$ [there was a tutorial on Codeforces for this, perhaps you can share the link in the comments].

However, to do that, we have to compute $$$s_0, …, s_{2n}$$$. In order to do that, remember that $$$s_i = a^T M^i b$$$. This means that $$$s_i = a^T M (M^{i - 1} b)$$$. If we compute $$$M^i b$$$ for $$$i = 1, 2, …, 2n$$$ (using iterative matrix vector products), we can compute $$$s_1, …, s_{2n}$$$ in time $$$O(m n)$$$ (it is essential to understand that a matrix-vector product $$$Mv$$$ can be computed in complexity $$$O(m)$$$).

Summing up, we can do the following:

Compute $$$s_0, …, s_{2n}$$$ in $$$O(m n)$$$
Compute $$$c$$$ in $$$O(n^2)$$$ using Berlekamp-Massey
Compute $$$X^n mod c$$$ in $$$O(n^2 \log{k})$$$ or $$$O(n \log{n}\log{k})$$$
Compute $$$s_k$$$ in $$$O(n)$$$

Total complexity is $$$O(mn + n^2 + n^2 \log{k} + n)$$$ or $$$O(mn + n^2 + n \log{n} \log{k} + n)$$$.

Computing matrix-vector products $$$M^k v$$$.

Let’s go even further and compute matrix-vector products. This is the same as computing $$$e^{(j)} M^k v$$$ for all $$$j$$$. Here $$$e^{(j)}$$$ is the $$$j$$$-th element of the canonical base ($$$e^{(j)}_k = [k = j]$$$). The key thing to notice here is that the characteristic polynomial $$$c$$$ is the same for all $$$j$$$, so all elements follow the same recurrence. Also, products $$$M^i v$$$ are computed in the previous procedure, so we can just use them to compute all initial terms of the recurrence!

This essentially means that you can just take any non-zero vector $$$w$$$ (*), compute the initial $$$2n$$$ terms $$$M^i v$$$ that you need, and then do Berlekamp-Massey on $$$w^T M^i v$$$, and you are done! The only thing that changes is the last step, which requires $$$O(n^2)$$$ work now.

*There are some good and bad choices for such a vector. In particular, you would want to choose a vector $$$w$$$ such that the sequence $$$w^T A^i v$$$ has order of recurrence equal to $$$n$$$. I think it's fine to just pick random vectors until this works?

Total complexity is $$$O(mn + n^2 + n^2 \log{k} + n^2)$$$ or $$$O(mn + n^2 + n \log{n} \log{k} + n^2)$$$.

Computing $$$M^k$$$?

I’m not sure if this is possible, but I’m eager to hear your ideas!

Example problems

Given a directed graph $$$G$$$ with $$$n$$$ vertices and $$$m$$$ edges, compute the number of paths of length exactly $$$k$$$. Optionally, compute the number of paths of length $$$k$$$ starting in each of the $$$n$$$ vertices. $$$n \leq 1000, m \leq 10000, k \leq 10^9$$$. [any links to this problem?]

Given an undirected tree with $$$n$$$ vertices, find the number of paths of length $$$k$$$ from a given root vertex $$$r$$$ to all other vertices. [Link]

Given a planar graph $$$G$$$ with $$$n$$$ vertices, find the expected number of steps until you reach node $$$n$$$, starting from node $$$1$$$. At every step you choose an arbitrary neighbor of the current vertex. [Link]

[other problems?]

Sample code

Note that the sample code does not compute matrix-vector products in $$$O(m)$$$. While it kind of breaks the purpose of this blog post, it is more a test of correctness than an actual solution to a problem.

Code

#include <bits/stdc++.h>

using namespace std;

const int MOD = 1e9 + 7;

struct ModInt {
  int x;
  ModInt(int x = 0) : x(x + (x < 0) * MOD - (x >= MOD) * MOD) {}
  ModInt operator+(ModInt o) { return x + o.x; }
  ModInt operator-(ModInt o) { return x - o.x; }
  ModInt operator*(ModInt o) { return 1LL * x * o.x % MOD; }
  ModInt operator/(ModInt b) { return *this * b.inv(); }
  ModInt inv() { return pow(MOD - 2); }
  ModInt pow(long long e) {
    if (!e) return ModInt(1);
    ModInt r = pow(e / 2); r = r * r;
    return e % 2 ? *this * r : r;
  }
  bool operator==(ModInt o) { return x == o.x; }
};


vector<ModInt> BerlekampMassey(vector<ModInt> s) {
  int n = s.size();
  vector<ModInt> C(n), B(n);
  C[0] = B[0] = 1;

  auto b = C[0]; int L = 0;
  for (int i = 0, m = 1; i < n; ++i) {
    /// Calculate discrepancy
    auto d = s[i];
    for (int j = 1; j <= L; ++j)
      d = d + C[j] * s[i - j];
    if (d == 0) { ++m; continue; }
    /// C -= d / b * B * X^m
    auto T = C; auto coef = d / b;
    for (int j = m; j < n; ++j)
      C[j] = C[j] - coef * B[j - m];
    if (2 * L > i) { ++m; continue; }
    L = i + 1 - L; B = T; b = d; m = 1;
  }
  C.resize(L + 1); C.erase(C.begin());
  for (auto& x : C) x = x * (-1);
  return C;
}

using Poly = vector<ModInt>;

Poly LinearRec(Poly trans, int k) {
  int n = trans.size(); // assert(n <= (int)first.size());
  Poly r(n + 1), b(r); r[0] = b[1] = 1;
  auto ans = b[0];

  auto combine = [&](Poly a, Poly b) { // a * b mod trans
    Poly res(n * 2 + 1);
    for (int i = 0; i <= n; ++i)
      for (int j = 0; j <= n; ++j)
        res[i + j] = res[i + j] + a[i] * b[j];
    for (int i = 2 * n; i > n; --i)
      for (int j = 0; j < n; ++j)
        res[i - 1 - j] = res[i - 1 - j] + res[i] * trans[j];
    res.resize(n + 1);
    return res;
  };
  // Consider caching the powers for multiple queries
  for (++k; k; k /= 2) {
    if (k % 2) r = combine(r, b);
    b = combine(b, b);
  }
  return r;
}

vector<ModInt> Power(vector<Poly> M, int k, vector<ModInt> v) {
  int n = M.size();
  vector<Poly> first(2 * n, Poly(n));
  first[0] = v;
  vector<ModInt> s(2 * n);
  s[0] = v[0];

  // Step 1 (optimizable to O(m * n))
  for (int i = 1; i < 2 * n; ++i) {
    for (int j = 0; j < n; ++j)
      for (int k = 0; k < n; ++k)
        first[i][k] = first[i][k] + M[j][k] * first[i - 1][j];
    s[i] = first[i][0];
  }

  // Step 2 (O(n^2))
  auto trans = BerlekampMassey(s);
  int rec = trans.size();

  // Step 3 (O(n^2 log k))
  auto c = LinearRec(trans, k);

  // Step 4 (O(n^2))
  vector<ModInt> ans(n);
  for (int i = 0; i < n; ++i) 
    for (int j = 0; j < rec; ++j)
      ans[i] = ans[i] + c[j + 1] * first[j][i];

  return ans;
}

int main() {
  vector<Poly> M(2, Poly(2));
  M[0][0] = M[1][0] = M[0][1] = 1;
  Poly v(2, 0); v[1] = 1;

  for (int k = 0; k < 100; ++k) {
    auto ans = Power(M, k, v);
    cout << ans[0].x << " " << ans[1].x << endl;
  }
  
  return 0;
}

Комментарии (8)

Написать комментарий?

Golovanov399

3 года назад, # |

← Rev. 2 →

+18

Here is one of such problems: link

Another one is problem E here

→ Ответить

ftiasch

+33

I guess this is sometimes called black-box algebra.

bicsi

3 года назад, # ^ |

At first I thought this was a pun, but apparently there is such a thing! XD

300iq

+25

I believe that the first time in CP this technique appeared at the ITMO Petrozavodsk contest, in the problem "Fresh Matrix" from izban.

bciobanu

Block Wiedemann algorithm

ko_osaga

+20

The whole thing is known as Black-box linear algebra and was already featured in various contest. I learned it from here. But still you deserve some bragging rights :)

Probably the most succinct way to put this is the following: Given a matrix $$$M$$$, take a random row vector $$$a, b$$$ and compute $$$a M^k b^T$$$ for all $$$0 \le k \le n$$$. If you run a Berlekamp-Massey algorithm, then w.h.p you get a minimal polynomial of $$$M$$$.

Then the examples here become obvious, as long as you don't multiply a matrix by itself.

If you do some perturbation (take a random diagonal matrix and multiply) to make the matrix random, the constant term of minimal polynomial w.h.p coincides with those of the characteristic polynomial aka Determinant. Note that the determinant of original matrix can be recovered, because diagonal matrix have trivial determinant and $$$det(AB) = det(A)det(B)$$$. One application of this is to compute the number spanning tree of a graph in $$$O(nm)$$$ time.

+10

That’s a really nice article, especially as it also mentions that you can solve linear systems using a similar idea. I’ll look more into this. Also, I might think about how to extend to other cases, like computing traces, for example. Thank you!

’the constant term coincides with the determinant’ — maybe I understood it wrong, but the article doesn’t seem to assume that. It assumes proportionality is conserved (therefore, it computes $$$c(A \Delta, 0) / c(\Delta, 0)$$$, where $$$\Delta$$$ is a random diagonal matrix, right?

Yeah, it's my mistake. I updated the comment. Thank you!