Fenwick Tree generalization using Centroids

Revision en1, by Bashca, 2020-09-30 05:46:11

Hello, Codeforces community!

In this opportunity I wanted to show you a trick that I discovered a some time ago. The basic idea is to generalize the Fenwick Tree in order to make it work on trees.

I am very grateful to racsosabe, for helping me in the revision and correction of this blog.

A few days ago I read this dolphingarlic's blog, and I found it pertinent to talk about this trick. I've solved some problems using this idea but I don't know if it is not known or I just didn't know the name. I'll start by making some proofs and checking a few examples.

Definition 1 (Centroid):

A centroid of a tree is a node $$$x$$$ that minimizes the maximum size among all the trees in the forest generated by eliminating $$$x$$$.

A significant theorem tells us that a centroid is that node that, after its elimination, makes the maximum tree size less than or equal to $$$\frac{n}{2}$$$. We can find some simple proofs here, and a way to find it here.

Centroid Tree

Centroid Tree is the name given to the tree found when computing the well-known centroid decomposition.

There are many useful tutorials on how to build and use the Centroid Tree. Some really cool tutorials: 1, 2, 3, 4.

Lemma 2:

  • The height of a Centroid Tree is at most $$$\log_{2}n + 1$$$.

Remark: Note that each tree in the forest when removing a centroid has at most size $$$\frac{n}{2}$$$.

Lemma 3:

  • Given two nodes $$$u$$$ and $$$v$$$, the LCA of $$$u$$$ and $$$v$$$ in the Centroid Tree belongs to the path between $$$u$$$ and $$$v$$$ in the original tree.

Remark: Note that a centroid separates all the paths between nodes in different subtrees, and when we choose the LCA between $$$u$$$ and $$$v$$$ as centroid, it is the first time that $$$u$$$ and $$$v$$$ are in different subtrees.

Corollary 4:

  • Let $$$x$$$ be an arbitrary node, the $$$n$$$ paths from $$$x$$$ to any node are encoded in at most $$$\log_{2}n + 1$$$ nodes.

Lemma 1 and 2 tells us that the LCAs of a node $$$x$$$ in the Centroid Tree are part of every path that begins at $$$x$$$. Furthermore, in the remark of lemma 2 we can see that the LCA of $$$u$$$ and $$$v$$$ is the only common ancestor of both nodes in the Centroid Tree that belongs to the path between $$$u$$$ and $$$v$$$.

Basic Implementation

const int maxn = 1e5 + 10;
vector<int> g[maxn];
int sz[maxn], pi[maxn];
bool block[maxn];
int n;

int centroid(int v, int p, int n) {
  sz[v] = 1;
  int mx=0, cen=0;
  for (auto u : g[v]) if (u!=p && !block[u]) {
      cen ^= centroid(u, v, n);
      sz[v] += sz[u], mx=max(mx, sz[u]);
  }
  mx = max(mx, n-sz[v]);
  if (2*mx <= n) pi[cen=v]=p;
  return cen;
}

void decompose(int x, int p=-1, int m=n) {
  int cen = centroid(x, -1, m);
  if (~pi[cen]) sz[pi[cen]]=m-sz[cen];
  pi[cen]=p; block[cen]=1;
  for (auto v : g[cen]) if (!block[v]) {
      decompose(v, cen, sz[v]);
  }
}

Descendant-Ancestor Structure

Definition 5 (Centroid Path):

We define the centroid path of a node as to the set $$$C(x)$$$ of ancestors of $$$x$$$ (including itself) in the centroid tree.

Definition 6 (Subtree Set):

We define the Subtree Set with respect to $$$r$$$ of a node $$$x$$$ as to the set $$$S_{r} (x)$$$ of descendants of $$$x$$$ when $$$r$$$ is root.

Definition 7 (Ancestors Set):

We define the Ancestor Set with respect to $$$r$$$ of a node $$$x$$$ as to the set $$$A_{r} (x)$$$, of nodes on the path from $$$r$$$ to $$$x$$$, including itself.

Lemma 8:

  • Given a node $$$x$$$, the set $$$S_{r} (x) \cap C (x)$$$ encodes all paths between $$$x$$$ and $$$S_{r} (x)$$$.

Remark: Note that every node on the path from $$$x$$$ to its descendants is a descendant of $$$x$$$. Therefore, the centroids that encode the paths are also descendants.

Lemma 9:

  • Given a node $$$x$$$, the set $$$A_{r} (x) \cap C (x)$$$ encodes all paths between $$$x$$$ and $$$A_{r} (x)$$$.

Remark: Analogous to the lemma above.

Corollary 10:

  • Given two nodes $$$u$$$, $$$v$$$ and $$$D(u, v) = (S_{r}(u) \cap C(u)) \cap (A_{r}(v) \cap C(v))$$$, then, $$$\vert D(u, v) \vert \le 1$$$.

proof: If $$$v$$$ is not a descendant of $$$u$$$, then the cardinality is $$$0$$$, otherwise, there is a unique common ancestor that is necessarily a descendant of $$$u$$$ and ancestor of $$$v$$$.

It should be noted that the last corollary tells us that we can exchange information without repetition between the descendants and ancestors through the nodes in the centroid path.

Is in subtree query

The method to check that a node is in the subtree of another one is well known, you can see the following comment.

Basic implementation:

int T = 0;
void dfs(int x, int p) {
  lo[x] = T++;
  for (int v : g[x]) if (v^p) {
    dfs(v, x);
  }
  hi[x] = T++;
}

//check if y is in the subtree of x
bool is_in_subtree(int x, int y) {
  return lo[x] <= lo[y] && hi[y] <= hi[x];
}

How to update and query values to a node?

int ft[maxn];
void update(int x, int add) {
  for (int ll=lo[x], rr=hi[x];~x; x=pi[x]) //for to iterate over the "Centroid Path"
    if (ll <= lo[x] && hi[x] <= rr) { //update descendants
      ft[x] += add;
    }
}

int query(int x) {
  int ans=0;
  for (int ll=lo[x], rr=hi[x]; ~x; x=pi[x])
    if (lo[x] <= ll && rr <= hi[x]) { //consult ancestors
      ans += ft[x];
    }
  return ans;
}

Exercises

Infoarena — Disconnect

Statement: You have a tree and two types of operations:

  • Remove an edge.
  • Query if two nodes are connected.

We can see this problem as updating values in subtrees: In the beginning, all the values start with $$$0$$$ and every time we update increment by $$$1$$$ all the nodes of the subtree, to answer we must know $$$value(a) + value(b) - 2 * value(LCA(a, b))$$$ and check if it's equal to $$$0$$$.

How does this work? We mantain a counter of deleted edges for each node. This counter will store the number of deleted edges from the root of the tree to the given node. Thus, adding $$$1$$$ to all the nodes in the subtree of $$$x$$$ will be equivalent to deleting the edge between $$$x$$$ and its parent.

code spoiler

Codeforces — Propagation Tree

Statement: You have a tree and have two types of operations:

  • Given $$$val$$$ and $$$x$$$, update the nodes with the following rule: $$${-1}^{dist(x, u)} val$$$, for every node $$$u$$$ in the subtree of $$$x$$$.

  • Query the current value of a node.

Just consider the distance to the intermediate centroid which we are going to update to calculate the value.

code spoiler

GP of Nanjing: Data Structure

Statement: Given $$$T$$$ cases, you have a tree with $$$n$$$ nodes rooted in node $$$1$$$, initially, each node has weight $$$0$$$ and you need to make $$$m$$$ queries of two possible types:

  • Given $$$a$$$, $$$x$$$, $$$y$$$, $$$z$$$, add $$$z$$$ to the weights of all descendants $$$v$$$ of $$$a$$$ such that $$$d(a, v) \mod x$$$ is $$$y$$$.

  • Given a node $$$a$$$, return its weight.

The constraints are:

  • $$$1 \le T \le 4$$$
  • $$$1 \le n, m \le 3\cdot 10^{5}$$$
  • $$$1 \le a, x \le n$$$
  • $$$0 \le y < x$$$, $$$0 \le z \le 500$$$.

Solution: We can maintain for each node $$$v$$$ an array of length equal to the maximum distance from $$$v$$$ to the nodes in the tree whose centroid equal to $$$v$$$, where we save the updates to be queried by their descendants, this would give us a performance per query equal to $$$O(\log (n))$$$ (traversing the $$$O(\log{n}$$$) ancestors and getting their propagated value in $$$O(1)$$$), and a performance per update equal to $$$O(\frac{n}{x})$$$, notice that it is not $$$O(\frac{n}{x} \log n)$$$ because can be seen as the sum:

$$$\sum\limits_{k=0}^{\log(n) - 1} \frac{2^i}{x} = O\left(\frac{n}{x}\right)$$$

As this is not optimal for updates, we can save for each node $$$v$$$ a matrix of size $$$t(v) \times t(v)$$$, where we would store for $$$x \le t(v)$$$, the accumulated queries that have come, this allows us to make queries in $$$O(\log{n} + t(v) + t(\pi(v)) + \ldots + t(root))$$$ and updates in $$$O\left(\log{n} + \frac{len(v)}{t(v)} + \frac{len(\pi(v))}{t(\pi(v))} + \cdots + \frac{n}{t(root)}\right)$$$

Note: Recall that $$$\pi(u)$$$ is the parent of the node $$$u$$$ and $$$len(v)$$$ is the maximum distance from $$$v$$$ to any of its descendants.

We can see these sums in another way, since the parent centroids have at least twice as many nodes as their corresponding child centroid:

For updates, we must :

$$$ T(Update) = O(\log{n}) + \sum\limits_{i=0}^{floor(\log (n) - 1)} \frac{2^i}{t_{i}} $$$

,

For queries:

$$$ T(Query) = O(\log{n}) + \sum\limits_{i=0}^{floor(\log (n) - 1)} t_{i} $$$

We can ignore the $$$O(\log{n})$$$ when assigning $$$t_{i}$$$ because it's not affected by it, so if we set $$$t_{i} = \frac{2^{i}}{t_{i}}$$$, which would be $$$t_{i} = \sqrt{len(v_{i})}$$$, which would give us a complexity.

$$$ \sum\limits_{i=0}^{floor(\log (n) - 1)} \sqrt{2}^i $$$
$$$ \frac{\sqrt{2}^{\log{n}} - 1}{\sqrt{2} - 1} = O(\sqrt{n}) $$$

therefore our final complexity would be $$$O(m \sqrt{n} + n \log{n})$$$ per test case.

code spoiler

I hope this technique will help you, I am attentive to recommendations for problems, doubts and/or errors in my analysis.

Notice that for a path with quantity of elements equal to a power of $$$2$$$, the structure represents a Fenwick Tree.

More problems:

Tags centroid decomposition, fenwick tree

History

 
 
 
 
Revisions
 
 
  Rev. Lang. By When Δ Comment
en1 English Bashca 2020-09-30 05:46:11 17338 Initial revision (published)