smax's blog

By smax, 8 months ago, In English

Hey everyone! Usually when I post stuff on CF I just drop an external link to my blog, but this time I would like to also paste the same content natively on CF to see if it leads to more engagement. You can read the exact same article on my blog here.

Also, special thanks to alien_lover and gabrielwu for proofreading.


Motivation

We will solve this problem.

Prologue

The original name of this problem is SONE1, originating from the fiery depths of Chinese OI. Amazingly, the Chinese have figured out how to augment the link/cut tree to solve this problem (ref 1, 2). Maintaining subtree aggregates with a LCT isn't difficult, and there's a well-known Codeforces article by ouuan about it, but supporting lazy subtree updates is much harder. There's this, but it requires the updates to be invertible. I haven't found any English resources explaining how to augment LCT to solve this problem, so this article is my attempt at doing that. Most of the content is credited to this article, and the implementation is built upon a clean ordinary LCT implementation by bicsi.

EDIT: As an aside, some readers have pointed out that this is essentially what's referred to in English as a "Top Tree". The articles I referenced referred to it as an "AAA Tree". I initially assumed they were different, but after skimming the paper in question there indeed seem to be similarities, just differences in what stuff are called.

Prerequisites

Terminology

I will use the same terminology used in the resources linked in the prerequisites section. Here is a complete list of them to ensure we are all on the same page before proceeding:

Terminology

The Core Idea

I have a LCT, and I store the aggregates of each subtree in virtual subtrees. Any child of a node is either a preferred child or a virtual child. The issue with doing lazy propagation directly on this model is that a node could have several (up to $$$\mathcal O(n)$$$) virtual subtrees hanging from it, and we cannot push down the lazy tags to each of its virtual children each time. The fix is surprisingly simple: binarize the tree. This way, when we push down lazy tags, we only have at most two virtual children to push to, and the complexity is fixed. To understand what I mean by binarize, take a look at the diagram of LCTs below:

Image 1
The solid lines represent preferred edges, and the dotted lines are unpreferred/virtual edges. The nodes 2, 7, 1, 3 form a preferred path in that order. Node 1 has both a left and right preferred child, and it has 3 virtual children.

The LCT on the left is what an ordinary LCT might look like. Node 1 has 3 virtual children, which is more than 2. Therefore, we binarize its virtual children by introducing a new fake vertex to connect both 4 and 5. Now as seen in the LCT on the right, each node has at most 2 preferred children and 2 virtual children (up to 4 children in total). So simple!

If only... the devil is in the details...

If you were to go and start coding this idea right now, you might realize there are quite few pitfalls to keep in mind while coding. The next few sections will be dedicated to talking about some of the implementation details in depth.

The Access Method

Since everything in LCTs revolve around access, let's first characterize the access method. Here's what a typical access method might look like:

// tr is a global array storing each of the LCT nodes
// p is the parent and ch is a size 4 array of its children
void attach(int u, int d, int v) {
    tr[u].ch[d] = v;
    tr[v].p = u;
    pull(u);
}

// adds v as a virtual child of u
void add(int u, int v) {
    tr[u].vir += tr[v].sum;
}

// removes v as a virtual child of u
void rem(int u, int v) {
    tr[u].vir -= tr[v].sum;
}

void access(int u) {
    splay(u);
    add(u, tr[u].ch[1]);        // the preferred path ends at u, so any node with greater depth should be disconnected and made virtual instead
    attach(u, 1, 0);            // sets the right preferred child of u to null (0)
    while (tr[u].p) {
        int v = tr[u].p;
        splay(v);
        add(v, tr[v].ch[1]);    // disconnect any node with greater depth than v, so result is a preferred path ending at v
        rem(v, u);              // u is no longer a virtual child of v
        attach(v, 1, u);        // u is now the right preferred child of v
        splay(u);
    }
}

You'll be glad to hear this isn't far from the final product. The fundamental process of accessing a node remains the same. The only difference is that the add and rem methods become more complex, because we may have to introduce more fake nodes for binarization.

The Add Method

Firstly, let's take a look at add.

// attaches v as a virtual child of u
void add(int u, int v) {
    if (!v)
        return;
    for (int i=2; i<4; i++)
        if (!tr[u].ch[i]) {
            attach(u, i, v);
            return;
        }
    int w = newFakeNode();
    attach(w, 2, tr[u].ch[2]);
    attach(w, 3, v);
    attach(u, 2, w);
}

The first case is straightforward: if one of our 2 virtual children is null, simply set $$$v$$$ as that virtual child instead.

for (int i=2; i<4; i++)
    if (!tr[u].ch[i]) {
        attach(u, i, v);
        return;
    }

The second case is when $$$u$$$ already has 2 virtual children. In that case, we create a new fake node connecting both $$$v$$$ and one of our old virtual children, and substituting that as the new virtual child.

int w = newFakeNode();
attach(w, 2, tr[u].ch[2]);
attach(w, 3, v);
attach(u, 2, w);

The rem method is slightly more complex and requires the introduction of several other concepts first, so we'll come back to that in a minute.

Bound on Fake Node Creation

Firstly, while we're on the subject of fake nodes, how many fake nodes will we create? If we only create fake nodes without destroying, we could create as many as the number of preferred child changes, which is $$$\mathcal O(n \log n)$$$ with significant constant. That's often an undesirable amount of memory to use. Fortunately, observe that at any point, we only need at most $$$n$$$ fake nodes to binarize the entire tree. Therefore, as long as we recycle memory and destroy fake nodes in the rem method when they become unnecessary, we will limit our memory usage to just $$$2n$$$ nodes.

Fake Nodes in Our Preferred Paths?

If we just splay willy-nilly in our access method, we might connect fake nodes into our preferred paths. This is undesirable because it becomes messy to keep track of where fake nodes are and when we can destroy them for preserving memory. Thus, we'll need to make the following change to access.

Old Access

Instead of int v = tr[u].p, we will do int v = par(u), where par is a method giving us the lowest ancestor of $$$u$$$ that is not a fake node.

int par(int u) {
    int v = tr[u].p;
    while (tr[v].fake)
        v = tr[v].p;
    return v;
}

However, this is not good enough. There is no height guarantee on our binary tree formed from fake nodes, so we could be repeating $$$\mathcal O(n)$$$ work each time and break complexity. The solution? Make it a splay. So we have two splays: the usual one in ordinary LCT and one on the fake nodes. The new par method would thus look like the following:

int par(int u) {
    int v = tr[u].p;
    if (!tr[v].fake)
        return v;
    fakeSplay(v);
    return tr[v].p;
}
Technically...

Fake Splay

There's no need to remake a new set of splay functions for the fake splay cause that's wasteful. Instead, strive to write reusable splay code that works for both versions depending on a parameter passed in. I don't want to get too into the weeds of the splay implementation because that's more about splay trees and less about LCT, and because everyone implements splay differently, so I'll just copy paste my before and after showing how I adapted splay tree code to work for both versions:

Before
After

How to Do the Lazy Propagation

A first thought on doing lazy propagation is to simply maintain 2 lazy values: one for the path, and one for the subtree. However, it's important to ensure the two lazy values cover a disjoint partition of nodes, because otherwise there's no way to properly order the combination of lazy values affecting the intersection of the two partitions. To clarify what I mean, consider the range set operation applied to a simple array.

$$$ [1, 2, 3, 4, 5] $$$

Say we range set all values in the interval $$$[1, 3]$$$ to $$$7$$$, range set the interval $$$[3, 5]$$$ to $$$9$$$, and then range set the interval $$$[1, 3]$$$ again to $$$10$$$. Say we maintain two lazy values, one for the interval $$$[1, 3]$$$ and one for the interval $$$[3, 5]$$$, which would be $$$9$$$ and $$$10$$$ respectively. The issue is that it isn't obvious what the lazy effect on index $$$3$$$ is. In this case, we could maintain the time stamp of both lazy values and use the later one to determine the effect, but you can see how if we use some other lazy operation where intermediate operations, not just the last one, have an effect (e.g. applying an affine function $$$x := ax + b$$$ to a range), then this breaks down.

The same applies for the LCT model. If a LCT node has two lazy values, one for the path and one for subtree (in the represented tree), then we need them to cover a disjoint partition of all nodes in the subtree (in the LCT, not in the represented tree). For brevity, let's denote the first lazy value as plazy (short for "path lazy") and the second value as slazy (short for "subtree lazy"). We will also define 3 aggregates in each node: path, sub, and all. path is the standard path aggregate used in ordinary LCT, so it is tr[tr[u].ch[0]].path + tr[tr[u].ch[1]].path + tr[u].val. sub will be everything else not covered in path, so tr[tr[u].ch[0]].sub + tr[tr[u].ch[1]].sub + tr[tr[u].ch[2]].all + tr[tr[u].ch[3]].all. And finally, all is just a combination of both to get everything in this LCT subtree, or tr[u].all = tr[u].path + tr[u].sub. Putting it all together, we get the following pull method for recalculating the aggregates in a node:

void pull(int u) {
    if (!tr[u].fake)
        tr[u].path = tr[tr[u].ch[0]].path + tr[tr[u].ch[1]].path + tr[u].val;
    tr[u].sub = tr[tr[u].ch[0]].sub + tr[tr[u].ch[1]].sub + tr[tr[u].ch[2]].all + tr[tr[u].ch[3]].all;
    tr[u].all = tr[u].path + tr[u].sub;
}

Now for pushing down the lazy tags! Pushing plazy is the same as ordinary LCT: push down to the 0th and 1st child. Pushing slazy is slightly less obvious. Specifically, we will only update slazy for the 0th and 1st child, and we will update both plazy and slazy for the 2nd and 3rd child. For the 2nd and 3rd child, we update both lazy values because all values in that LCT subtree represent nodes affected in the real subtree, so both path and sub must be updated. But why don't we update both lazy values in the 0th and 1st child? It's because if we were to receive an update for this entire subtree from the parent, then there are 2 cases:

  1. The current node is a preferred child of its parent. In this case, recursively go up until we hit case 2.
  2. The current node is a virtual child of its parent. In this case, when we push down from the parent, the current node would have gotten both its plazy and slazy values updated. So the path part of its LCT subtree is already fully updated. If we were to update plazy of its 0th and 1st children again when pushing down from the current node, then we would be applying the update twice to the path.
Image 2
Say node 1 is following case 2, meaning it is a virtual child of its parent. Node 1 just received a tag pushed down from its parent, so its plazy and slazy values are updated. The nodes highlighted in red are the ones already correctly accounted for by plazy. If we were to push down and update node 2's plazy with node 1's slazy as well, then node 2 would be receiving the update twice which is incorrect.

So putting everything together gives us the following push method:

void pushPath(int u, const Lazy &lazy) {
    if (!u || tr[u].fake)
        return;
    tr[u].val += lazy;
    tr[u].path += lazy;
    tr[u].all = tr[u].path + tr[u].sub;
    tr[u].plazy += lazy;
}

void pushSub(int u, bool o, const Lazy &lazy) {
    if (!u)
        return;
    tr[u].sub += lazy;
    tr[u].slazy += lazy;
    if (!tr[u].fake && o)
        pushPath(u, lazy);
    else
        tr[u].all = tr[u].path + tr[u].sub;
}

void push(int u) {
    if (!u)
        return;
    if (tr[u].plazy.lazy()) {
        pushPath(tr[u].ch[0], tr[u].plazy);
        pushPath(tr[u].ch[1], tr[u].plazy);
        tr[u].plazy = Lazy();   // empty constructor for lazy resets it
    }
    if (tr[u].slazy.lazy()) {
        pushSub(tr[u].ch[0], false, tr[u].slazy);
        pushSub(tr[u].ch[1], false, tr[u].slazy);
        pushSub(tr[u].ch[2], true, tr[u].slazy);
        pushSub(tr[u].ch[3], true, tr[u].slazy);
        tr[u].slazy = Lazy();
    }
}
Note

The Rem Method

I promised you we would come back to the rem method. Well, we're finally ready to tackle it!

int dir(int u, int o) {
    int v = tr[u].p;
    return tr[v].ch[o] == u ? o : tr[v].ch[o+1] == u ? o + 1 : -1;
}

void recPush(int u) {
    if (tr[u].fake)
        recPush(tr[u].p);
    push(u);
}

void rem(int u) {
    int v = tr[u].p;
    recPush(v);
    if (tr[v].fake) {
        int w = tr[v].p;
        attach(w, dir(v, 2), tr[v].ch[dir(u, 2) ^ 1]);
        if (tr[w].fake)
            splay(w, 2);    // fake splay
        free(v);
    } else {
        attach(v, dir(u, 2), 0);
    }
    tr[u].p = 0;
}

What's happening here? Let's break it down. $$$u$$$ is the node that we are disconnecting from its parent, $$$v$$$. First consider the easier of the two cases:

} else {
    attach(v, dir(u, 2), 0);
}

This clause triggers when $$$v$$$ is not a fake node. In this case, it's exactly the same as ordinary LCT. Remove $$$u$$$ as a virtual child of $$$v$$$.

We could handle the second case in the same way, but there's an extra consideration. If $$$v$$$ is fake, then after removing $$$u$$$, $$$v$$$ only has one virtual child and is therefore obsolete. We can destroy the node and attach $$$v$$$'s other child directly to $$$v$$$'s parent instead! This step is necessary to ensure our number of fake nodes remains under our bound of $$$n$$$.

There's one more consideration: correct application of pushes and pulls. Consider the following scenario:

Image 3
We are calling rem on node 4 and attaching it as a preferred child of 1 instead. The red fake node has a pending lazy tag waiting to be pushed down. The left and right diagrams show before and after the procedure.

See the issue? When we remove $$$u$$$ and reattach it above, it could "dodge" a lazy tag that it's supposed to receive. So to remedy this, we need to call push on all fake ancestors of $$$u$$$ up to the next non-fake node, which is what the method recPush does in my code. Finally, notice that after removing $$$u$$$, we would need to call pull again on all fake ancestors up to the next non-fake node to correctly set their values. We could create another method recPull with similar structure to recPush. Or, we could just call splay at the end to correctly set all the node values moving up. Calling splay would also fix any amortized complexity concerns from calling recPush.

And that's it! We've successfully implemented our new version of access!

The Rest of the Methods

Once you have access, every other method is sincerely trivial. I'll include them here for sake of completeness.

Reroot, link, cut, lca, and path query/update are all completely identical to ordinary LCT.

Recap of Those Methods

Finally, for subtree queries/updates. Say we are querying for the subtree of $$$u$$$ given that $$$r$$$ is the root. First:

  • Reroot $$$r$$$
  • Access $$$u$$$

Now, $$$u$$$'s subtree in the represented tree consists of all the virtual subtrees hanging off of $$$u$$$, plus $$$u$$$ itself. So in other words, tr[tr[u].ch[2]].all + tr[tr[u].ch[3]].all + tr[u].val. For query, we just return that value. For update, we update tr[u].val directly, and then we update both the plazy and slazy values of its two virtual children. We do not touch the 0th or 1st child, since those are not part of $$$u$$$'s subtree.

The Code

I've pasted snippets here and there, but if you're still fuzzy on some of the implementation details, you can see my complete code here.

Performance

Image 4

Presumably, the complexity of this augmentation of LCT is still $$$\mathcal O(\log n)$$$, but with a significant constant factor (this article estimates the constant factor to be $$$97$$$). On DMOJ, it runs in around 0.6-0.7 seconds for $$$n, q \leq 10^5$$$. On Library Checker, it runs in a little over 1 second for $$$n, q \leq 2 \cdot 10^5$$$. So its performance is closer to $$$\mathcal O(\log^2 n)$$$ in practice. Some of the implementations in the Chinese articles run faster than mine, but that seems to come at the cost of readability and some strange code, and my implementation is fast enough for almost all reasonable competitive programming purposes.

Applications

Answer

Full text and comments »

 
 
 
 
  • Vote: I like it
  • +259
  • Vote: I do not like it

By smax, history, 12 months ago, In English
 
 
 
 
  • Vote: I like it
  • +80
  • Vote: I do not like it

By smax, 14 months ago, In English
 
 
 
 
  • Vote: I like it
  • +75
  • Vote: I do not like it

By smax, 16 months ago, In English
 
 
 
 
  • Vote: I like it
  • +248
  • Vote: I do not like it

By smax, history, 16 months ago, In English
 
 
 
 
  • Vote: I like it
  • +148
  • Vote: I do not like it

By smax, 17 months ago, In English
 
 
 
 
  • Vote: I like it
  • +58
  • Vote: I do not like it

By smax, 17 months ago, In English

Hey everyone,

I recently started a blog, and I think one of my articles on divide and conquer turned out pretty well, so I thought I'd share it here: https://mzhang2021.github.io/cp-blog/divide-and-conquer/. It covers both common and rarer techniques.

Speaking of the blog, if you're interested feel free to check it out here. I plan on posting regularly, and you can find more information in my introductory post.

Why not just post blogs on Codeforces? Why make a separate blog?

Thank you!

Full text and comments »

 
 
 
 
  • Vote: I like it
  • +281
  • Vote: I do not like it

By smax, 3 years ago, In English

Hey all,

I find one of my biggest weaknesses is debugging efficiently under contest settings. If you look at my contest history, there are plenty of instances where I get horrible penalties and wrong multiple submissions on a given problem before potentially solving it. The most recent example is today's Educational Round, where I took 40 minutes to figure out what was wrong with my D solution.

I am well aware that there are plenty of blogs and posts on debugging upon Googling, but a lot of the advice is kind of generic and don't always work for me, so I was hoping to get some more specific answers via this blog.

I'll try to provide some context so you know what I'm already doing. Here is my current debug strategy during contests:

TLE/MLE/RE: Usually not very common for me. I just look for common sources of these errors. For TLE or MLE, I check things like allocating lots of vectors inside loops, while loop conditions and recursive functions, etc. For RE, I check for index out of bounds, etc.

WA: The verdict I get 90% of the time. In general, I first do a quick scan through my code for immediately obvious errors that I can catch on the spot.

A. WA on samples/I have a test case that I know breaks my program: Usually not too bad because I can just print intermediate variables and locate which part of my program is bust. I have a basic debug template so I can type DEBUG(x) to print the contents of variables. Depending on how complicated the implementation is, this is probably fixable in 5-10 min if it's not a fault in my logic.

B. WA and I don't have a test case, but I do know a brute force solution: I'll write up the bash solution and generator, and run my bash solution against my current solution with a bash script. Once I find a test case, I proceed to case A. If I type fast, I might pull this off in 5-10 min as well.

C. WA, nothing to work with except the dreaded "WA on pretest X" message: This is where shit hits the fan. I try things like rereading the problem statement, rereading my code, looking over my logic. I try edge cases like n = 1, but it's rarely that obvious. I try to think of counter cases, but I can never think of the case needed to break my program. It probably doesn't help that when I can't debug quickly, I get frustrated, and that only makes it worse. This can take anywhere from 30+ min to never fixing it at all throughout the entire contest.

So I guess my question is, how do you all deal with case C? Thanks in advance.

Full text and comments »

 
 
 
 
  • Vote: I like it
  • +83
  • Vote: I do not like it