*Good morning!*

In this blog, I will present an online algorithm that can perform priority-queue-like undoing on a data structure, given a few assumptions. I will also present and solve a generalization of that problem.

For context, I highly recommend reading this blog, which shows how to solve the easier problem of queue-like undoing. That blog gave me the inspiration for the trick I will describe here, but I will still try to write this blog so that it can be understood on its own. Thank you Noam527 for introducing such a great trick to the competitive programming community.

[Tutorial] Supporting Queue-like Undoing on DS

## Acknowledgment

Huge thanks go to peltorator for hosting a blog contest. His generous prize of $300 got many people to write blogs on new, interesting ideas, and it was one of the main reasons I wrote this blog. I'm honored to win the prize, so thanks again to peltorator!

## Problem Statement

Let $$$D$$$ be a data structure that supports updates and queries. We require $$$D$$$ to be **commutative**, meaning that if we apply a set of updates and then a query, the answer to the query will be the same regardless of the order that we applied the updates. Let's assume that if we apply a sequence of $$$n$$$ updates to $$$D$$$, then each update will take $$$O(T(n))$$$ **non-amortized** time. Let's also assume that we have a way to undo the most recent update in $$$O(T(n))$$$ time (this can usually be supported by maintaining a stack of changes we've made to the internal state of $$$D$$$, and undoing the changes on the top of the stack).

Now, we would like to solve the problem of supporting the following types of operations online:

- Apply an update to $$$D$$$ with an associated priority $$$p$$$. (Assume all priorities are unique).
- Undo the update in $$$D$$$ that has the highest priority. (Recall that the updates are commutative).
- Answer a query about $$$D$$$.

Given a sequence of $$$n$$$ updates and queries, we will solve this problem in $$$O(n T(n) \log n)$$$ time. In other words, each operation takes $$$O(T(n)\log n)$$$ **amortized** time.

### Subproblems

There are a few special cases of the above problem that we get for free.

First, if the priorities are inserted in decreasing order, this is the same as the queue-like undoing problem described in Noam's blog.

Second, if we remove the online requirement, then we can support arbitrary deletions in the same time complexity. This is because we can pre-compute priorities that reduce the problem to the one described above. We simply say that the priority of an update is inversely correlated to its deletion time. This subproblem of offline deletion can also be solved using a different well-known trick.

## Idea

The idea is that we have a stack $$$S$$$ of updates in the order that we applied them to $$$D$$$. If the highest priority update coincides with the top of the stack, that's great: we can just pop it and undo the most recent update to $$$D$$$. However, if we need to undo an update that's buried deep within the stack, we need to spend a lot of work undoing all of the updates that come after it, and then pushing them back onto the stack.

Fortunately, we have some flexibility that can help us save time. Firstly, we can decide to push the updates back onto the stack in a different order than they were before. This can help us because we can choose to put higher priority updates closer to the top of the stack, which saves us time later on. Secondly, we can choose to pop even more elements than required for the current undo operation. It can sometimes help to dig deeper into the stack because it can make up for the work we've already done. For example, suppose we've already popped a lot of elements to get the highest priority update. If the second highest priority update is close behind it, it might be worth popping it out as well so that it will be more accessible later.

## Solution

In this solution, the operations of applying new updates and answering queries will be handled in a straightforward manner. When we apply an update we push it directly to the stack and apply it to $$$D$$$. When we answer a query, we don't touch the stack at all.

Then when we need to undo the highest priority update, we need to decide the number of updates to pop from the stack. Let's choose the smallest number $$$k\ge 1$$$ such that the following condition holds: The $$$\lceil\frac{k}{2}\rceil$$$ highest priority updates in the stack are among the top $$$k$$$ positions of the stack.

After we pop these $$$k$$$ updates, we push them back onto the stack in such a way that the $$$\lceil\frac{k}{2}\rceil$$$ highest priority updates are all on top of the stack, sorted in increasing order.

### Details

Now, it's a bit of a non-trivial detail to efficiently check if the desired condition has been met. The proof of complexity will rely on the fact that processing $$$k$$$ updates on the stack takes $$$O(\log n + k T(n))$$$ time. To make the result as general as possible, we want to support the case where $$$T(n)=1$$$, so our overhead costs should be $$$O(\log n + k)$$$. This means we can't afford to do something such as sorting the top $$$k$$$ elements by priority.

In order to satisfy this requirement, let's say that in addition to the stack of updates, we also maintain a set data structure of the updates, ordered by priority (such as `std::set` in C++). When we add an update, it takes $$$O(\log n)$$$ time to insert it into the set, and when we pop an update, it takes $$$O(\log n)$$$ time to delete it from the set. In the undo operation, as we iterate $$$k$$$ in increasing order, we simultaneously iterate through the $$$\lceil\frac{k}{2}\rceil$$$ elements of the set with the highest priority. Let's say that the stack-depth of an update is the number of updates currently above it on the stack. As we iterate through the set, we keep track of the largest stack-depth of those elements and compare it with $$$k$$$. If $$$k$$$ is larger than the largest stack-depth, the condition is met. Otherwise, we continue iterating $$$k$$$.

After we decide to stop iterating, we can efficiently push all the elements back onto the stack. First we can push all the elements that are not among the highest $$$\lceil \frac{k}{2}\rceil$$$ priorities. We can push them in the same order that they originally appeared on the stack, for example. (Remember we can't afford to sort them.) Then, we push the remaining elements in increasing order of priority. That can be done efficiently by iterating through the top elements of the set, and doesn't require us to sort from scratch.

And so, we can see that processing $$$k$$$ elements of the stack can be done in $$$O(\log n + k T(n))$$$ time, which is good enough for the proof of time complexity of the entire algorithm.

## Complexity Analysis

Let's say that the cost of an operation is the number of stack elements processed. So, a push operation has a cost of $$$1$$$ and an undo operation has a cost of $$$k$$$, where $$$k$$$ is the number of elements of the stack popped in that operation. We will ignore query operations. If $$$C_i$$$ is the cost of the $$$i$$$-th operation, then we will show that $$$\sum_{i=1}^n C_i = O(n\log n)$$$. Note that this is sufficient to show that the overall time complexity is $$$O(nT(n)\log n)$$$, since the total running time is $$$O\left(\sum_{i=1}^n (\log n + C_i T(n))\right)$$$.

Now we will show that for any two indices $$$i<j$$$, we have $$$2(j-i)\ge \min(C_i, C_j).$$$ This is clearly true if either operation $$$i$$$ or $$$j$$$ is a push operation, so let's assume that they are both undo operations. Assume for contradiction that the inequality does not hold, and let $$$i$$$, $$$j$$$ be a counterexample where $$$j-i$$$ is the smallest possible.

Let $$$k=\min(C_i,C_j)$$$. Since $$$k\le C_i$$$, we know that after operation $$$i$$$, the $$$\lceil \frac{k}{2}\rceil$$$ highest priority updates are on top of the stack in increasing order. Let $$$U$$$ be the set of those updates. Suppose that between operations $$$i$$$ and $$$j$$$ there are $$$m_1$$$ push operations and $$$m_2$$$ pop operations. So, there are at most $$$m_1$$$ new elements and at least $$$\lceil\frac{k}{2}\rceil-m_2$$$ elements remaining from $$$U$$$ at the time before operation $$$j$$$.

Since $$$C_j\ge k$$$, we know that either it stopped before it passed over all the remaining elements of $$$U$$$ when processing the stack, or the condition was not satisfied the first time it encountered all the remaining elements of $$$U$$$. In the first case, there needed to be at least $$$\lceil\frac{k}{2}\rceil$$$ additions between $$$i$$$ and $$$j$$$, so $$$j-i\ge \lceil\frac{k}{2}\rceil$$$ which contradicts our assumption. In the second case, this means that $$$\lceil\frac{k}{2}\rceil-m_2<m_1$$$, otherwise the condition would have been satisfied. But then we have that

$$$2(j-i)\ge 2(m_1+m_2)>2\left\lceil\frac{k}{2}\right\rceil\ge k=\min(C_i,C_j),$$$

contradicting our assumption. Therefore, the inequality always holds.

Let $$$y_c$$$ be the number of operations $$$i$$$ such that $$$C_i\ge c$$$. The above fact shows that any two operations of cost at least $$$c$$$ are more than $$$\frac{c}{2}$$$ apart, so we know that $$$y_c\le 1+\frac{2n}{c}$$$. Finally, we have that

$$$\sum_{i=1}^n C_i=\sum_{c=1}^{n}y_c\le\sum_{c=1}^{n}\left(1+\frac{2n}{c}\right)=n+2n\sum_{c=1}^{n}\frac{1}{c}=O(n\log n).$$$

## Generalization to Multiple Priorities

Although priority-queue-like undoing is pretty cool, I realized it's still not general enough to support deque-like undoing, where we can undo either the oldest update or the most recent update. More generally, what if priorities are $$$d$$$-dimensional vectors, and we want to undo the update that has the largest priority in a specified dimension? That is, we would like to support the following operations:

- Apply an update with associated priority vector $$$(p_1,\ldots, p_d)$$$.
- Undo the update whose priority vector has the largest value in dimension $$$i$$$.

Note that this can simulate deque-like undoing because we can use priorities $$$(p, -p)$$$.

### Solution

We can support this generalization in the following way. At the end of an undo operation, push elements to the stack in a cyclic manner. For example if there are three dimensions $$$x$$$, $$$y$$$, $$$z$$$, then we make sure the element with the largest $$$x$$$ value is on top, then of the remaining elements make sure the element with the largest $$$y$$$ value is on top, then $$$z$$$, then $$$x$$$, and so on.

When we pop elements, we stop when the following condition is met: for each dimension $$$i=1,\ldots,d$$$, the $$$\lceil \alpha k\rceil$$$ elements with the highest priorities in dimension $$$i$$$ are among the top $$$k$$$ positions of the stack. Here, $$$\alpha$$$ is some constant between $$$0$$$ and $$$\frac1d$$$ that will be decided later. As an implementation detail, we can maintain $$$d$$$ different set data structures, with the $$$i$$$-th one sorting the updates according to the $$$i$$$-th dimension.

### Complexity Analysis

The complexity analysis is very similar; all we have to show is that there is some value $$$\beta$$$ such that for any two operations $$$i<j$$$, we have $$$\beta(j-i)\ge \min(C_i, C_j)$$$. Then we can guarantee that the complexity is $$$O(\beta n T(n) \log n)$$$.

Suppose that $$$C_i\ge k$$$. Then for each dimension, the largest $$$\alpha k$$$ elements of that dimension are present among the top $$$d\alpha k$$$ positions of the stack after operation $$$i$$$. Let $$$U$$$ be the set of those $$$d\alpha k$$$ elements. Let's say we apply $$$m_1$$$ additions and $$$m_2$$$ deletions between $$$i$$$ and $$$j$$$.

Since $$$C_j\ge k$$$, we know that either it stopped before it passed over all the remaining elements of $$$U$$$ when processing the stack, or the condition was not satisfied the first time it encountered all the remaining elements of $$$U$$$. In the first case, there needed to be at least $$$(1-d\alpha)k$$$ additions between $$$i$$$ and $$$j$$$, so $$$(j-i)\ge (1-d\alpha)k$$$.

In the second case, assume without loss of generality that the condition failed on the first dimension when it first encountered the remaining elements of $$$U$$$. There are at least $$$\alpha k-m_2$$$ remaining elements from $$$U$$$ with the highest priorities in dimension $$$1$$$ and at most $$$d\alpha k + m_1-m_2$$$ overall elements at the time the condition fails. Therefore,

$$$\alpha k - m_2 < \alpha\left(d\alpha k + m_1 - m_2\right).$$$

Let $$$m=m_1+m_2$$$ and $$$\lambda=\frac{m_1}{m}$$$ be the proportion of additions out of the operations. If we replace $$$m_1=\lambda m$$$ and $$$m_2=(1-\lambda)m$$$ and rearrange, we get

$$$j-i\ge m > \frac{\alpha(1-d\alpha)}{\lambda\alpha+(1-\lambda)(1-\alpha)}\,k.$$$

Let's define a function $$$f(\alpha, \lambda)$$$ to be the right-hand side of that inequality. The worst case is when $$$f(\alpha, \lambda)$$$ is small because it gives us the weakest lower bound for $$$m$$$. Note that the denominator is a convex combination of $$$\alpha$$$ and $$$(1-\alpha)$$$. So depending on which is larger, the worst case is either $$$\lambda=0$$$ or $$$\lambda=1$$$. Therefore, if we set

$$$\beta=\frac{1}{\min\left(1-d\alpha,\ f(\alpha,0),\ f(\alpha,1)\right)},$$$

then the above claim holds.

This means that if we treat $$$d$$$ as a constant, the complexity is $$$O(nT(n)\log n)$$$.

### Optimizing $$$\alpha$$$

The complexity analysis above is somewhat incomplete, because we haven't seen how to choose an optimal $$$\alpha$$$ value or how the complexity scales up with $$$d$$$. To get the best guarantee of complexity, we want $$$\beta$$$ to be as small as possible, or $$$\frac{1}{\beta}=\min(1-d\alpha, f(\alpha, 0), f(\alpha, 1))$$$ to be as large as possible.

First, consider the case where $$$d=1$$$. In this case, $$$f(\alpha, 0)=\alpha$$$ and $$$f(\alpha, 1)=1-\alpha$$$. We can easily see that the optimal value of $$$\min(\alpha, 1-\alpha)$$$ happens at $$$\alpha_{opt}=\frac12$$$.

Now let's consider $$$d\ge 2$$$. We have that $$$\alpha<\frac1d\le\frac12\le 1-\frac1d<1-\alpha$$$ which means that $$$f(\alpha, 0)\le f(\alpha, 1)$$$. It's also always true that $$$f(\alpha, 0)\le 1-d\alpha$$$. So, we really just need to maximize $$$f(\alpha, 0)$$$ over $$$0<\alpha<\frac1d$$$.

With some calculus (the derivative of $$$f(\alpha,0)=\frac{\alpha(1-d\alpha)}{1-\alpha}$$$ vanishes exactly when $$$d\alpha^2-2d\alpha+1=0$$$), we find that the optimal value is

$$$\alpha_{opt}=1-\sqrt{1-\frac{1}{d}}.$$$

Also with the help of my friend Mr. Wolfram Alpha, we find that

$$$\beta=\frac{1}{f(\alpha_{opt},0)}=d\left(1+\sqrt{1-\frac{1}{d}}\right)^2\le 4d.$$$

This means that $$$\beta=O(d)$$$ and so our final complexity becomes $$$O(dnT(n)\log n)$$$.

As a mini-disclaimer, I'll note that the optimized $$$\alpha$$$ values are only theoretical and a different value might be better in a practical implementation. With that out of the way, here is a reference table of approximate values for a few dimensions. It would be interesting to see how they compare on a benchmark implementation.

| $$$d$$$ | $$$\alpha_{opt}$$$ | $$$\beta$$$ |
|---|---|---|
| $$$1$$$ | $$$0.5$$$ | $$$2$$$ |
| $$$2$$$ | $$$0.292893$$$ | $$$5.828427$$$ |
| $$$3$$$ | $$$0.183503$$$ | $$$9.898979$$$ |
| $$$4$$$ | $$$0.133975$$$ | $$$13.928203$$$ |
| $$$5$$$ | $$$0.105573$$$ | $$$17.944272$$$ |

Thanks for coming to my TED talk.

Thank you monogon sir for writing this amazing blog! I was very skeptical at first about why this is needed, but today I saw a problem that could be solved using this trick and noticed that it opens very many doors! Thank you again!

monogonosity

Beautiful

How to update the depths in the priority set after a pop? Updating the $$$\lceil \frac{k}{2} \rceil$$$ with highest priorities is trivial in $$$\mathcal O(k + \log n)$$$, but how about the rest of them, without costing $$$\mathcal O(k \log n)$$$? They could be all over the place in our set.

I guess one way to solve this is to store the set iterators in the stack order; then we can access each value in constant time. I will try to implement this later.

You can use a std::set of pointers, and then update the stack depth in the structure it points to (accessing it from the stack instead of the set). It doesn't mess up the comparator so it works nicely.

I have implemented this trick: link.

It has been stress-tested, and also tested on Problem A as a DSU queue: 189148286.

Seems to be reasonably fast: it ran in about twice the time of the offline solution mentioned in the blog, and faster than LCT solutions. I am satisfied with this implementation, and maybe I will try implementing the multi-dimensional version as well.

After further testing we discovered that using the solution described in [Tutorial] Supporting Queue-like Undoing on DS is about 3 times faster.

Hi, I was forced by Monogon to share a solution to a problem using this trick:

603E - Pastoral Oddities

Solution:

First observe that it is both necessary and sufficient that each connected component in the graph has even size.

It is necessary because using an edge adds 2 to the total degree of its component, and in an odd size component where each vertex has odd degree, the total degree is also odd.

It is sufficient because of this constructive/greedy algorithm to find an odd cover of a tree: arbitrarily root the tree, and use the edge going up from the root of every odd-size subtree. The only subtree without an edge up is the whole tree, which has even size.

So then the question reduces to this: What is the smallest weight, such that if we consider only edges with weight up to it, all components in the remaining graph are of even size. Offline this can be answered by sorting the edges, then using a union find data structure to maintain how many components have odd size. But in this problem we need to find the answer for each prefix of the edge list.

The solution for a single sorted list can easily implement an undo operation to remove an edge, which means we can use the Monogon stack-to-pq trick. Add the edges in the order they appear in the input to the pq. If the graph is still bad (contains a component of odd size), then print -1. Otherwise, while the graph is good, pop the highest weight edge from the pq. When the graph becomes bad, we know the answer is the weight of the last edge that was removed. Now this edge might be useful in later queries, so it should be added back.

Code: 190138118

By the way, I just pop twice the number of elements needed to pop the max before sorting them. I was in a hurry when implementing this (in a duel) so I didn't care about some extra log factors.

There should be no extra log factor due to sorting as long as $$$T(n) = \Omega(\log n)$$$, which is true in the case of DSU.

I'm very interested to know if this alternative stopping condition guarantees a similar complexity. Intuitively it seems like it should.

Let $$$T'[l, r]$$$ be the time the algorithm spends moving elements (in the worst case) during time $$$[l, r)$$$ which were also pushed at some time in $$$[l, r)$$$.

So the number of updates my alternative condition causes should be $$$T'[0, n]$$$.

Splitting at the midpoint $$$m=\lfloor\frac{l+r}{2}\rfloor$$$ gives

$$$T'[l, r]\le T'[l, m]+T'[m, r]+f'(l, m, r),$$$

where $$$f'(l, m, r)$$$ is the number of times elements pushed during $$$[l, m)$$$ get moved during $$$[m, r)$$$ (in the worst case). It is clear that only $$$(r-l)$$$ matters here, so we can rewrite as

$$$T[n]\le 2\,T[n/2]+f(n),$$$

where $$$T[n]$$$ is the number of times elements can get moved during a sequence of $$$n$$$ events, if we ignore moving elements that were already present, and $$$f(n)$$$ is the number of times elements pushed during the first $$$n/2$$$ events can get moved during the last $$$n/2$$$ events.

If we can show $$$f(n) \in O(n)$$$, then $$$T[n] \in O(n \log n)$$$ ($$$O(\log n)$$$ levels with width $$$n$$$). So the rest of the argument will be analysis of the value of $$$f(n)$$$.

When some value from time $$$[n/2, n)$$$ is popped, the number of extra elements that get popped is however many elements were added after it. But if the popping reaches elements from time $$$[0, n/2)$$$, then all the elements already added from $$$[n/2, n)$$$ get sorted, meaning without pushing more elements, future pops from $$$[n/2, n)$$$ are $$$O(1)$$$, or proportional to the number of extra elements added. So we "consume" $$$k$$$ updates from $$$[n/2, n)$$$ to cause $$$O(k)$$$ moves of elements from $$$[0, n/2)$$$. So the sum of moves of elements from $$$[0,n/2)$$$ during popping elements from $$$[n/2,n)$$$ is $$$O(n)$$$.

Then what remains is the number of times elements from $$$[0,n/2)$$$ are moved during popping elements from $$$[0,n/2)$$$. There are $$$2$$$ reasons one might "cut far into" the elements from $$$[0,n/2)$$$ when popping one of the values from there:

1. There were a bunch of elements from $$$[n/2,n)$$$ covering the elements from $$$[0,n/2)$$$.
2. The target element to pop was far into the $$$[0,n/2)$$$ elements.

The contribution from the first cause can only be $$$O(n)$$$ for the same reason as the contribution from popping elements from time $$$[n/2, n)$$$. In the second kind, since twice the depth down to the popped element's position gets sorted, future buried elements have to be at least twice as deep. But the depth cannot increase further than $$$n/2$$$. So we get the geometric sum $$$1+2+4+8+\dots+n/2$$$ in the worst case, which is also $$$O(n)$$$.

I think there may be something sketchy with this geometric sum. It's wrong because there may be elements from even earlier tangled in with the elements from $$$[0, n/2)$$$ which make the size of the stack larger. But we can correct this mistake by replacing the sum with $$$\min(1,n)+\min(2,n)+\min(4,n)+\dots+\min(N/2,n)+\min(N,n)$$$ where $$$N$$$ is the original $$$n$$$ for the whole problem (and not just the subproblem). But this sum is $$$O(n \log N)$$$, which means the whole algorithm is $$$O(n \log^2 n)$$$. This is however only an upper bound, and I believe it might still be $$$\Theta(n \log n)$$$, but requires a more clever proof.

I think I have a proof that it's $$$\Theta(n \log n)$$$ using the Potential method.

Let $$$n$$$ be the total number of push and pop operations in total. Consider an update $$$U$$$ on the stack. Let $$$V$$$ be the update with the highest priority below $$$U$$$ on the stack (if it exists). Let $$$d$$$ be the depth of $$$V$$$ in the stack (distance from the top). Define the potential of an update $$$\phi(U)$$$ as follows. If $$$V$$$ exists and $$$priority(V) > priority(U),$$$ then $$$\phi(U) := \log_2(n) - \log_2(d).$$$ Otherwise, $$$\phi(U) := 0.$$$

The potential of the current stack is defined as the sum of the potentials of all updates on the stack. Let $$$\Phi(i)$$$ be the potential of the stack after the $$$i$$$-th operation.

Let $$$T(i)$$$ be the time of the $$$i$$$-th operation, and let $$$T_{amortized}(i) = T(i) + C \cdot(\Phi(i) - \Phi(i-1))$$$. We will show that for each operation $$$i$$$, $$$T_{amortized}(i) = O(\log n)$$$, and therefore $$$\sum_i T(i)=O(n\log n)$$$.

In a push operation, $$$T(i)=1$$$. For all elements currently on the stack, their potentials decrease slightly. For the new element, its potential is at most $$$\log n$$$. So for a push operation, $$$T_{amortized}(i) = O(\log n)$$$.

In a pop operation, suppose that $$$T(i) = 2k$$$. That is, the max element is at depth $$$k$$$ and so it processes the top $$$2k$$$ elements. We will first consider the cost of reordering the elements and then the cost of removing the max element.

First consider the cost of just reordering. Now, for each update above the max element in the stack, its potential decreased by at least $$$1$$$, since the depth of its corresponding update $$$V$$$ will have a depth of at least $$$2k$$$ after the pop. For the updates below the max element, their potentials either stayed the same or decreased. So far, the change of potential is $$$\le -k$$$.

Next, consider the cost of deleting the max element. The potential of the max element decreases to $$$0$$$ because it was deleted. Also, the depths of all elements in the stack decreased by $$$1$$$, which means the potential could have increased. But by how much can it increase? The potential of the update at depth $$$d$$$ is at most $$$\log_2 n - \log_2 d$$$, so the most it can increase is $$$\log_2(d) - \log_2(d-1)$$$. Summing over all elements, the total increase in potential due to the deletion is at most $$$(\log_2(2)-\log_2(1)) + (\log_2(3)-\log_2(2))+\cdots+(\log_2(n)-\log_2(n-1))\le \log_2 n$$$.

Combining these two parts of a pop operation, we get that $$$T_{amortized}(i)\le 2k+C\cdot(-k+\log_2 n)=O(\log n)$$$ as long as $$$C\ge 2$$$.

There's nothing special in this argument about the number $$$2$$$. As long as the number of extra elements is a constant fraction of the number of required elements to pop, the same complexity guarantee applies.