In many data structures, the operation of "undo" on the last update can be implemented easily: we can (usually) maintain a stack of the updates, where each update on the stack holds the memory cells it changed, and their original values. To undo an operation, just revert all changes from the top update on the stack. To maintain good complexity, we require the updates to operate in non-amortized time. I've seen this being used multiple times on DSU (without path compression).

If we imagine the updates as a sequence, then we can push an update to the end, and pop an update from the end by undo. Then, this sequence is a stack of updates. Here we discuss the idea of having a queue of updates: we can add a new update, or undo the oldest update still active. Specifically, this blogpost attempts to solve the following problem:

Given a data structure that can support some updates, some queries and an "undo" operation (each with their own time complexity), how can we create a data structure that supports the same updates and queries, but its undo operation works like a queue (and we don't support a stack-like undo in the new one)? And of course, creating such a data structure that adds the smallest time complexity factor we can.

Well, this is almost what this post discusses; I require the DS to have another property, which I'll call "commutativity". The best way to put it is probably: A data structure is commutative if for any sequence of updates, and a query afterwards, the result we want from the query remains the same, regardless of how we order those updates. This property holds in DSU (which was the motivation for this post): For any sequence of adding edges, after which we query whether two nodes are in the same component, the query result doesn't change for any order of edge additions. However, if the DSU query asks for the specific root of the node we query on, then this result depends on the order of edge additions, and also on the implementation. I think this query is less interesting anyway.

A data structure this won't work on is, for example, a segment tree with the update of "set in range", since the order of updates matters. Perhaps this specific DS can support queue undoing with a different algorithm, but here we discuss a general commutative data structure.

A few more notes: We build a DS from a commutative DS. We don't require any other constraint on the original DS (we don't care if its complexities are amortized, this includes the complexity of its undo operation). The data structure we create will work in **amortized** time, and it will also be **online** (which was a main purpose). Finally, I've never heard about this anywhere, yet I wouldn't be surprised if this idea is well known in China.

## The Idea

Let $$$S$$$ be the stack of updates, which is initially empty. We begin by receiving updates which fill up $$$S$$$, until the first undo. What we do now, is reversing $$$S$$$: we undo all the updates and redo them in reverse order. We use commutativity here which ensures this reversal is fine. Now, to undo the first operation, we just call the undo operation on the original data structure: it pops the topmost element of $$$S$$$ which is the first update. In an ideal world, we'd hope that we only receive undo's until the stack is empty, but it is probably the case that we'll get both new updates and undoing: We must push new updates into $$$S$$$, but also must be able to pop old updates, so we have to somehow interleave new updates and old updates. We'll call this "The Stack Problem", which is the main roadblock.

## The Stack Problem

We'll consider the stack problem in the following way: we have a stack $$$S$$$ of $$$n$$$ letters $$$A$$$, which represent old updates we need to pop (even while denoted the same, these $$$A$$$'s are different updates... but we don't care about their difference, we just need to remove them in order from topmost to bottommost). On $$$S$$$ we can get 2 types of operations: either pop an $$$A$$$, or add a $$$B$$$, which represents a new update. We will assume there are a total of $$$2n$$$ operations; $$$n$$$ which pop $$$A$$$ and $$$n$$$ which add $$$B$$$, and these arrive online. What we really care about is the number of $$$A$$$'s and the number of $$$B$$$'s in $$$S$$$: when we're asked to pop an $$$A$$$, we modify $$$S$$$ so that the number of $$$A$$$'s decreases by 1, and symmetrically for adding a $$$B$$$. At the end we should have a stack of only $$$B$$$'s.

You might have noticed we assume there will be exactly $$$n$$$ new updates before we pop exactly $$$n$$$ old updates. This is just to ease on solving the stack problem, and we'll later see this assumption makes no asymptotic difference.

We assume that an operation of push or pop on $$$S$$$ take $$$\mathcal{O}(1)$$$, since we care for now about minimizing stack operations. If we solve this problem in time complexity $$$\mathcal{O}(n \cdot f(n))$$$, consider this as duplicating every update and undo, $$$\mathcal{O}(f(n))$$$ times. Following is an algorithm in $$$\mathcal{O}(n \log n)$$$ time.

## Solving The Stack Problem

This will be our algorithm:

- When provided a $$$B$$$ update (add B), we just push it to the top of $$$S$$$.
- When provided an $$$A$$$ update: If $$$S$$$ already had $$$A$$$ on top, we just pop. Otherwise, we begin the following process (which I'll call "fixing"): pop from $$$S$$$ and save all the elements we popped, until we popped an equal amount of $$$A$$$'s and $$$B$$$'s, or until no $$$A$$$'s remain in the stack; empty stack or only $$$B$$$'s remain (we can keep an index of this position in the stack, which will only increase). Then, push back all the elements we popped, where we first push all $$$B$$$'s, then all $$$A$$$'s (we use commutativity here). Since the top of $$$S$$$ had a $$$B$$$, and we were asked to pop an existing $$$A$$$, after fixing, the topmost element will be an $$$A$$$ which we pop.

Let's show an upperbound of $$$\mathcal{O}(n \log n)$$$ time on this algorithm. This actually might be interesting for those who want to try, before opening the spoiler:

**Upperbound**

But, can this algorithm be better than $$$\mathcal{O}(n \log n)$$$ in worstcase? Well... no. In the case where the operations are $$$BABA...BA$$$ (alternating between adding $$$B$$$ and popping $$$A$$$, we'll return to this case later), this algorithm takes $$$\Theta(n \log n)$$$ time (the sequence of positive fixing costs is 1, 2, 1, 4, 1, 2, 1, 8, ..., basically cost $$$i$$$ equals to the least significant bit of $$$i$$$).

## The Idea, Cont.

As mentioned, we begin by pushing updates into $$$S$$$ until the first undo operation. Suppose at this time, the number of updates in $$$S$$$ was $$$s_1$$$. So we did $$$s_1$$$ updates, then we do $$$s_1$$$ undo's (on the original structure), and insert these back in reverse order, which is another $$$s_1$$$ updates (asymptotically the same). Now we execute our algorithm for the stack problem, as long as the last of the first $$$s_1$$$ updates still exists. Suppose once we undo the last of those, $$$S$$$ contains $$$s_2$$$ new updates.

Even though $$$s_1, s_2$$$ might not be equal, the proposed algorithm is still well defined. As for its complexity, we can give a simple upperbound that is tight on the worstcase: if we denote $$$n = s_1 + s_2$$$, any sequence of operations that begins with $$$s_1$$$ $$$A$$$'s and ends with $$$s_2$$$ $$$B$$$'s, is a prefix of some sequence of operations where we start with $$$n$$$ $$$A$$$'s and end with $$$n$$$ $$$B$$$'s, which takes $$$\mathcal{O}(n \log n)$$$, so we can bound by $$$\mathcal{O}((s_1 + s_2) \log (s_1 + s_2))$$$, tight when $$$s_1 = s_2$$$.

Anyway, once $$$S$$$ contains these $$$s_2$$$ new updates (which we can also assume are ordered by the order we received them), we can again reverse $$$S$$$, and begin our algorithm again until we get $$$s_3$$$ new updates, and so on.

If we had a total of $$$U$$$ updates (and upto $$$U$$$ undo's between them) asked of us, and we had $$$k$$$ "phases" of the algorithm ($$$s_1$$$ through $$$s_k$$$), then $$$U = \sum_{i=1}^{k}s_i$$$, and our time complexity in terms of push and pop into $$$S$$$:

- Across all reversals of $$$S$$$ we take $$$\mathcal{O}(\sum_{i=1}^{k}s_i) = \mathcal{O}(U)$$$.
- Across all executions of the algorithm we take $$$\mathcal{O}(\sum_{i=1}^{k-1}(s_i + s_{i+1}) \log (s_i + s_{i+1}))$$$, bounded by $$$\mathcal{O}(U \log U)$$$.

In total we take $$$\mathcal{O}(U \log U)$$$ operations on $$$S$$$: in other words, if our original data structure operated in $$$\mathcal{O}(f(U))$$$ for any sequence of $$$U$$$ operations (including undo's), then we support queue-undo's for any sequence of $$$U$$$ operations in time $$$\mathcal{O}(f(U \log U))$$$, which in most cases would imply we add a logarithmic factor to our total complexity.

## Can We Do Better?

A logarithmic factor is small, but not constant. I suppose usually our original structure would have its updates run in $$$\mathcal{O}(\log n)$$$, which now becomes $$$\mathcal{O}(\log n \cdot \log U)$$$, and that's not too small...

So, can we do better than an $$$\mathcal{O}(\log U)$$$ factor? Well, for a general commutative DS (without any more assumptions), the answer is a sad no.

If we only assume a general commutative DS, we can't do any funny tricks; after each update or undo, the stack of updates of the original DS must hold exactly all the active updates. The following attempts to prove that we cannot achieve all these states of the stack in $$$\mathcal{o}(U \log U)$$$ stack operations. **Some parts of this might be informal, and if it fails there I would happily hear someone point this out... I want a better factor.**

First, let's define a similar problem, which we'll conveniently name $$$P$$$: Given a positive integer $$$n$$$ and an empty stack $$$S$$$, all operations we're able to do are push and pop $$$A$$$'s and $$$B$$$'s into $$$S$$$. Define the state of the stack as a pair (number of A's, number of B's). Then our goal is to pass through states $$$(n, 0), (n-1, 1), (n-2, 2), \dots, (0, n)$$$, in this order, while minimizing the number of operations. (note that for problem $$$P$$$ of order $$$n$$$ we have $$$n+1$$$ states... don't get confused with off by 1's).

We recursively define the optimal algorithm for problem $$$P$$$: Let $$$T(n)$$$ be the minimum number of operations required for problem $$$P$$$ of order $$$n$$$, with $$$T(0) = 0$$$.

A few observations on the optimal algorithm:

- It must begin by pushing $$$A$$$, since the first state we need to visit is $$$(n, 0)$$$, so if we were to push a $$$B$$$ at the start, we would end up popping it without reaching any state.
- At some moment, this $$$A$$$ at the bottom of $$$S$$$ must be replaced with $$$B$$$ since we must visit $$$(0, n)$$$ at the end. Informally I claim, that once we put the $$$B$$$ at the bottom, we never pop it again. A formal proof can probably use some exchange argument; we visited some states after we put the first $$$B$$$ at the bottom, then some more states after the second $$$A$$$ at the bottom and at some point we put a $$$B$$$ at the bottom again. So all the states visited by the first $$$B$$$ could be visited by the first $$$A$$$ with less operations, as we merge all states visited by the first $$$A$$$, first $$$B$$$ and second $$$A$$$.

So the optimal algorithm is split to two phases; an $$$A$$$ at the bottom and a $$$B$$$ at the bottom. This implies there exists some $$$1 \leq k \leq n$$$ such that we visit the first $$$k$$$ states in the first phase, and the rest $$$n+1-k$$$ states in the second phase. We can analyze how the algorithm works for any $$$k$$$ and choose the best $$$k$$$:

Now it's informal, but can probably be proven with an ugly exchange argument; for a fixed $$$k$$$, what the algorithm does is:

- At the beginning, push $$$n-k+1$$$ $$$A$$$'s into $$$S$$$.
- From this point, recursively solve a subproblem of order $$$k-1$$$, which takes $$$T(k-1)$$$ operations.
- Pop the entire stack, which is $$$n$$$ operations.
- Push $$$k$$$ $$$B$$$'s into $$$S$$$.
- From this point, recursively solve a subproblem of order $$$n-k$$$, which takes $$$T(n-k)$$$ operations.

So this running time is $$$2n+1 + T(k-1) + T(n-k)$$$. minimizing across all $$$k$$$, we get the equation:

This achieves minimum when $$$k = \frac{n}{2}$$$, resulting in $$$T(n) = \Theta(n \log n)$$$, so the optimal running time for $$$P$$$ is $$$\Theta(n \log n)$$$.

Back to our original problem; Let $$$A$$$ be an algorithm that can handle $$$n$$$ updates and $$$n$$$ undo's in running time $$$\mathcal{O}(f(n))$$$. Specifically, the following scenario is also solved in $$$\mathcal{O}(f(n))$$$: push $$$\frac{n}{2}$$$ updates, then alternate between pushing an update and undoing an update.

(Informally?) If no further assumptions are made on the updates, we must be able to operate on the stack of updates $$$S$$$, so that after each update or undo, exactly those updates that are active, should be in $$$S$$$. Now one can see that, if we name the first $$$\frac{n}{2}$$$ updates as $$$A$$$'s and the new ones as $$$B$$$'s, we pass through states $$$(\frac{n}{2}, 0), \dots, (0, \frac{n}{2})$$$. So algorithm $$$A$$$ can generate a solution to problem $$$P$$$ of order $$$\frac{n}{2}$$$ in running time $$$\mathcal{O}(n + f(n)) = \mathcal{O}(f(n))$$$ (we add $$$n$$$ since in problem $$$P$$$ we begin with an empty stack).

This implies we can solve $$$P$$$ of order $$$\frac{n}{2}$$$ in $$$\mathcal{O}(f(n))$$$ operations, but by the optimality we've shown on $$$P$$$, we get $$$f(n) = \Omega(n \log n)$$$.

## Open Questions

Although we showed some kind of optimality in a general case, there are still open questions for improvement:

- If the data structure supports different updates in different complexities, can we prioritize some updates over others? Our algorithm does overall $$$\mathcal{O}(U \log U)$$$ stack operations, but it could be that some updates get called many times and some get called only a few times. Perhaps a weight function on each update in the stack, and a variant of the fixing process?
- Can a similar algorithm be applied with less assumptions (for instance, no commutativity)?
- Can a similar algorithm be applied with more assumptions, and better running time? Perhaps the same algorithm can be proven to have better running time on specific scenarios?

I've yet to try answering any of these.

## Examples

My motivation problem is supporting the queue of updates on DSU (LCT probably defeats this purpose with even better complexity, so I was looking for something more general and simple). We can use it to solve this problem online in running time $$$\mathcal{O}(n \log n \log q)$$$. There also exist an offline D&C algorithm where we have monotonicity (corresponds to queue undoing), and a cool offline algorithm here where we compute for each update its "living time", split these intervals across a segment tree and perform a DFS on it (this is offline, but it doesn't even require the undoing to be in any form of stack or queue).

If I had to come up with another problem... Given an array of length $$$n$$$ of integers, an update provides $$$(l, r, x)$$$ to set all $$$A_i = \min(A_i, x)$$$ for $$$l \leq i \leq r$$$, and a query could ask for maximum in range. These updates are commutative so we could apply the idea on a lazy segment tree.

Another example could be 1386C, from BOI. Here it is interesting, because we can use the idea to solve the problem, even though we don't explicitly take a DS and modify it; we can add all edges from $$$i = 1$$$ to $$$i = n$$$, then proceed by either undoing an update from the suffix, or doing an update from the prefix.

I haven't seen many instances where this can be applied, but it's probably not difficult to come up with such. If you remember problems that can utilize this, link them in the comments.

## Implementation

I was wondering if we can have a C++ generic implementation, probably using inheritance, but I don't know enough of it to come up with such. Can you find any? Maybe in another language?

Anyway, here is an implementation (of DSU-queue), solving 1386C in time $$$\mathcal{O}(m \log n \log m)$$$. Hovering over all tests, the worstcase running time was 358ms, which is actually faster than I expected. Either the constant is small or there happens to be no testcase that hits us hard (like some alternating case).

Edit: bicsi has another implementation here, using a few clean tricks to cut down implementation size. This implementation also removes some amortization, explained in his comment here.

Thanks for reading :)

Very amazing blog!

Thank you!

It's been pointed out that I should give this trick a name. Does anybody have a name better than "Queue Undo Trick"?

You could just name it after an anime you like as segtree beats.

To maximize absurdity and stupidity of this idea, let's call it Boku no Pico trick.

How about "euphoria" trick?I mean it is absurd , stupid and ..useful? That's it,I need to wash my eyes with soap water now

Really nice blog! I think it's common to use two stacks as a queue but I have never seen anyone use just one!

However

I don't understand this as one element could be pushed and poped many times.

Suppose we have $$$n-1$$$ 'light' updates, maybe $$$O(1)$$$ to update and undo, one pop operation so everything becomes $$$A$$$. Then one 'heavy' update, maybe $$$O(n)$$$ to update and undo, and $$$n-1$$$ pop's. The 'heavy' update/undo will be called $$$O(n)$$$ times.

What I was trying to express is that, regardless of whether the original data structure worked in amortized time or not, we still apply to it a sequence of $$$\mathcal{O}(U \log U)$$$ operations.

Basically, we're an algorithm outside the data structure that just calls a bit more updates than what we were given. So for example, if the original data structure worked in amortized time of $$$\mathcal{O}(q \log q)$$$ for any sequence of $$$q$$$ operations, then we would make it work in $$$\mathcal{O}(q \log^2 q)$$$ for any sequence of $$$q$$$ operations.

As for your example, if there exists a scenario with a heavy update that takes $$$\mathcal{O}(n)$$$ each time it is called, and our algorithm just so happens to call it $$$\mathcal{O}(n)$$$ time, then even without this algorithm, the original data structure is not very efficient; any user can call that heavy update/undo $$$\mathcal{O}(n)$$$ times.

If you're wondering about what we should do in case some updates consume more time than others, I put it at the "Open Questions" sections, as I still haven't thought about it.

Hope this clarifies.

One example I can think of is DSU with just path compression. It depends a bit on the implementation(p[a] = b or p[b] = a) but we can create merge's so that, after reversing the first $$$n-1$$$ of the queue, we have a bamboo and an isolated node. The merge's going up the stack(after reversing) will always be merge(root of bamboo, node).

The last update would be merge(leaf of the bamboo, isolated node) that takes $$$O(n)$$$ to update/undo. It will be $$$O(n)$$$ for every call only because of the undo operation.

If I understand correctly your claim of "$$$\mathcal{O}(q \log^2 q)$$$ for any sequence of $$$q$$$ operations" is false as we can repeat the amortized update many times(so it doesn't amortize) because of the undo.

In the case you've provided, regardless of my blogpost, DSU with just path compression has a bad time complexity; any user of your data structure can first merge $$$n-1$$$ times, undo all of them and execute them in reverse order. Then for $$$n$$$ more times, the user can merge and undo, thus achieving $$$\mathcal{O}(n^2)$$$ time.

So for your proposed DS, it takes upto $$$\mathcal{O}(n^2)$$$ time for any sequence of $$$n$$$ operations, thus the new DS takes upto $$$\mathcal{O}(n^2 \log n)$$$ time for any sequence of $$$n$$$ operations.

Oh! So you required good complexity for updates

and undo's. I was thinking of just good complexity for updates. It makes sense now.Indeed, because the proposed algorithm is already provided a data structure that can support normal undo's, so its complexity is already taken into account.

Very nice trick! Powerful and simple to understand.

Very nice blog!

One tiny observation: you don't actually need to keep the bottom pointer. Not keeping count of it does not actually make the asymptotical complexity any worse (it might make the solution a bit less efficient though).

Alternative implementation to 1386C (might be a bit cleaner): https://codeforces.com/contest/1386/submission/95927892

Thank you! And your implementation really is nicer.

I think I have a solution that only does $$$O(log(n))$$$ push/pop operations on each element, essentially answering your first open question. The solution is as follows:

Let's consider blocks of contiguous identical updates (we'll call them type-A blocks and type-B blocks).

For type A operations (red boxes), we'll restrict ourselves to blocks that are power of two and decreasing from left to right.Initially, it would look something like Image 1. We will keep this invariant all throughout the algorithm. For type B operations (blue boxes), we don't impose such restriction.For a push operation, we do nothing special. For a pop operation, we will pop the rightmost element if it is red. If it is blue, we will effectively swap the blue block with the red block that is to its left. See Image 2, freeing up some red items for popping.

Notice that if the red block that comes now to the right has length bigger than 1, it gets split into smaller blocks. For example, after some other pop operation, we will have something like Image 3. Notice that the split is just conceptual, but it affects the strategy in the future, and is essential to the proof.

Now, let's prove that this strategy does $$$O(log(N))$$$ undos / updates on each element. For a red element, whenever it gets undo'ed and redo'ed, it will be for a pop operation in its red box, therefore the box it would be in after the pop operation will have a size that is at most half the original size. For a blue element, notice the blue arrow in the images pointing to the box immediately to its left. Whenever it gets undo'ed and redo'ed, the blue arrow of the block will point to a red block that has at least twice the size of the old red block. We therefore conclude that each blue element and each red element can only be undo'ed and redo'ed a maximum of $$$O(log(n))$$$ times before the queue gets cleared.

EDIT: Check my implementation (in particular, check the

`push`

and`pop`

procedures)This is a really nice way to look at it and makes the complexity analysis much more intuitive IMO.

(Sorry for necroposting)

I don't believe your complexity analysis. Specifically, your claim that each blue element is pushed/popped at most $$$O(\log n)$$$ times. For example if the stack looks like $$$BABABA\ldots BA$$$ and you repeatedly do pop operations with your strategy, doesn't it do $$$O(n^2)$$$ stack operations? The last block of $$$B$$$'s grows after each pop, and each time you only move a single $$$A$$$ to the top of the stack.

Update: I was wrong, it's not possible for the stack to look like my example. I didn't realize all of the red blocks on the stack had decreasing sizes. I thought it was only claimed that the blocks in a contiguous section of A's have decreasing sizes.

This trick can be used to solve 1442D - Sum. The data structure is a knapsack that can add an element and undo in $$$O(k)$$$ time. We can obtain the knapsack of all but one element for each element using the queue undo trick.

It's worth mentioning that, because all "updates" are offline, and we know from the start exactly how the updates and undo's are ordered, then there's also a D&C solution with equal complexity (but probably shorter code).

Still, I'm happy to hear that.

really amazing approach！