Good morning!
In this blog, I will present an online algorithm that can perform priority-queue-like undoing on a data structure, given a few assumptions. I will also present and solve a generalization of that problem.
For context, I highly recommend reading this blog that shows how to solve the easier problem of queue-like undoing. That blog gave me the inspiration for the trick I will describe here, but I will still try to write this blog so that it can be understood without it. Thank you Noam527 for introducing such a great trick to the competitive programming community.
[Tutorial] Supporting Queue-like Undoing on DS
Acknowledgment
Huge thanks go to peltorator for hosting a blog contest. His generous prize of $300 got many people to write blogs on new, interesting ideas, and it was one of the main reasons I wrote this blog. I'm honored to win the prize, so thanks again to peltorator!
Problem Statement
Let $$$D$$$ be a data structure that supports updates and queries. We require $$$D$$$ to be commutative, meaning that if we apply a set of updates and then a query, the answer to the query will be the same regardless of the order that we applied the updates. Let's assume that if we apply a sequence of $$$n$$$ updates to $$$D$$$, then each update will take $$$O(T(n))$$$ non-amortized time. Let's also assume that we have a way to undo the most recent update in $$$O(T(n))$$$ time (this can usually be supported by maintaining a stack of changes we've made to the internal state of $$$D$$$, and undoing the changes on the top of the stack).
Now, we would like to the solve the problem of supporting the following types of operations online:
- Apply an update to $$$D$$$ with an associated priority $$$p$$$. (Assume all priorities are unique).
- Undo the update in $$$D$$$ that has the highest priority. (Recall that the updates are commutative).
- Answer a query about $$$D$$$.
Given a sequence of $$$n$$$ updates and queries, we will solve this problem in $$$O(n T(n) \log n)$$$ time. In other words, each operation takes $$$O(T(n)\log n)$$$ amortized time.
Subproblems
There are a few special cases of the above problem that we get for free.
First, if the priorities are inserted in decreasing order, this is the same as the queue-like undoing problem described in Noam's blog.
Second, if we remove the online requirement, then we can support arbitrary deletions in the same time complexity. This is because we can pre-compute priorities that reduces the problem to the one described above. We simply say that the priority of an update is inversely correlated to its deletion time. This subproblem of offline deletion can also be solved using a different well-known trick.
Idea
The idea is that we have a stack $$$S$$$ of updates in the order that we applied them to $$$D$$$. If the highest priority update coincides with the top of the stack, that's great, we can just pop it and undo the most recent update to $$$D$$$. However if we need to undo an update that's buried deep within the stack, we need to spend a lot of work undoing all of the updates that come after it, and then pushing them back onto the stack.
Fortunately, we have some flexibility that can help us save time. Firstly, we can decide to push the updates back onto the stack in a different order than they were before. This can help us because we can choose to put higher priority updates closer to the top of the stack, which saves us time later on. Secondly, we can choose to pop even more elements than required for the current undo operation. It can sometimes help to dig deeper into the stack because it can make up for the work we've already done. For example, suppose we've already popped a lot of elements to get the highest priority update. If the second highest priority update is close behind it, it might be worth popping it out as well so that it will be more accessible later.
Solution
In this solution, the operations of applying new updates and answering queries will be handled in a straightforward manner. When we apply an update we push it directly to the stack and apply it to $$$D$$$. When we answer a query, we don't touch the stack at all.
Then when we need to undo the highest priority update, we need to decide the number of updates to pop from the stack. Let's choose the smallest number $$$k\ge 1$$$ such that the following condition holds: The $$$\lceil\frac{k}{2}\rceil$$$ highest priority updates in the stack are among the top $$$k$$$ positions of the stack.
After we pop these $$$k$$$ updates, we push them back onto the stack in such a way that the $$$\lceil\frac{k}{2}\rceil$$$ highest priority updates are all on top of the stack, sorted in increasing order.
Details
Now, it's a bit of a non-trivial detail to efficiently check if the desired condition has been met. The proof of complexity will rely on the fact that processing $$$k$$$ updates on the stack takes $$$O(\log n + k T(n))$$$ time. To make the result as general as possible, we want to support the case where $$$T(n)=1$$$, so our overhead costs should be $$$O(\log n + k)$$$. This means we can't afford to do something such as sorting the top $$$k$$$ elements by priority.
In order to satisfy this requirement, let's say that in addition to the stack of updates we also maintain a set data structure of the updates, ordered by priority. (Such as std::set
in C++). When we add an update, it takes $$$O(\log n)$$$ time to insert into the set, and when we pop an update, it takes $$$O(\log n)$$$ time to delete it from the set. In the undo operation, as we iterate $$$k$$$ in increasing order, we simultaneously iterate through the $$$\lceil\frac{k}{2}\rceil$$$ elements of the set with the highest priority. Let's say that the stack-depth of an update is the number of updates currently above it on the stack. As we iterate through the set, we keep track of the largest stack-depth of those elements, and compare with $$$k$$$. If $$$k$$$ is larger than the largest stack-depth, the condition is met. Otherwise, we need to continue iterating $$$k$$$.
After we decide to stop iterating, we can efficiently push all the elements back onto the stack. First we can push all the elements that are not among the highest $$$\lceil \frac{k}{2}\rceil$$$ priorities. We can push them in the same order that they originally appeared on the stack, for example. (Remember we can't afford to sort them.) Then, we push the remaining elements in increasing order of priority. That can be done efficiently by iterating through the top elements of the set, and doesn't require us to sort from scratch.
And so, we can see that processing $$$k$$$ elements of the stack can be done in $$$O(\log n + k T(n))$$$ time, which is good enough for the proof of time complexity of the entire algorithm.
Complexity Analysis
Let's say that the cost of an operation is the number of stack elements processed. So, a push operation has a cost of $$$1$$$ and an undo operation has a cost of $$$k$$$, where $$$k$$$ is the number of elements of the stack popped in that operation. We will ignore query operations. If $$$C_i$$$ is the cost of the $$$i$$$-th operation, then we will show that $$$\sum_{i=1}^n C_i = O(n\log n)$$$. Note that this is sufficient to show that the overall time complexity is
$$$\sum_{i=1}^n (\log n + C_i T(n)) = n\log n + T(n) \sum_{i=1}^n C_i = O(n T(n)\log n).$$$Now we will show that for any two indices $$$i<j$$$, we have $$$2(j-i)\ge \min(C_i, C_j).$$$ This is clearly true if either operation $$$i$$$ or $$$j$$$ is a push operation, so let's assume that they are both undo operations. Assume for contradiction that the inequality does not hold, and let $$$i$$$, $$$j$$$ be a counterexample where $$$j-i$$$ is the smallest possible.
Let $$$k=\min(C_i,C_j)$$$. Since $$$k\le C_i$$$, we know that after operation $$$i$$$, the $$$\lceil \frac{k}{2}\rceil$$$ highest priority updates are on top of the stack in increasing order. Let $$$U$$$ be the set of those updates. Suppose that between operations $$$i$$$ and $$$j$$$ there are $$$m_1$$$ push operations and $$$m_2$$$ pop operations. So, there are at most $$$m_1$$$ new elements and at least $$$\lceil\frac{k}{2}\rceil-m_2$$$ elements remaining from $$$U$$$ at the time before operation $$$j$$$.
Since $$$C_j\ge k$$$, we know that either it stopped before it passed over all the remaining elements of $$$U$$$ when processing the stack, or the condition was not satisfied the first time it encountered all the remaining elements of $$$U$$$. In the first case, there needed to be at least $$$\lceil\frac{k}{2}\rceil$$$ additions between $$$i$$$ and $$$j$$$, so $$$j-i\ge \lceil\frac{k}{2}\rceil$$$ which contradicts our assumption. In the second case, this means that $$$\lceil\frac{k}{2}\rceil-m_2<m_1$$$, otherwise the condition would have been satisfied. But then we have that
$$$j-i>m_1+m_2>\left\lceil\frac{k}{2}\right\rceil \ge \frac12 \min(C_i, C_j)$$$contradicting our assumption. Therefore, the inequality always holds.
Let $$$y_c$$$ be the number of operations $$$i$$$ such that $$$C_i\ge c$$$. The above fact shows that more expensive operations are farther apart, so we know that $$$y_c\le 1+\frac{2n}{c}$$$. Finally, we have that
$$$\sum_{i=1}^n C_i = \sum_{c=1}^n y_c\le \sum_{c=1}^n (1+\frac{2n}{c}) = n + 2n \sum_{c=1}^n \frac1c = O(n\log n).$$$Generalization to Multiple Priorities
Although priority-queue-like undoing is pretty cool, I realized it's still not general enough to support deque-like undoing, where we can undo either the oldest update or the most recent update. More generally, what if priorities are $$$d$$$-dimensional vectors, and we want to undo the update that has the largest priority in a specified dimension? That is, we would like to support the following operations:
- Apply an update with associated priority vector $$$(p_1,\ldots, p_d)$$$.
- Undo the update whose priority vector has the largest value in dimension $$$i$$$.
Note that this can simulate deque-like undoing because we can use priorities $$$(p, -p)$$$.
Solution
We can support this generalization in the following way. At the end of an undo operation, push elements to the stack in a cyclic manner. For example if there are three dimensions $$$x$$$, $$$y$$$, $$$z$$$, then we make sure the element with the largest $$$x$$$ value is on top, then of the remaining elements make sure the element with the largest $$$y$$$ value is on top, then $$$z$$$, then $$$x$$$, and so on.
When we pop elements, we stop when the following condition is met: For each dimension $$$i=1\ldots d$$$, the $$$\lceil \alpha k\rceil$$$ elements with the highest priorities in dimension $$$i$$$ are among the top $$$k$$$ positions of the stack. Here, $$$\alpha$$$ is some constant between $$$0$$$ and $$$\frac1d$$$ will be decided later. For the implementation detail, we can maintain $$$d$$$ different set data structures, with the $$$i$$$-th one sorting the updates according to the $$$i$$$-th dimension.
Complexity Analysis
The complexity analysis is very similar; all we have to show is that there is some value $$$\beta$$$ such that for any two operations $$$i<j$$$, we have $$$\beta(j-i)\ge \min(C_i, C_j)$$$. Then we can guarantee that the complexity is $$$O(\beta n T(n) \log n)$$$.
Suppose that $$$C_i\ge k$$$. Then for each dimension, the largest $$$\alpha k$$$ elements of that dimension are present among the top $$$d\alpha k$$$ positions of the stack after operation $$$i$$$. Let $$$U$$$ be the set of those $$$d\alpha k$$$ elements. Let's say we apply $$$m_1$$$ additions and $$$m_2$$$ deletions between $$$i$$$ and $$$j$$$.
Since $$$C_j\ge k$$$, we know that either it stopped before it passed over all the remaining elements of $$$U$$$ when processing the stack, or the condition was not satisfied the first time it encountered all the remaining elements of $$$U$$$. In the first case, there needed to be at least $$$(1-d\alpha)k$$$ additions between $$$i$$$ and $$$j$$$, so $$$(j-i)\ge (1-d\alpha)k$$$.
In the second case, assume without loss of generality that the condition failed on the first dimension when it first encountered the remaining elements of $$$U$$$. There are at least $$$\alpha k-m_2$$$ remaining elements from $$$U$$$ with the highest priorities in dimension $$$1$$$ and at most $$$d\alpha k + m_1-m_2$$$ overall elements at the time the condition fails. Therefore,
$$$\alpha k - m_2<\alpha(d\alpha k + m_1-m_2).$$$Let $$$m=m_1+m_2$$$ and $$$\lambda=\frac{m_1}{m}$$$ be the proportion of additions out of the operations. If we replace $$$m_1=\lambda m$$$ and $$$m_2=(1-\lambda)m$$$ and rearrange, we get
$$$\frac{m}{k} > \frac{\alpha - d\alpha^2}{\alpha \lambda + (1-\alpha) (1-\lambda)}.$$$Let's define a function $$$f(\alpha, \lambda)$$$ to be the right-hand side of that inequality. The worst case is when $$$f(\alpha, \lambda)$$$ is small because it gives us the weakest lower bound for $$$m$$$. Note that the denominator is a convex combination of $$$\alpha$$$ and $$$(1-\alpha)$$$. So depending on which is larger, the worst case is either $$$\lambda=0$$$ or $$$\lambda=1$$$. Therefore, if we set
$$$\beta=\frac{1}{\min(1-d\alpha, f(\alpha, 0), f(\alpha, 1))},$$$then the above claim holds.
This means that if we treat $$$d$$$ as a constant, the complexity is $$$O(nT(n)\log n)$$$.
Optimizing $$$\alpha$$$
The complexity analysis above is somewhat incomplete, because we haven't seen how to choose an optimal $$$\alpha$$$ value or how the complexity scales up with $$$d$$$. To get the best guarantee of complexity, we want $$$\beta$$$ to be as small as possible, or $$$\frac{1}{\beta}=\min(1-d\alpha, f(\alpha, 0), f(\alpha, 1))$$$ to be as large as possible.
First, consider the case where $$$d=1$$$. In this case, $$$f(\alpha, 0)=\alpha$$$ and $$$f(\alpha, 1)=1-\alpha$$$. We can easily see that the optimal value of $$$\min(\alpha, 1-\alpha)$$$ happens at $$$\alpha_{opt}=\frac12$$$.
Now let's consider $$$d\ge 2$$$. We have that $$$\alpha<\frac1d\le\frac12\le 1-\frac1d<1-\alpha$$$ which means that $$$f(\alpha, 0)\le f(\alpha, 1)$$$. It's also always true that $$$f(\alpha, 0)\le 1-d\alpha$$$. So, we really just need to maximize $$$f(\alpha, 0)$$$ over $$$0<\alpha<\frac1d$$$.
With some calculus, we find that the optimal value is
$$$\alpha_{opt}=1-\sqrt{1-\frac1d}.$$$Also with the help of my friend Mr. Wolfram Alpha, we find that
$$$\lim_{d\to\infty}\frac{1}{d\ f(\alpha_{opt}, 0)}=4.$$$This means that $$$\beta=O(d)$$$ and so our final complexity becomes $$$O(dnT(n)\log n)$$$.
As a mini-disclaimer, I'll note that the optimized $$$\alpha$$$ values are only theoretical and a different value might be better in a practical implementation. With that out of the way, here is a reference table of approximate values for a few dimensions. It would be interesting to see how they compare on a benchmark implementation.
$$$d$$$ | $$$\alpha_{opt}$$$ | $$$\beta$$$ |
$$$1$$$ | $$$0.5$$$ | $$$2$$$ |
$$$2$$$ | $$$0.292893$$$ | $$$5.828427$$$ |
$$$3$$$ | $$$0.183503$$$ | $$$9.898979$$$ |
$$$4$$$ | $$$0.133975$$$ | $$$13.928203$$$ |
$$$5$$$ | $$$0.105573$$$ | $$$17.944272$$$ |
Thanks for coming to my TED talk.