NovusStellachan's blog

By NovusStellachan, history, 2 weeks ago, In English

Ok, here's the formal statement:

We have some array of integers $$$a_1, a_2, \dots, a_n$$$ with $$$n$$$ large (around $$$10^5$$$). We have to support the following types of operations:

  • Update: Select some subarray $$$[l, r]$$$ of $$$a$$$ and an arbitrary integer $$$x$$$, and add $$$x$$$ to each of $$$a_i$$$ for $$$l \leq i \leq r$$$. That is, we must be able to perform range addition. Note that $$$x$$$ can be both positive or negative.

  • Query: Count the number of integers at most $$$x$$$ in some subarray $$$[l, r]$$$ of $$$a$$$. Note that $$$x$$$ is allowed to change across queries.

Obviously, the interest is in subquadratic solutions. Fast online (or offline) solutions are interesting to me, and I'd be happy to learn about them.

Thank you, NovusStellachan


2 weeks ago, # |

I think you can solve it using a segment tree with lazy propagation; this type of problem came up when we did selection for the olympiad.

2 weeks ago, # |

Sadly, even if we restrict the query to the subarray $$$[1, n]$$$ and $$$x$$$ to a particular value, say $$$x=0$$$, we believe there is no faster online solution than $$$\tilde{O}(\sqrt{n})$$$ per operation (even amortized). A reduction from the Online Matrix-Vector Multiplication (OMv) hypothesis can be shown.

Of course, you can easily achieve something like $$$\tilde{O}(\sqrt{n\log{n}})$$$ using sqrt decomposition.

I'm not sure about this at the moment, but probably the same reduction can be used to show that if you can solve this problem (even offline) in polylogarithmic time per operation, then you can multiply matrices in $$$\tilde{O}(n^2)$$$.

2 weeks ago, # |

I'm only aware of an $$$O(n^{1.5}\sqrt{\log{n}})$$$ solution using sqrt decomposition. Split the array into $$$k$$$ blocks of size $$$\frac{n}{k}$$$, and build over each one something like an explicit treap (i.e., it will store the block in sorted order).

To perform a range addition $$$(l, r, x)$$$, we can rebuild the treaps of the first and last blocks in $$$O(\frac{n}{k})$$$, and for the treaps of the middle blocks we just remember that all values in them are increased by $$$x$$$. Total complexity $$$O(k + \frac{n}{k})$$$.

For the second query we perform lower_bound queries on the treaps in the middle in $$$O(k\log{\frac{n}{k}})$$$, and do straightforward checking of the numbers in the endpoint blocks in $$$O(\frac{n}{k})$$$.

By choosing $$$k = \sqrt{\frac{n}{\log n}}$$$ (so the block size is $$$\sqrt{n \log n}$$$) we get the aforementioned complexity of $$$O(\sqrt{n \log n})$$$ per operation.
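As a concrete illustration, here is a minimal sketch of this block decomposition (the name `BlockArray` and the interface are my own; for brevity it re-sorts a boundary block on rebuild, which adds a log factor there, whereas a linear merge of the two sorted runs would avoid it):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>
using namespace std;

// Blocks of size B; each block keeps a lazy additive tag and a sorted copy.
struct BlockArray {
    int n, B;
    vector<long long> a;               // raw values (tag not applied)
    vector<long long> lazy;            // pending addition per block
    vector<vector<long long>> sorted;  // sorted copy of each block

    BlockArray(const vector<long long>& v, int blockSize)
        : n((int)v.size()), B(blockSize), a(v) {
        int nb = (n + B - 1) / B;
        lazy.assign(nb, 0);
        sorted.resize(nb);
        for (int b = 0; b < nb; b++) rebuild(b);
    }
    void rebuild(int b) {
        int lo = b * B, hi = min(n, lo + B);
        sorted[b].assign(a.begin() + lo, a.begin() + hi);
        sort(sorted[b].begin(), sorted[b].end());
    }
    // Add x to a[l..r] (0-based, inclusive).
    void update(int l, int r, long long x) {
        for (int b = l / B; b <= r / B; b++) {
            int lo = b * B, hi = min(n, lo + B) - 1;
            if (l <= lo && hi <= r) { lazy[b] += x; continue; }  // middle block
            for (int i = max(l, lo); i <= min(r, hi); i++) a[i] += x;
            rebuild(b);  // boundary block
        }
    }
    // Count elements <= x in a[l..r].
    long long query(int l, int r, long long x) {
        long long cnt = 0;
        for (int b = l / B; b <= r / B; b++) {
            int lo = b * B, hi = min(n, lo + B) - 1;
            if (l <= lo && hi <= r) {
                // whole block: binary search for x, shifted by the lazy tag
                cnt += upper_bound(sorted[b].begin(), sorted[b].end(),
                                   x - lazy[b]) - sorted[b].begin();
            } else {
                for (int i = max(l, lo); i <= min(r, hi); i++)
                    if (a[i] + lazy[b] <= x) cnt++;
            }
        }
        return cnt;
    }
};
```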

  • »
    2 weeks ago, # ^ |

It can be done even more simply: instead of treaps, just store a sorted list of values. You can rebuild the first and last blocks by merging two sorted lists, one with the changed values and the other with the unchanged ones.

  • »
    2 weeks ago, # ^ |

    You do not need a treap, you can simply use ordered sets. For every block $$$B$$$, maintain an ordered set and also an integer variable $$$z$$$ (initially set to $$$0$$$).

    For update $$$[l, r, x]$$$:

    1. If $$$B$$$ lies completely within the range, add $$$x$$$ to $$$z$$$.
    2. If $$$B$$$ partially lies within the range, rebuild the ordered set and set $$$z$$$ to $$$0$$$ again.

    For queries: a stored element $$$e$$$ of the ordered set represents the actual value $$$z + e$$$, so to count the values at most $$$x$$$ we count the stored elements at most $$$x - z$$$.
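    A minimal single-block sketch of this idea, using GCC's `__gnu_pbds` tree as the ordered set (storing `(value, index)` pairs is my own workaround so that duplicate values survive; the `Block` name and interface are hypothetical):

```cpp
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/tree_policy.hpp>
#include <cassert>
#include <climits>
#include <utility>
using namespace std;
using namespace __gnu_pbds;

// Order-statistics tree: supports order_of_key (rank queries) in O(log n).
typedef tree<pair<long long, int>, null_type, less<pair<long long, int>>,
             rb_tree_tag, tree_order_statistics_node_update> ordered_set;

struct Block {
    ordered_set s;     // stores (e, index); actual value is e + z
    long long z = 0;   // block-wide additive offset

    void insert(long long v, int idx) { s.insert({v - z, idx}); }
    void addAll(long long x) { z += x; }  // whole block lies in the update range
    // Count stored values whose actual value e + z is <= x.
    int countLE(long long x) {
        return (int)s.order_of_key({x - z, INT_MAX});
    }
};
```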

2 weeks ago, # |

SQRT decomposition: divide the array into consecutive blocks of size $$$B$$$. For each block, store a whole-block increment and a sorted version of the block.

If an update does not cut the block, just modify the block increment. If an update cuts the block, it must be updated by naively changing the affected elements; but instead of copying and re-sorting to update the sorted version, use a merge of the two parts of the block (increasing part of the block splits it into two sorted runs). This gives $$$O(B + \frac{N}{B})$$$ per update.

To count the elements of a whole block, use upper_bound(x - block_increment); the leftovers at the endpoints can be counted naively. This gives $$$O(B + \frac{N}{B} \log B)$$$ per count.

With $$$B = \sqrt{N \log N}$$$ we get $$$O(\sqrt{N \log N})$$$ per operation.

Also, the naive solution to this problem is very cache-friendly and vectorization-friendly. I think you can get $$$O(N / w)$$$ per query without much effort, and that will be a very practical solution for small $$$N$$$.
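For reference, the naive version is just two tight loops over a contiguous array, which modern compilers auto-vectorize ($$$w$$$ above is the SIMD width; the `Naive` wrapper below is my own sketch, not a tuned implementation):

```cpp
#include <cassert>
#include <utility>
#include <vector>
using namespace std;

struct Naive {
    vector<long long> a;
    explicit Naive(vector<long long> v) : a(move(v)) {}

    // Add x to a[l..r] (inclusive): a single contiguous loop,
    // trivially auto-vectorized.
    void update(int l, int r, long long x) {
        for (int i = l; i <= r; i++) a[i] += x;
    }
    // Count a[i] <= x on [l, r]: branchless comparison sum,
    // also auto-vectorized.
    long long query(int l, int r, long long x) {
        long long cnt = 0;
        for (int i = l; i <= r; i++) cnt += (a[i] <= x);
        return cnt;
    }
};
```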

2 weeks ago, # |

An $$$O(q \sqrt{n \log n})$$$ solution is straightforward: suppose we divide the sequence into blocks of size $$$B$$$; for each block, we maintain (1) an addition tag and (2) the relative order of the elements inside this block. Then, we can achieve time complexity $$$O(B + \frac{n}{B})$$$ per update (by using merge instead of sort) and $$$O(B + \frac{n \log n}{B})$$$ per query.

To remove the $$$\log$$$ factor, notice that the only bottleneck in the algorithm above is to perform binary search simultaneously on each of the $$$\frac{n}{B}$$$ blocks. This can be solved (online) using a variation of the fractional cascading technique.

Consider a "segment tree" with $$$O(\frac{n}{B})$$$ leaves, where each leaf represents a block of the sequence. On each leaf we store the sorted sequence, and on each internal node we store the sequence we get by merging $$$\frac{1}{3}$$$ of the elements from its left child and $$$\frac{1}{3}$$$ of the elements from its right child (as well as a subtree addition tag). This enables $$$O(\frac{n}{B} + \log n)$$$ query complexity in the same way as standard fractional cascading, while also allowing $$$O\left(\sum_{i=0}^{O(\log n)} \left(\frac{2}{3}\right)^i B \right) = O(B)$$$ update time.

Details can be found in the IOI 2020 China national training team collected papers (中国国家集训队论文集), starting from page 4; it is accessible online in Chinese, though I don't know of any English translation. As indicated in this reference, in the offline setting we can also replace the binary searches by one huge radix sort per block.

Note that I have never implemented this solution, and I'm skeptical about its performance at $$$n = q = 10^5$$$ (it doesn't seem cache-friendly, does it?).