Ok, here's the formal statement:
We have some array of integers $$$a_1, a_2, \dots, a_n$$$ with $$$n$$$ large (around $$$10^5$$$). We have to support the following types of operations:
Update: Select some subarray $$$[l, r]$$$ of $$$a$$$ and an arbitrary integer $$$x$$$, and add $$$x$$$ to each $$$a_i$$$ for $$$l \leq i \leq r$$$. That is, we must be able to perform range addition. Note that $$$x$$$ can be either positive or negative.
Query: Count the number of integers at most $$$x$$$ in some subarray $$$[l, r]$$$ of $$$a$$$. Note that $$$x$$$ is allowed to change across queries.
Obviously, the interest is in subquadratic solutions. Fast online solutions (or offline ones) are interesting to me, and I'd be happy to learn about them.
Thank you, NovusStellachan
I think you can solve it using a segment tree with lazy propagation; a problem of this type came up when we did selection for the olympiad.
Sadly, even if we restrict the query to the subarray $$$[1, n]$$$ and fix $$$x$$$ to a particular value, say $$$x=0$$$, we believe there is no online solution faster than $$$\tilde{O}(\sqrt{n})$$$ per operation (even amortized). A reduction from the Online Matrix-Vector Multiplication (OMv) hypothesis can be shown.
Of course, you can easily achieve something like $$$\tilde{O}(\sqrt{n\log{n}})$$$ using sqrt decomposition.
I'm not sure about this at the moment, but probably the same reduction can be used to show that if you can solve this problem (even offline) in polylogarithmic time per operation, then you can multiply matrices in $$$\tilde{O}(n^2)$$$.
I'm only aware of an $$$O(n^{1.5}\sqrt{\log{n}})$$$ solution using sqrt decomposition. Split the array into $$$k$$$ blocks of size $$$\frac{n}{k}$$$, and build over each one something like an explicit treap (i.e. it stores the block in sorted order).
To perform a range addition $$$(l, r, x)$$$, we rebuild the treaps of the first and last blocks in $$$O(\frac{n}{k})$$$, and for the treaps of the middle blocks we just remember that all values in them are increased by $$$x$$$. Total complexity $$$O(k + \frac{n}{k})$$$.
For the second query we perform lower_bound queries on the middle treaps in $$$O(k\log{\frac{n}{k}})$$$, and check the numbers in the endpoint blocks naively in $$$O(\frac{n}{k})$$$.
By choosing $$$k = \sqrt{\frac{n}{\log n}}$$$ we get the aforementioned complexity.
It can be done even more simply: instead of treaps, just store a sorted list of values in each block. You can rebuild the first and last blocks by merging two sorted lists, one with the changed values and one with the unchanged values.
You do not need a treap; you can simply use ordered sets. For every block $$$B$$$, maintain an ordered set and an integer variable $$$z$$$ (initially set to $$$0$$$).
For an update $$$[l, r, x]$$$: if block $$$B$$$ is fully contained in $$$[l, r]$$$, simply add $$$x$$$ to $$$z$$$; if $$$B$$$ is only partially covered, rebuild its ordered set with the affected elements increased by $$$x$$$.
For queries: within an ordered set, if the element $$$e$$$ exists, it means the value $$$z + e$$$ is actually present, so counting elements at most $$$x$$$ amounts to counting stored elements at most $$$x - z$$$.
Sqrt decomposition: divide the array into consecutive blocks of size $$$B$$$. For each block, store a whole-block increment and a sorted version of the block.
If an update does not cut a block, just modify its block increment. If an update cuts a block, the block must be updated by naively changing the affected elements; but instead of copying and re-sorting to update the sorted version, use a merge of the two parts of the block (incrementing part of a block splits its sorted version into two sorted parts). This gives $$$O(B + \frac{N}{B})$$$ per update.
To count the elements in a whole block, use
upper_bound(x - block_increment)
, and the leftovers can be counted naively. This gives $$$O(B + \frac{N}{B} \log B)$$$ per count. With $$$B = \sqrt{N \log N}$$$ we get $$$O(\sqrt{N \log N})$$$ per query of either kind.
Also, the naive solution to this problem is very cache-friendly and vectorization-friendly. I think you can get $$$O(N / w)$$$ per query without much effort, and that will be a very practical solution for small $$$N$$$.

An $$$O(q \sqrt{n \log n})$$$ solution is straightforward: suppose we divide the sequence into blocks of size $$$B$$$; for each block, we maintain (1) an addition tag and (2) the relative order of the elements inside this block. Then, we can achieve time complexity $$$O(B + \frac{n}{B})$$$ per update (by using merge instead of sort) and $$$O(B + \frac{n \log n}{B})$$$ per query.
To remove the $$$\log$$$ factor, notice that the only bottleneck in the algorithm above is to perform binary search simultaneously on each of the $$$\frac{n}{B}$$$ blocks. This can be solved (online) using a variation of the fractional cascading technique.
Consider a "segment tree" with $$$O(\frac{n}{B})$$$ leaves, where each leaf represents a block of the sequence. On each leaf we store the sorted sequence, and on each internal node we store the sequence we get by merging $$$\frac{1}{3}$$$ of the elements from its left child and $$$\frac{1}{3}$$$ of the elements from its right child (as well as a subtree addition tag). This enables $$$O(\frac{n}{B} + \log n)$$$ query complexity in the same way as standard fractional cascading, while also allowing $$$O\left(\sum_{i=0}^{O(\log n)} \left(\frac{2}{3}\right)^i B \right) = O(B)$$$ update time.
Details can be found in the IOI 2020 Chinese national training team report collection (IOI 2020 中国国家集训队论文集), starting from page 4; the Chinese version is accessible at https://github.com/enkerewpo/OIPublicLibrary/tree/master though I don't know of any English translation. As indicated in this reference, in the offline setting we can also replace the binary searches by one huge radix sort per block.
Note that I have never implemented this solution, and I'm skeptical about its performance at $$$n = q = 10^5$$$ (it doesn't seem cache-friendly, does it?).