### div4only's blog

By div4only, history, 7 weeks ago,

Given an $1$-indexed integer array $a=[a_1,\,a_2,\,a_3,...,\,a_n]$ and a fixed windows size $k$，define sliding window $I_{j,\,(j \geq k)} := [a_{j-k+1}, a_{j-k+2}, ..., a_j]$. For each $j \geq k$, we need to answer a query:

How many sliding windows in $\{I_k, I_{k+1}, ..., I_{j-1}\}$ are less or equal than $I_j$ in the alphabetical order?

For example, $a=[1, 2, 1, 3]$ and $k=2$:

(1) For $j=2$, we should answer $0$.

(2) For $j=3$, we should answer $1$ as $[1,2] < [2,1]$ in alphabetical order.

(3) For $j=4$, we should answer $1$ as $[1,2] < [1,3]$ in alphabetical order.

Suffix array + LCP array can solve it in $O(nlogn)$ offline. But how about solving online? For example, what if $a$ is an unbounded datastream instead of an array? In the datastream case, you have to process $a_j$ and $I_j$ before reading $a_{j+1}$.

• +16

 » 7 weeks ago, # |   +10 With hashing+order statistics you can solve it in $O(n\log(n)\log(k))$. The order statistics tree gives you the position in the sorted sequence of all $I_j$ in $O(n\log(n)\cdot\texttt{cmp})$, where $\texttt{cmp}$ is the time needed to compare two sequences. With hashing you can compare two sequences in $O(\log(k))$ by binary searching the longest common prefix and then comparing the next index.
•  » » 7 weeks ago, # ^ | ← Rev. 2 →   0 But here is a problem: The hash value of a window might be exponentially large, and you may need $O(k)$ time to maintain such a large number.
•  » » » 7 weeks ago, # ^ |   +6 Hash functions are surjective. There's no point in making it bijective for the reasons of performance. As a downside — collisions are possible. That's why you should minimize the chance of a collision (random moduli or base).
•  » » » » 7 weeks ago, # ^ |   0 But how can we compare windows after moduli?
•  » » » » » 7 weeks ago, # ^ |   +6
•  » » » » » » 7 weeks ago, # ^ | ← Rev. 2 →   0 Thanks!