Laakeri's blog

By Laakeri, history, 6 weeks ago, In English,

The following question is frequently asked in Codeforces (46390, 45157, 11324): Is there a 2D segment tree that supports range addition and range minimum? In this blog post I give evidence that such a data structure does not exist, or if it did exist it would not generalize to higher dimensions. In particular I show that if for all $$$d$$$ a $$$d$$$-dimensional data structure that performs such queries in $$$O(polylog(N))$$$ time did exist, then the exponential time hypothesis would fail. Such a data structure exists for range addition and range sum, so this is a non-trivial claim separating the hardness of these problems.

Update:

I researched some literature on related topics and found that while lower bounds for this exact data structure have not been given before, there has been a lot of research on related topics. Some observations:

  1. Klee's measure problem is very related to Range-Add-Min structure.
  2. The reduction from Clique to Klee's measure problem in [1] can be adapted to show that $$$d$$$-dimensional Range-Add-Min-structure with $$$O(polylog(N))$$$ queries could solve $$$d+1$$$-Clique in $$$O(n^2)$$$ time.
  3. Therefore $$$d$$$-dimensional Range-Add-Min-structure for all $$$d$$$ would mean FPT=W[1]. Note that ETH implies FPT!=W[1].
  4. Also, 2D Range-Add-Min structure could solve 3-clique in $$$O(n^2)$$$ time, which would be a breaktrough result.
  5. Hardness of very related problems has been considered earlier this year in [2]. I'm not sure if their results directly imply hardness of 2D Range-Add-Min structure.
  6. There is also a connection to MAX-2-CSP [3], which can be extended to connection to MAX-d-CSP for $$$d$$$-dimensional Range-Add-Min-structure.

[1] T. M. Chan. A (slightly) faster algorithm for klee's measure problem. SCG 2008 https://dl.acm.org/citation.cfm?id=1377693

[2] L. Duraj, K. Kleiner, A. Polak, V. V. Williams. Equivalences between triangle and range query problems. arXiv 2019 https://arxiv.org/abs/1908.11819

[3] R. Williams. A new algorithm for optimal 2-constraint satisfaction and its implications. TCS 2005 https://www.sciencedirect.com/science/article/pii/S0304397505005438

A pdf version of this blog is in http://laakeri.kapsi.fi/a/rmq.pdf

Formalization

Consider a $$$d$$$-dimensional integer array $$$A$$$ whose elements are indexed with $$$A[(i_1, \ldots, i_d)]$$$ for all $$$0 \le i_j < N$$$. Furthermore assume that $$$A$$$ is initialized to zero. The operation $$$add(v, l_1, r_1, \ldots, l_d, r_d)$$$ adds the value $$$v$$$ to all elements $$$A[(i_1, \ldots, i_d)]$$$ such that $$$l_j \le i_j < r_j$$$. The operation $$$min(l_1, r_1, \ldots, l_d, r_d)$$$ returns the minimum value over all elements $$$A[(i_1, \ldots, i_d)]$$$ such that $$$l_j \le i_j < r_j$$$. A data structure that supports these two operations is called Range-Add-Min-structure.

In the Boolean satisfiability problem (SAT) the task is to decide if there is an assigment of $$$n$$$ binary variables to true/false so that it satisfies a given Boolean formula. The formula consists of $$$m$$$ clauses, and an assigment of variables satisfies the formula if it satisfies every clause. A clause is a logical or that consists of variables and negations of variables. For example if $$$x_1, x_2, \ldots, x_n$$$ are variables, the clause $$$(x_1 \vee \neg x_3 \vee x_6)$$$ is satisfied if $$$x_1$$$ is true or $$$x_3$$$ is false or $$$x_6$$$ is true.

The problem $$$k$$$-SAT is Boolean satisfiability problem where each clause has at most $$$k$$$ variables or negations of variables. $$$2$$$-SAT can be solved in polynomial time, but $$$3$$$-SAT is NP-hard. Furthermore a widely believed exponential time hypothesis conjectures that there exists $$$\varepsilon > 0$$$ such that $$$3$$$-SAT cannot be solved in time $$$O((1+\varepsilon)^n poly(n, m))$$$, where $$$poly(n, m)$$$ is polynomial function on $$$n$$$ and $$$m$$$.

Theorem 1: If there exists a function $$$f(d)$$$ such that $$$d$$$-dimensional Range-Add-Min-structure can process $$$Q$$$ operations in $$$O((\log_2(N) + \log_2(Q))^{f(d)} Q)$$$ time for all $$$d$$$, then exponential time hypothesis fails.

Proof

We prove Theorem 1 by assuming that such $$$d$$$-dimensional data structure exists, and constructing a $$$O(2^{3n/d} poly(n,m))$$$ time algorithm for 3-SAT, where $$$n$$$ is the number of variables and $$$m$$$ is the number of clauses.

Claim 1: For all $$$k$$$ and $$$d$$$, $$$k$$$-SAT can be solved with $$$O(2^{kn/d} m)$$$ operations with a $$$d$$$-dimensional Range-Add-Min-structure with $$$N=2^{n/d}$$$.

Proof: Suppose without loss of generality that $$$d$$$ divides $$$n$$$. Let $$$N = 2^{n/d}$$$. We use the $$$d$$$-dimensional array $$$A$$$ with indices $$$(i_1, \ldots, i_d)$$$, $$$0 \le i_j < N$$$. $$$A$$$ has $$$2^n$$$ elements, because $$$N^d = (2^{n/d})^d = 2^n$$$.

Let $$$x_0, \ldots, x_{n-1}$$$ be a bit string of length $$$n$$$. The bijection $$$\phi$$$ with

$$$\phi(x_0, \ldots, x_{n-1}) = (1 x_0 + 2 x_1 + 4 x_2 + \ldots + x_{n/d-1} 2^{n/d - 1}, \ldots, 1 x_{n-n/d} + \ldots)$$$

maps bit strings of length $$$n$$$ to the indices of $$$A$$$. For example if $$$n=6$$$ and $$$d=2$$$, the bijection is $$$\phi(x_0, x_1, x_2, x_3, x_4, x_5) = (1 x_0 + 2 x_1 + 4 x_2, 1 x_3 + 2 x_4 + 4 x_5)$$$.

An assignment of variables in SAT is a bit string of length $$$n$$$. We use the array $$$A$$$ to represent if the assigment $$$x_0, x_1, \ldots, x_{n-1}$$$ is a satisfiable assignment of the SAT formula, using the bijection $$$\phi(x_0, \ldots, x_{n-1})$$$ to map between variable assignments and indices of $$$A$$$. We iteratively add clauses to the formula, maintaining the invariant that $$$A[\phi(x_0, \ldots, x_{n-1})] = 0$$$ if and only if the assigment $$$x_0, \ldots, x_{n-1}$$$ satisfies all clauses added so far.

Claim 2: Given a clause that contains $$$k$$$ variables or negations of variables, we can add $$$1$$$ to all elements $$$A[\phi(x_0, \ldots, x_{n-1})]$$$ such that $$$x_0, \ldots, x_{n-1}$$$ does not satisfy the clause using at most $$$2^{kn/d}$$$ $$$add$$$-operations.

Proof: Let $$$v_1, \ldots, v_k$$$ be the variables that occur in the clause (possibly with negations). Only the values $$$x_{v_1}, \ldots, x_{v_k}$$$ affect whether the clause is satisfied or not. These values affect at most $$$k$$$ dimensions in the mapping $$$\phi(x_0, \ldots, x_{n-1})$$$. Lets call these dimensions the non-trivial dimensions, and other dimensions the trivial dimensions. If we choose the values of non-trivial dimensions, then by the inverse mapping $$$\phi^{-1}$$$ we choose the values of $$$x_{v_1}, \ldots, x_{v_k}$$$, and know if the clause is satisfied or not. We brute force over all possible values of non-trivial dimensions. For each value, if the clause is not be satisfied by $$$\phi^{-1}$$$, we use operation $$$add(1, l_1, r_1, \ldots, l_d, r_d)$$$, where $$$l_j = r_j - 1$$$ is the brute forced value of dimension $$$j$$$ if $$$j$$$ is a non-trivial dimension and otherwise $$$l_j = 0$$$ and $$$r_j = N$$$. Each dimension has $$$N$$$ values, so brute forcing the non-trivial dimensions takes at most $$$N^k = (2^{n/d})^k = 2^{kn/d}$$$ $$$add$$$-operations. $$$\square$$$

By Claim 2 each clause can be processed with $$$2^{kn/d}$$$ $$$add$$$-operations. Therefore when adding all $$$m$$$ clauses we use at most $$$m 2^{kn/d}$$$ $$$add$$$-operations. After all clauses have been added, we use a single $$$min$$$-operation where $$$l_j = 0$$$ and $$$r_j = N$$$ for all dimensions $$$j$$$ to query if there are any elements in $$$A$$$ that have value $$$0$$$, and thus correspond to satisfiable assignments. $$$\square$$$

Claim 3: If the data structure of Theorem 1 exists, $$$k$$$-SAT can be solved in $$$O((1+\varepsilon)^n poly(n, m))$$$ for any $$$\varepsilon > 0$$$.

Proof: Given $$$k$$$ and $$$\varepsilon$$$, lets choose $$$d$$$ so that $$$2^{k/d} < 1 + \varepsilon$$$. By using the algorithm of Claim 1 with the data structure of Theorem 1 we obtain a runtime of

$$$O((\log_2(2^{n/d}) + \log_2(2^{kn/d} m))^{f(d)} 2^{kn/d} m)$$$
$$$= O((n/d + kn \log_2(m)/d)^{f(d)} 2^{kn/d} m)$$$
$$$= O((1 + \varepsilon)^n poly(n, m)).\square$$$

Claim 3 completes the proof, because the exponential time hypothesis states that there exists $$$\varepsilon > 0$$$ such that no $$$O((1 + \varepsilon)^n poly(n, m))$$$ algorithm exists for 3-SAT.

Conclusion

We proved that efficient $$$d$$$-dimensional Range-Add-Min-structure does not exist if the exponential time hypothesis holds. Our proof allows the existance of such 2D data structure. However we know that such 2D data structure must be developed with techniques that do not generalize to higher dimensions. An 8-dimensional Range-Add-Min-structure would imply a state-of-the-art algorithm for 3-SAT trough our reduction. Some open questions are:

  1. Is there a conditional hardness proof for 2D case?

  2. Likewise to $$$(min, +)$$$-semiring, our reduction also works for $$$(+, \cdot)$$$-semiring. Does it work for all commutative semirings?

  3. A $$$d$$$-dimensional data structure with $$$O(polylog(N))$$$ operations exists if the operations are addition and sum. Can we classify all pairs of operations so that the data structure either exists, or does not exist assuming ETH?

  4. Is it plausible that techniques for designing multidimensional data structures work for $$$d=2$$$ but not for some higher $$$d$$$?

 
 
 
 
  • Vote: I like it
  • +397
  • Vote: I do not like it

»
6 weeks ago, # |
  Vote: I like it +35 Vote: I do not like it

Can you write a pre-requisite concepts involved section? It was quite difficult to follow.

  • »
    »
    6 weeks ago, # ^ |
      Vote: I like it +92 Vote: I do not like it

    It was quite difficult to follow.

    In theoretical CS, that's not a bug, it's feature.

  • »
    »
    6 weeks ago, # ^ |
      Vote: I like it +47 Vote: I do not like it

    Thanks for feedback! The main pre-requisite concept for this blog is the Boolean satisfiability problem and intuitive understanding of it. I'll try to improve the blog later this week based on feedback.

»
6 weeks ago, # |
Rev. 3   Vote: I like it +28 Vote: I do not like it

Nice result! But right now it works only for data structures that need exactly polylogarithmic time per query (so no amortization) and do not require any initialization. Usually, initializing a data structure of size $$$\ell$$$ takes $$$O(\ell)$$$ time and here it would be $$$O(2^n)$$$ time already.

To more or less rule out the possibility of some kind of amortization in the data structure, one can carry out the same process for $$$2^{n - (kn/d)}$$$ instances of $$$k$$$-SAT, zeroing out the array between them by applying the subtraction operation on the subarrays that were increased. This way, having an ability to answer $$$q$$$ queries over an $$$d$$$-dimensional array with $$$\ell$$$ elements in $$$O((q+\ell) \; \mathrm{polylog} (q+\ell))$$$ total time would make it possible to solve $$$2^{n - (kn/d)}$$$ instances of $$$k$$$-SAT in $$$O(2^n \; \mathrm{poly}(n, m))$$$ total time.

That, in respect, provides a way to solve a $$$k$$$-SAT instance with $$$n + (n - (kn/d))$$$ variables in $$$O(2^n \;\mathrm{poly}(n, m))$$$ time: simply enumerate the values of first $$$n-(kn/d)$$$ variables and then solve the obtained $$$2^{n-kn/d}$$$ $$$k$$$-SAT instances. Therefore, for any integer $$$k$$$ and any positive $$$\varepsilon > 0$$$, we would be able to solve $$$k$$$-SAT over $$$\mathrm{vars}$$$ variables in $$$O(2^{(1 + \varepsilon)\mathrm{vars}/2} \; \mathrm{poly}(\mathrm{vars}, \mathrm{clauses}))$$$ time by choosing appropriate large $$$d$$$ and $$$n$$$, so $$$n + (n - (kn/d)) \approx \mathrm{vars}$$$.

Not exactly ETH-contradicting, but still kind of unlikely.

  • »
    »
    6 weeks ago, # ^ |
      Vote: I like it +24 Vote: I do not like it

    Thanks! I'm not sure if I get how both amortization and initialization are issues. I think that the reduction works even if all queries can be answered offline, which is as much amortization as one can think?

    However I get that initialization is an issue, and your argument shows that even with full initialization of the array, we could obtain a $$$O(2^{n/2 + \varepsilon} poly(n,m))$$$ algorithm for $$$k$$$-SAT for any $$$k$$$. While this would not contradict ETH, it would contradict the strong exponential time hypothesis, which is also a very respected hypothesis in complexity theory and has been known for almost 20 years. Very nice observation!

»
3 weeks ago, # |
  Vote: I like it 0 Vote: I do not like it

Auto comment: topic has been updated by Laakeri (previous revision, new revision, compare).