Блог пользователя Laakeri

Автор Laakeri, история, 4 года назад, По-английски

The following question is frequently asked in Codeforces (46390, 45157, 11324): Is there a 2D segment tree that supports range addition and range minimum? In this blog post I give evidence that such a data structure does not exist, or if it did exist it would not generalize to higher dimensions. In particular I show that if for all $$$d$$$ a $$$d$$$-dimensional data structure that performs such queries in $$$O(polylog(N))$$$ time did exist, then the exponential time hypothesis would fail. Such a data structure exists for range addition and range sum, so this is a non-trivial claim separating the hardness of these problems.

Update 2021:

A paper " Algorithms and Hardness for Multidimensional Range Updates and Queries" in ITCS 2021 by Joshua Lau and Angus Ritossa (https://arxiv.org/abs/2101.02003, https://www.youtube.com/watch?v=joB6IaqNqQc) (I guess the authors' cf handles are junkbot and AngusRitossa) gives an overview of the complexity of multidimensional range queries with different query and update operations. For range addition and range minimum in 2D they show a lower bound of $$$\Omega(Q^{3/2-o(1)})$$$ under several different conjectures and give an algorithm with time complexity $$$O(Q^{3/2})$$$.

Update:

I researched some literature on related topics and found that while lower bounds for this exact data structure have not been given before, there has been a lot of research on related topics. Some observations:

  1. Klee's measure problem is very related to Range-Add-Min structure.
  2. The reduction from Clique to Klee's measure problem in [1] can be adapted to show that $$$d$$$-dimensional Range-Add-Min-structure with $$$O(polylog(N))$$$ queries could solve $$$d+1$$$-Clique in $$$O(n^2)$$$ time.
  3. Therefore $$$d$$$-dimensional Range-Add-Min-structure for all $$$d$$$ would mean FPT=W[1]. Note that ETH implies FPT!=W[1].
  4. Also, 2D Range-Add-Min structure could solve 3-clique in $$$O(n^2)$$$ time, which would be a breaktrough result.
  5. Hardness of very related problems has been considered earlier this year in [2]. I'm not sure if their results directly imply hardness of 2D Range-Add-Min structure.
  6. There is also a connection to MAX-2-CSP [3], which can be extended to connection to MAX-d-CSP for $$$d$$$-dimensional Range-Add-Min-structure.

[1] T. M. Chan. A (slightly) faster algorithm for klee's measure problem. SCG 2008 https://dl.acm.org/citation.cfm?id=1377693

[2] L. Duraj, K. Kleiner, A. Polak, V. V. Williams. Equivalences between triangle and range query problems. arXiv 2019 https://arxiv.org/abs/1908.11819

[3] R. Williams. A new algorithm for optimal 2-constraint satisfaction and its implications. TCS 2005 https://www.sciencedirect.com/science/article/pii/S0304397505005438

A pdf version of this blog is in http://laakeri.kapsi.fi/a/rmq.pdf

Formalization

Consider a $$$d$$$-dimensional integer array $$$A$$$ whose elements are indexed with $$$A[(i_1, \ldots, i_d)]$$$ for all $$$0 \le i_j < N$$$. Furthermore assume that $$$A$$$ is initialized to zero. The operation $$$add(v, l_1, r_1, \ldots, l_d, r_d)$$$ adds the value $$$v$$$ to all elements $$$A[(i_1, \ldots, i_d)]$$$ such that $$$l_j \le i_j < r_j$$$. The operation $$$min(l_1, r_1, \ldots, l_d, r_d)$$$ returns the minimum value over all elements $$$A[(i_1, \ldots, i_d)]$$$ such that $$$l_j \le i_j < r_j$$$. A data structure that supports these two operations is called Range-Add-Min-structure.

In the Boolean satisfiability problem (SAT) the task is to decide if there is an assigment of $$$n$$$ binary variables to true/false so that it satisfies a given Boolean formula. The formula consists of $$$m$$$ clauses, and an assigment of variables satisfies the formula if it satisfies every clause. A clause is a logical or that consists of variables and negations of variables. For example if $$$x_1, x_2, \ldots, x_n$$$ are variables, the clause $$$(x_1 \vee \neg x_3 \vee x_6)$$$ is satisfied if $$$x_1$$$ is true or $$$x_3$$$ is false or $$$x_6$$$ is true.

The problem $$$k$$$-SAT is Boolean satisfiability problem where each clause has at most $$$k$$$ variables or negations of variables. $$$2$$$-SAT can be solved in polynomial time, but $$$3$$$-SAT is NP-hard. Furthermore a widely believed exponential time hypothesis conjectures that there exists $$$\varepsilon > 0$$$ such that $$$3$$$-SAT cannot be solved in time $$$O((1+\varepsilon)^n poly(n, m))$$$, where $$$poly(n, m)$$$ is polynomial function on $$$n$$$ and $$$m$$$.

Theorem 1: If there exists a function $$$f(d)$$$ such that $$$d$$$-dimensional Range-Add-Min-structure can process $$$Q$$$ operations in $$$O((\log_2(N) + \log_2(Q))^{f(d)} Q)$$$ time for all $$$d$$$, then exponential time hypothesis fails.

Proof

We prove Theorem 1 by assuming that such $$$d$$$-dimensional data structure exists, and constructing a $$$O(2^{3n/d} poly(n,m))$$$ time algorithm for 3-SAT, where $$$n$$$ is the number of variables and $$$m$$$ is the number of clauses.

Claim 1: For all $$$k$$$ and $$$d$$$, $$$k$$$-SAT can be solved with $$$O(2^{kn/d} m)$$$ operations with a $$$d$$$-dimensional Range-Add-Min-structure with $$$N=2^{n/d}$$$.

Proof: Suppose without loss of generality that $$$d$$$ divides $$$n$$$. Let $$$N = 2^{n/d}$$$. We use the $$$d$$$-dimensional array $$$A$$$ with indices $$$(i_1, \ldots, i_d)$$$, $$$0 \le i_j < N$$$. $$$A$$$ has $$$2^n$$$ elements, because $$$N^d = (2^{n/d})^d = 2^n$$$.

Let $$$x_0, \ldots, x_{n-1}$$$ be a bit string of length $$$n$$$. The bijection $$$\phi$$$ with

$$$\phi(x_0, \ldots, x_{n-1}) = (1 x_0 + 2 x_1 + 4 x_2 + \ldots + x_{n/d-1} 2^{n/d - 1}, \ldots, 1 x_{n-n/d} + \ldots)$$$

maps bit strings of length $$$n$$$ to the indices of $$$A$$$. For example if $$$n=6$$$ and $$$d=2$$$, the bijection is $$$\phi(x_0, x_1, x_2, x_3, x_4, x_5) = (1 x_0 + 2 x_1 + 4 x_2, 1 x_3 + 2 x_4 + 4 x_5)$$$.

An assignment of variables in SAT is a bit string of length $$$n$$$. We use the array $$$A$$$ to represent if the assigment $$$x_0, x_1, \ldots, x_{n-1}$$$ is a satisfiable assignment of the SAT formula, using the bijection $$$\phi(x_0, \ldots, x_{n-1})$$$ to map between variable assignments and indices of $$$A$$$. We iteratively add clauses to the formula, maintaining the invariant that $$$A[\phi(x_0, \ldots, x_{n-1})] = 0$$$ if and only if the assigment $$$x_0, \ldots, x_{n-1}$$$ satisfies all clauses added so far.

Claim 2: Given a clause that contains $$$k$$$ variables or negations of variables, we can add $$$1$$$ to all elements $$$A[\phi(x_0, \ldots, x_{n-1})]$$$ such that $$$x_0, \ldots, x_{n-1}$$$ does not satisfy the clause using at most $$$2^{kn/d}$$$ $$$add$$$-operations.

Proof: Let $$$v_1, \ldots, v_k$$$ be the variables that occur in the clause (possibly with negations). Only the values $$$x_{v_1}, \ldots, x_{v_k}$$$ affect whether the clause is satisfied or not. These values affect at most $$$k$$$ dimensions in the mapping $$$\phi(x_0, \ldots, x_{n-1})$$$. Lets call these dimensions the non-trivial dimensions, and other dimensions the trivial dimensions. If we choose the values of non-trivial dimensions, then by the inverse mapping $$$\phi^{-1}$$$ we choose the values of $$$x_{v_1}, \ldots, x_{v_k}$$$, and know if the clause is satisfied or not. We brute force over all possible values of non-trivial dimensions. For each value, if the clause is not be satisfied by $$$\phi^{-1}$$$, we use operation $$$add(1, l_1, r_1, \ldots, l_d, r_d)$$$, where $$$l_j = r_j - 1$$$ is the brute forced value of dimension $$$j$$$ if $$$j$$$ is a non-trivial dimension and otherwise $$$l_j = 0$$$ and $$$r_j = N$$$. Each dimension has $$$N$$$ values, so brute forcing the non-trivial dimensions takes at most $$$N^k = (2^{n/d})^k = 2^{kn/d}$$$ $$$add$$$-operations. $$$\square$$$

By Claim 2 each clause can be processed with $$$2^{kn/d}$$$ $$$add$$$-operations. Therefore when adding all $$$m$$$ clauses we use at most $$$m 2^{kn/d}$$$ $$$add$$$-operations. After all clauses have been added, we use a single $$$min$$$-operation where $$$l_j = 0$$$ and $$$r_j = N$$$ for all dimensions $$$j$$$ to query if there are any elements in $$$A$$$ that have value $$$0$$$, and thus correspond to satisfiable assignments. $$$\square$$$

Claim 3: If the data structure of Theorem 1 exists, $$$k$$$-SAT can be solved in $$$O((1+\varepsilon)^n poly(n, m))$$$ for any $$$\varepsilon > 0$$$.

Proof: Given $$$k$$$ and $$$\varepsilon$$$, lets choose $$$d$$$ so that $$$2^{k/d} < 1 + \varepsilon$$$. By using the algorithm of Claim 1 with the data structure of Theorem 1 we obtain a runtime of

$$$O((\log_2(2^{n/d}) + \log_2(2^{kn/d} m))^{f(d)} 2^{kn/d} m)$$$
$$$= O((n/d + kn \log_2(m)/d)^{f(d)} 2^{kn/d} m)$$$
$$$= O((1 + \varepsilon)^n poly(n, m)).\square$$$

Claim 3 completes the proof, because the exponential time hypothesis states that there exists $$$\varepsilon > 0$$$ such that no $$$O((1 + \varepsilon)^n poly(n, m))$$$ algorithm exists for 3-SAT.

Conclusion

We proved that efficient $$$d$$$-dimensional Range-Add-Min-structure does not exist if the exponential time hypothesis holds. Our proof allows the existance of such 2D data structure. However we know that such 2D data structure must be developed with techniques that do not generalize to higher dimensions. An 8-dimensional Range-Add-Min-structure would imply a state-of-the-art algorithm for 3-SAT trough our reduction. Some open questions are:

  1. Is there a conditional hardness proof for 2D case?

  2. Likewise to $$$(min, +)$$$-semiring, our reduction also works for $$$(+, \cdot)$$$-semiring. Does it work for all commutative semirings?

  3. A $$$d$$$-dimensional data structure with $$$O(polylog(N))$$$ operations exists if the operations are addition and sum. Can we classify all pairs of operations so that the data structure either exists, or does not exist assuming ETH?

  4. Is it plausible that techniques for designing multidimensional data structures work for $$$d=2$$$ but not for some higher $$$d$$$?

  • Проголосовать: нравится
  • +397
  • Проголосовать: не нравится

»
4 года назад, # |
  Проголосовать: нравится +35 Проголосовать: не нравится

Can you write a pre-requisite concepts involved section? It was quite difficult to follow.

»
4 года назад, # |
Rev. 3   Проголосовать: нравится +28 Проголосовать: не нравится

Nice result! But right now it works only for data structures that need exactly polylogarithmic time per query (so no amortization) and do not require any initialization. Usually, initializing a data structure of size $$$\ell$$$ takes $$$O(\ell)$$$ time and here it would be $$$O(2^n)$$$ time already.

To more or less rule out the possibility of some kind of amortization in the data structure, one can carry out the same process for $$$2^{n - (kn/d)}$$$ instances of $$$k$$$-SAT, zeroing out the array between them by applying the subtraction operation on the subarrays that were increased. This way, having an ability to answer $$$q$$$ queries over an $$$d$$$-dimensional array with $$$\ell$$$ elements in $$$O((q+\ell) \; \mathrm{polylog} (q+\ell))$$$ total time would make it possible to solve $$$2^{n - (kn/d)}$$$ instances of $$$k$$$-SAT in $$$O(2^n \; \mathrm{poly}(n, m))$$$ total time.

That, in respect, provides a way to solve a $$$k$$$-SAT instance with $$$n + (n - (kn/d))$$$ variables in $$$O(2^n \;\mathrm{poly}(n, m))$$$ time: simply enumerate the values of first $$$n-(kn/d)$$$ variables and then solve the obtained $$$2^{n-kn/d}$$$ $$$k$$$-SAT instances. Therefore, for any integer $$$k$$$ and any positive $$$\varepsilon > 0$$$, we would be able to solve $$$k$$$-SAT over $$$\mathrm{vars}$$$ variables in $$$O(2^{(1 + \varepsilon)\mathrm{vars}/2} \; \mathrm{poly}(\mathrm{vars}, \mathrm{clauses}))$$$ time by choosing appropriate large $$$d$$$ and $$$n$$$, so $$$n + (n - (kn/d)) \approx \mathrm{vars}$$$.

Not exactly ETH-contradicting, but still kind of unlikely.

  • »
    »
    4 года назад, # ^ |
      Проголосовать: нравится +24 Проголосовать: не нравится

    Thanks! I'm not sure if I get how both amortization and initialization are issues. I think that the reduction works even if all queries can be answered offline, which is as much amortization as one can think?

    However I get that initialization is an issue, and your argument shows that even with full initialization of the array, we could obtain a $$$O(2^{n/2 + \varepsilon} poly(n,m))$$$ algorithm for $$$k$$$-SAT for any $$$k$$$. While this would not contradict ETH, it would contradict the strong exponential time hypothesis, which is also a very respected hypothesis in complexity theory and has been known for almost 20 years. Very nice observation!

»
4 года назад, # |
  Проголосовать: нравится 0 Проголосовать: не нравится

Auto comment: topic has been updated by Laakeri (previous revision, new revision, compare).

»
3 года назад, # |
  Проголосовать: нравится +27 Проголосовать: не нравится

Auto comment: topic has been updated by Laakeri (previous revision, new revision, compare).