
By pikel_rik, history, 4 years ago, In English

Imagine you're given an array $$$A$$$ of length $$$n$$$ and you're required to find the largest subset of this array that satisfies a certain condition. Let's assume that the given condition makes the problem $$$NP$$$-complete and also allows the following to be true:

  1. If there exists a subset of length $$$k$$$ that satisfies the given condition, then, assuming $$$k > 1$$$, there necessarily exist subsets of every length $$$l$$$ with $$$1 \le l < k$$$ that satisfy the condition.

  2. If there does not exist a subset of length $$$k$$$ that satisfies the given condition, then, assuming $$$k < n$$$, there necessarily does not exist any subset of length $$$l$$$ with $$$k < l \le n$$$ that satisfies the condition.

In short, the binary search property holds.

Since the problem is $$$NP$$$-complete, however, we know of no efficient procedure for deciding whether a subset of length $$$k$$$ satisfying the condition exists. So the only thing we can do is iterate over all subsets of length $$$k$$$ and check whether each of them satisfies the given condition. The time complexity of this check would then be $$$O(n \cdot \binom{n}{k})$$$.
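
To make this concrete, here is a rough sketch of the brute-force check I have in mind (the function satisfies is just a placeholder for whatever the actual condition is, and I'm assuming a single subset can be checked in $$$O(n)$$$):

```cpp
#include <bits/stdc++.h>
using namespace std;

// Placeholder for the actual (unspecified) condition; assumed to run in O(n).
// Replace this with the real check for whichever problem you have in mind.
bool satisfies(const vector<int>& subset) {
    return !subset.empty();   // dummy stand-in
}

// Does some subset of A of size exactly k satisfy the condition?
// Enumerates all C(n, k) subsets, so the total cost is O(n * C(n, k)).
bool existsSubsetOfSize(const vector<int>& A, int k) {
    int n = (int)A.size();
    vector<char> mask(n, 0);
    fill(mask.begin(), mask.begin() + k, 1);   // k ones followed by n - k zeros
    do {
        vector<int> subset;
        for (int i = 0; i < n; ++i)
            if (mask[i]) subset.push_back(A[i]);
        if (satisfies(subset)) return true;
    } while (prev_permutation(mask.begin(), mask.end()));
    return false;
}
```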

My question is: if we run binary search on this problem, armed with the brute-force check described above, what is the worst-case time complexity of the resulting algorithm? I imagine there would be a significant speed-up, since we would evaluate the check only a logarithmic number of times rather than at every possible value of $$$k$$$ (as in an ordinary brute-force search over all subsets of $$$A$$$), but not enough to give us a polynomial-time algorithm.
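
For reference, the binary search I'm describing would look something like this (it uses the existsSubsetOfSize sketch above and relies on assumptions 1 and 2 to discard half of the remaining range after each query):

```cpp
// Largest k in [1, n] for which some subset of size k satisfies the
// condition (0 if none does). Calls existsSubsetOfSize O(log n) times.
int largestSatisfyingSize(const vector<int>& A) {
    int lo = 1, hi = (int)A.size(), best = 0;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        if (existsSubsetOfSize(A, mid)) {
            best = mid;      // size mid works (assumption 1), so try larger
            lo = mid + 1;
        } else {
            hi = mid - 1;    // no larger size can work (assumption 2)
        }
    }
    return best;
}
```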

For example, you can consider the set cover or vertex cover problems (in both of these problems, however, we're required to find the smallest subset satisfying the given condition, not the largest).


»
4 years ago, # |

For $$$k$$$ near $$$\frac{n}{2}$$$, $$$\binom{n}{k}$$$ is of order $$$\frac{2^n}{\sqrt{n}}$$$, and in the worst case around $$$\frac{1}{2} \log_2{n}$$$ of your queries will be near that magnitude, so the overall complexity of the algorithm you describe should be $$$O(2^n \sqrt{n} \log{n})$$$, generally speaking.
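
To spell that out (assuming, as in the post, that testing a single subset costs $$$O(n)$$$, so a query at size $$$k$$$ costs $$$O(n \cdot \binom{n}{k})$$$): since $$$\binom{n}{k} \le \binom{n}{\lfloor n/2 \rfloor} = \Theta\left(\frac{2^n}{\sqrt{n}}\right)$$$ for every $$$k$$$, and binary search makes at most about $$$\log_2{n}$$$ queries, the total work is at most $$$\log_2{n} \cdot O\left(n \cdot \frac{2^n}{\sqrt{n}}\right) = O(2^n \sqrt{n} \log{n})$$$. The roughly $$$\frac{1}{2} \log_2{n}$$$ queries that land near $$$k = \frac{n}{2}$$$ show this bound is tight up to constant factors in the worst case.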

  • »
    »
    4 years ago, # ^ |

    Oh, I see. Thank you! I didn't know how to work with binomial coefficients inside of the Big-Oh notation. I'm assuming you used Stirling's Approximation for that?

    • »
      »
      »
      4 years ago, # ^ |

      Stirling's approximation is one way to derive that estimate, but there are a few others. Although it's difficult to make a mathematically rigorous argument from it, one nice intuitive justification comes from the central limit theorem in statistics: $$$p(k) = 2^{-n} \binom {n}{k}$$$ is the probability mass function of a binomial distribution with mean $$$\mu = \frac{n}{2}$$$ and variance $$$\sigma^2 = \frac{n}{4}$$$, which is approximately Gaussian for large $$$n$$$. This suggests that it should reach a maximum of around $$$\frac{1}{\sqrt{2 \pi \sigma^2}} = \sqrt{\frac{2}{\pi n}}$$$ near $$$k = \frac{n}{2}$$$, and that the width of the set of $$$k$$$ values that get close to this maximum should be of the same order as the standard deviation, which is $$$\sigma = \sqrt{\sigma^2} = \frac{1}{2} \sqrt{n}$$$.

      The mathematical difficulties with making this justification rigorous stem from the fact that the weak-* convergence given by the central limit theorem doesn't imply any pointwise properties and only really directly applies when taking limits inside of a sum or an integral. But (although I have made no effort to work through the details) I expect the log-concavity of the binomial coefficients provides enough leverage to make this approach tractable, and I know from other methods that the estimates the central limit theorem leads one to expect near $$$k = \frac{n}{2}$$$ are all correct.
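
      For completeness, the Stirling route is quick (a sketch, taking $$$n$$$ even for simplicity): using $$$m! \sim \sqrt{2 \pi m} \left(\frac{m}{e}\right)^m$$$,

      $$$\binom{n}{n/2} = \frac{n!}{\left((n/2)!\right)^2} \sim \frac{\sqrt{2 \pi n} \left(\frac{n}{e}\right)^n}{\pi n \left(\frac{n}{2 e}\right)^n} = 2^n \sqrt{\frac{2}{\pi n}},$$$

      which is exactly $$$2^n$$$ times the maximum value of $$$p(k)$$$ predicted by the Gaussian picture above.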

      • »
        »
        »
        »
        4 years ago, # ^ |

        I'm not very experienced in statistics, but your explanation did make a lot of sense intuitively. Thank you!