Errichto's blog

By Errichto, 4 years ago, In English

You can watch the lecture on Youtube: https://youtu.be/0r2D32esF3Y. I will do a second part soon. Some problems are quite vague, it's a nature of this topic.

  1. Warm-up: You toss a coin till you get tails. How many tosses there will be, on average?
  2. X or smaller — There is a hidden number $$$X$$$. An interactor repeatedly gives you a number, either $$$X$$$ or something smaller than $$$X$$$. All numbers are positive integers. When can you stop and say that you are (almost) certain what is the value of $$$X$$$?
  3. Line through N/4 points — Given $$$N \leq 10^5$$$ points, find a line that passes through the maximum number of points. It's guaranteed that the answer is at least $$$N / 4$$$.
  4. GCD (364D - Ghd) — given a set of $$$N \leq 10^6$$$ numbers, each up to $$$10^{12}$$$, find the maximum possible number that is a divisor of at least half of given numbers.
  5. ACTG prefix — Guess a hidden string $$$S$$$ with characters A, C, T, G. You can choose some string and ask if it's a prefix of $$$S$$$. The length of $$$S$$$ is at most $$$10\,000$$$ and you can ask up to $$$25\,000$$$ queries.
  6. How many tails you will get after tossing a fair coin $$$10^6$$$ times? It should be around $$$500\,000$$$, but how far away from this number can you realistically/plausibly get?
  7. Don't use a fixed seed in Codeforces or Topcoder because somebody can hack/challenge you.
  8. RAND_MAX in Codefoces is around $$$30\,000$$$, so use your own rand() in C++, same for random_shuffle(). More here: https://codeforces.com/blog/entry/61587

More (and harder) problems can be found here: https://codeforces.com/blog/entry/51244

And if you want to learn more about expected value: https://codeforces.com/blog/entry/62690

Save trees: https://youtu.be/TnCVAEsYGKs #TeamTrees

EDIT, part 2: https://youtu.be/GS2MxmorEzc

  1. Catch 'em all — When you encounter a Pokemon, it's random out of $$$N$$$ types. How many Pokemon will you encounter (on average) before seeing all $$$N$$$ types? Estimate this value.
  2. Birthday paradox — How many encounters before we see the same type of Pokemon twice?
  3. Hash collision — what is the probability of hash collision in problems like "given queries, check if this substring is equal to that substring" compared to "given a bunch of strings, find a pair of equal strings". The latter has much bigger probability of collision. Why?
  4. How to randomly shuffle an array?
  5. Repeated binary search — Find max element in a hidden sequence by asking queries "is $$$a_i$$$ greater than $$$x$$$?" faster than $$$\mathcal O(n \log n)$$$, https://codeforces.com/blog/entry/62602
  6. Bonus: estimate $$$\pi$$$ using randomized algorithm.
  • Vote: I like it
  • +290
  • Vote: I do not like it

| Write comment?
»
4 years ago, # |
  Vote: I like it 0 Vote: I do not like it

3 and 4 can be solved deterministically using pigeonhole principle

  • »
    »
    4 years ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    How? What complexity?

    • »
      »
      »
      4 years ago, # ^ |
      Rev. 4   Vote: I like it +10 Vote: I do not like it

      Sorry I was wrong

      • »
        »
        »
        »
        4 years ago, # ^ |
          Vote: I like it +4 Vote: I do not like it

        It's $$$O(n^2)$$$ if you do that for every five consecutive points. And if you do it only for one five of points, you won't necessarily find the answer.

»
4 years ago, # |
  Vote: I like it +70 Vote: I do not like it

There's also a nice classical problem which is given matrices $$$A, B, C$$$, check if $$$A * B == C$$$.

  • »
    »
    4 years ago, # ^ |
    Rev. 2   Vote: I like it +30 Vote: I do not like it

    When I first encountered this problem (Matrix God) in this contest I was puzzled by the solution of the editorial. Then I got to know that it's a named algorithm (Freivalds algorithm) and is a part of broader class of techniques called fingerprinting (polynomial identity testing, string equality testing and pattern matching) . I read about this in Randomized Algorithms by Motwani, Raghavan (check out page 162).

    EDIT — Sorry I didn't realize that it was copyrighted. I just took the first link from the google search. I have removed the link now.

»
4 years ago, # |
Rev. 2   Vote: I like it -9 Vote: I do not like it

given a tree. find minimum count of edges need to add to graph to contains no bridges. and need to output this edges.

solution
  • »
    »
    4 years ago, # ^ |
      Vote: I like it +31 Vote: I do not like it

    Ummm... are you sure about your certificate?

    • »
      »
      »
      4 years ago, # ^ |
      Rev. 2   Vote: I like it +9 Vote: I do not like it

      probably...

      seriously, i dont have counterexamples for big number of iterations

    • »
      »
      »
      4 years ago, # ^ |
        Vote: I like it +8 Vote: I do not like it

      I think it should terminate in a constant number of iterations, at least intuitively. Each edge should have probability less than 1/(leaves) of being a bridge after the edges are connected (assume there are no degree two vertices)

  • »
    »
    4 years ago, # ^ |
      Vote: I like it +13 Vote: I do not like it

    test: n = 6, m = 5, edges = [(1, 2), (2, 3), (2, 4), (4, 5), (4, 6)]

    We can add [(1, 3), (5, 6)] and have bridge (2, 4)

    But we can add [(1, 5), (3, 6)] and have no bridges

    Obviously random doesn't work :(

    • »
      »
      »
      4 years ago, # ^ |
        Vote: I like it -8 Vote: I do not like it

      seems like you dont understand how construction do while works :(

      expected value of number of iterations on this (obvious) test is 1.5

      • »
        »
        »
        »
        4 years ago, # ^ |
        Rev. 2   Vote: I like it +17 Vote: I do not like it

        Well, maybe you mean

        for (int i = 0; i < C; i++) {
            do{ connect leaves at random }while(graph contains bridges)
            ans = min(ans, added_edges);
        }
        

        I don't have another idea. Anyway there is a counterexample smth like this I think.

        [(1, 2), (2, 3), (1, 4...k), (3, k+1...2k)]

        By the way, this problem can be solved deterministically in O(|V| + |E|).

        UPD. Oh, my counterexample is bad :) Bad I'll try to find another, I can't believe this solution will work

        • »
          »
          »
          »
          »
          4 years ago, # ^ |
            Vote: I like it +1 Vote: I do not like it

          ans = min(ans, added_edges); is meaningless after i said count is (leaves+1)/2 (count is ans)

          ok, more code of what i mean:

          for(;;){
              graph g = tree
              random_shuffle(ALL(leaves))
              for(i=0; i<size(leaves); i+=2) g.add_edge(leaves[i], leaves[i+1])
              if(g.count_of_bridges() == 0){
                  //print answer
                  break
              }
          }
          
  • »
    »
    4 years ago, # ^ |
      Vote: I like it +13 Vote: I do not like it
    Deterministic solution
»
4 years ago, # |
  Vote: I like it +3 Vote: I do not like it

Thanks for the lecture — it's very comprehensive and accessible even for not-really-clever-people like me. :)

Today I solved my first problem by randomization — Troublemakers. I couldn't believe my eyes when I got AC xD

I'm very much looking forward to the second part of the lecture.

  • »
    »
    4 years ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    This is actually very common that we do something randomly, show that the expected value is something, say $$$X$$$, and thus sometimes we must get value not worse than $$$X$$$. I think there are some well known $$$k$$$-approximations but I don't remember examples :D

    • »
      »
      »
      4 years ago, # ^ |
        Vote: I like it +16 Vote: I do not like it

      Well-known: given $$$3$$$-sat (each clause contains distinct variables), find in polynomial time variable assignment satisfying at least $$$\frac{7}{8}$$$ of clauses.

      • »
        »
        »
        »
        4 years ago, # ^ |
          Vote: I like it +3 Vote: I do not like it

        Just assigning each variable to true or false with probability 0.5 gives an expected $$$\frac{7}{8}-\text{approx}$$$ algorithm.

        Suppose the input 3-SAT formula has $$$n$$$ variables and $$$m$$$ clauses let's call the random variable $$$W$$$ as the number of clauses which are satisfied in a random assignment. And let $$$X_i$$$ denote the indicator RV such that it is 1 iff $$$i^{th}$$$ clause is true which happens with probability $$$1 - \frac{1}{8}$$$. Now we can see that $$$W = \sum_{i=1}^{m} X_i$$$. We are interested in finding $$$E[W] = E[\sum_{i=1}^{m} X_i] = \sum_{i=1}^{m} E[X_i]$$$ (due to linearity of expectation)

        Hence we get $$$E[W] = \frac{7}{8} \cdot m$$$

        Let's call the maximum number of clauses which can be satisfied for the given instance as $$$OPT \leq m$$$ which gives us the following relation. $$$E[W] \geq \frac{7}{8} \cdot OPT$$$

        In fact this can be derandomized using the method of conditional expectations. Check out page 108 in Design Of Approximation Algorithms by Williamson and Shmoys

        • »
          »
          »
          »
          »
          4 years ago, # ^ |
            Vote: I like it +3 Vote: I do not like it

          Yes, you are correct. It's even possible to make estimation on the number of random iterations required.

          • »
            »
            »
            »
            »
            »
            4 years ago, # ^ |
              Vote: I like it +3 Vote: I do not like it

            You mean to run the randomized algorithm $$$t$$$ times and take the maximum value and try to bound the deviation of this modified algorithm from the mean ?

»
4 years ago, # |
  Vote: I like it 0 Vote: I do not like it

Can someone explain why the probability of success of the chosen point is 1/16 in the 3rd problem of first lecture

  • »
    »
    4 years ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    At least $$$N/4$$$ points are on the optimal line, so probability of one point on the line is at least $$$N/4$$$ and probability of both points on the line is at least $$$N/16$$$.

»
2 years ago, # |
  Vote: I like it 0 Vote: I do not like it

Solution with proof for part 1, 5th problem.

Let's find letters one by one. Notice that in worst case you should spend 3 guesses to determine a letter, if first three guesses are wrong then the remaining letter is correct. If you pick ACTG letters randomly, probability that you spend one query is 1/4, two queries 3/4 * 1/3 = 1/4, and three queries is remaining 1/2 probability. Thus expected number of queries for one letter is 2.25, with variance 11/16 = 0.6875. (Computed using well known formulas).

Let q be a random variable — number of queries per determining one letter.

S be sum of queries you spent — sum of all qi.

n be 10000.

Then (S — n * E[q]) / sqrt(n * var(q)) follows standard normal distribution (0 mean, 1 variance).

Probability[ (S — n * E[q]) / sqrt(n * var(q)) < 30 ] is almost one.

Where 30 * sqrt(n * var(q)) + n * E[q] = 24987 ~ 25000

So probability that total number of queries S is less than 25000 if almost one.