### Errichto's blog

By Errichto, 3 years ago,

You can watch the lecture on Youtube: https://youtu.be/0r2D32esF3Y. I will do a second part soon. Some problems are quite vague, it's a nature of this topic.

1. Warm-up: You toss a coin till you get tails. How many tosses there will be, on average?
2. X or smaller — There is a hidden number $X$. An interactor repeatedly gives you a number, either $X$ or something smaller than $X$. All numbers are positive integers. When can you stop and say that you are (almost) certain what is the value of $X$?
3. Line through N/4 points — Given $N \leq 10^5$ points, find a line that passes through the maximum number of points. It's guaranteed that the answer is at least $N / 4$.
4. GCD (364D - Ghd) — given a set of $N \leq 10^6$ numbers, each up to $10^{12}$, find the maximum possible number that is a divisor of at least half of given numbers.
5. ACTG prefix — Guess a hidden string $S$ with characters A, C, T, G. You can choose some string and ask if it's a prefix of $S$. The length of $S$ is at most $10\,000$ and you can ask up to $25\,000$ queries.
6. How many tails you will get after tossing a fair coin $10^6$ times? It should be around $500\,000$, but how far away from this number can you realistically/plausibly get?
7. Don't use a fixed seed in Codeforces or Topcoder because somebody can hack/challenge you.
8. RAND_MAX in Codefoces is around $30\,000$, so use your own rand() in C++, same for random_shuffle(). More here: https://codeforces.com/blog/entry/61587

More (and harder) problems can be found here: https://codeforces.com/blog/entry/51244

Save trees: https://youtu.be/TnCVAEsYGKs #TeamTrees

EDIT, part 2: https://youtu.be/GS2MxmorEzc

1. Catch 'em all — When you encounter a Pokemon, it's random out of $N$ types. How many Pokemon will you encounter (on average) before seeing all $N$ types? Estimate this value.
2. Birthday paradox — How many encounters before we see the same type of Pokemon twice?
3. Hash collision — what is the probability of hash collision in problems like "given queries, check if this substring is equal to that substring" compared to "given a bunch of strings, find a pair of equal strings". The latter has much bigger probability of collision. Why?
4. How to randomly shuffle an array?
5. Repeated binary search — Find max element in a hidden sequence by asking queries "is $a_i$ greater than $x$?" faster than $\mathcal O(n \log n)$, https://codeforces.com/blog/entry/62602
6. Bonus: estimate $\pi$ using randomized algorithm.

• +290

 » 3 years ago, # |   0 3 and 4 can be solved deterministically using pigeonhole principle
•  » » 3 years ago, # ^ |   0 How? What complexity?
•  » » » 3 years ago, # ^ | ← Rev. 4 →   +10 Sorry I was wrong
•  » » » » 3 years ago, # ^ |   +4 It's $O(n^2)$ if you do that for every five consecutive points. And if you do it only for one five of points, you won't necessarily find the answer.
•  » » » » » 3 years ago, # ^ |   +27 Agree, I misunderstood
•  » » » 3 years ago, # ^ |   -7 for one second before reading further comments, I thought how is it possible that a expert knows something that an international grandmaster doesn't.
 » 3 years ago, # |   +70 There's also a nice classical problem which is given matrices $A, B, C$, check if $A * B == C$.
•  » » 3 years ago, # ^ | ← Rev. 2 →   +30 When I first encountered this problem (Matrix God) in this contest I was puzzled by the solution of the editorial. Then I got to know that it's a named algorithm (Freivalds algorithm) and is a part of broader class of techniques called fingerprinting (polynomial identity testing, string equality testing and pattern matching) . I read about this in Randomized Algorithms by Motwani, Raghavan (check out page 162). EDIT — Sorry I didn't realize that it was copyrighted. I just took the first link from the google search. I have removed the link now.
•  » » » 3 years ago, # ^ |   0
 » 3 years ago, # |   -15 The only two things God and Probability that I believe.
 » 3 years ago, # | ← Rev. 2 →   -9 given a tree. find minimum count of edges need to add to graph to contains no bridges. and need to output this edges. solutioncount is (leaves+1)/2 certificatedo{ connect leaves at random }while(graph contains bridges)
•  » » 3 years ago, # ^ |   +31 Ummm... are you sure about your certificate?
•  » » » 3 years ago, # ^ | ← Rev. 2 →   +9 probably...seriously, i dont have counterexamples for big number of iterations
•  » » » 3 years ago, # ^ |   +8 I think it should terminate in a constant number of iterations, at least intuitively. Each edge should have probability less than 1/(leaves) of being a bridge after the edges are connected (assume there are no degree two vertices)
•  » » 3 years ago, # ^ |   +13 test: n = 6, m = 5, edges = [(1, 2), (2, 3), (2, 4), (4, 5), (4, 6)]We can add [(1, 3), (5, 6)] and have bridge (2, 4)But we can add [(1, 5), (3, 6)] and have no bridgesObviously random doesn't work :(
•  » » » 3 years ago, # ^ |   -8 seems like you dont understand how construction do while works :(expected value of number of iterations on this (obvious) test is 1.5
•  » » » » 3 years ago, # ^ | ← Rev. 2 →   +17 Well, maybe you mean for (int i = 0; i < C; i++) { do{ connect leaves at random }while(graph contains bridges) ans = min(ans, added_edges); } I don't have another idea. Anyway there is a counterexample smth like this I think.[(1, 2), (2, 3), (1, 4...k), (3, k+1...2k)]By the way, this problem can be solved deterministically in O(|V| + |E|).UPD. Oh, my counterexample is bad :) Bad I'll try to find another, I can't believe this solution will work
•  » » » » » 3 years ago, # ^ |   +1 ans = min(ans, added_edges); is meaningless after i said count is (leaves+1)/2 (count is ans)ok, more code of what i mean: for(;;){ graph g = tree random_shuffle(ALL(leaves)) for(i=0; i
•  » » 3 years ago, # ^ |   +13 Deterministic solutionMark all leaves. If the number of leaves is odd, additionally mark an arbitrary unmarked vertex. Root the tree arbitrarily. Do a preorder traversal. Let $v_1, \dots, v_k$ denote the marked vertices in the order in which they appear in the traversal. For $i = 1, \dots, \frac{k}{2}$ connect $v_i$ with $v_{i + \frac{k}{2}}$. proofTo show that the resulting graph contains no bridges, take an arbitrary edge $e$ of the original tree. Removing $e$ splits the tree into two trees $T_1$ and $T_2$. W.l.o.g assume that $T_1$ contains at most $\frac{k}{2}$ marked vertices. Note that the vertices of $T_1$ appear contiguously in $v$ if we think of $v_k$ and $v_1$ as adjacent. As $T_1$ contains at most $\frac{k}{2}$ vertices, every edge we added to a vertex of $T_1$ connects it to a vertex of $T_2$. As $T_1$ contains at least one leaf (and hence at least one marked vertex), we added an edge between $T_1$ and $T_2$, so $e$ is not a bridge in the resulting graph.
 » 3 years ago, # |   +3 Thanks for the lecture — it's very comprehensive and accessible even for not-really-clever-people like me. :)Today I solved my first problem by randomization — Troublemakers. I couldn't believe my eyes when I got AC xDI'm very much looking forward to the second part of the lecture.
•  » » 3 years ago, # ^ |   0 This is actually very common that we do something randomly, show that the expected value is something, say $X$, and thus sometimes we must get value not worse than $X$. I think there are some well known $k$-approximations but I don't remember examples :D
•  » » » 3 years ago, # ^ |   +16 Well-known: given $3$-sat (each clause contains distinct variables), find in polynomial time variable assignment satisfying at least $\frac{7}{8}$ of clauses.
•  » » » » 3 years ago, # ^ |   +3 Just assigning each variable to true or false with probability 0.5 gives an expected $\frac{7}{8}-\text{approx}$ algorithm. Suppose the input 3-SAT formula has $n$ variables and $m$ clauses let's call the random variable $W$ as the number of clauses which are satisfied in a random assignment. And let $X_i$ denote the indicator RV such that it is 1 iff $i^{th}$ clause is true which happens with probability $1 - \frac{1}{8}$. Now we can see that $W = \sum_{i=1}^{m} X_i$. We are interested in finding $E[W] = E[\sum_{i=1}^{m} X_i] = \sum_{i=1}^{m} E[X_i]$ (due to linearity of expectation) Hence we get $E[W] = \frac{7}{8} \cdot m$ Let's call the maximum number of clauses which can be satisfied for the given instance as $OPT \leq m$ which gives us the following relation. $E[W] \geq \frac{7}{8} \cdot OPT$In fact this can be derandomized using the method of conditional expectations. Check out page 108 in Design Of Approximation Algorithms by Williamson and Shmoys
•  » » » » » 3 years ago, # ^ |   +3 Yes, you are correct. It's even possible to make estimation on the number of random iterations required.
•  » » » » » » 3 years ago, # ^ |   +3 You mean to run the randomized algorithm $t$ times and take the maximum value and try to bound the deviation of this modified algorithm from the mean ?
 » 2 years ago, # |   0 Can someone explain why the probability of success of the chosen point is 1/16 in the 3rd problem of first lecture
•  » » 2 years ago, # ^ |   0 At least $N/4$ points are on the optimal line, so probability of one point on the line is at least $N/4$ and probability of both points on the line is at least $N/16$.
 » 2 years ago, # |   0 Hey Errichto, Can you share your string algorithm youtube tutorials ?
 » 6 months ago, # |   0 Solution with proof for part 1, 5th problem.Let's find letters one by one. Notice that in worst case you should spend 3 guesses to determine a letter, if first three guesses are wrong then the remaining letter is correct. If you pick ACTG letters randomly, probability that you spend one query is 1/4, two queries 3/4 * 1/3 = 1/4, and three queries is remaining 1/2 probability. Thus expected number of queries for one letter is 2.25, with variance 11/16 = 0.6875. (Computed using well known formulas). Let q be a random variable — number of queries per determining one letter.S be sum of queries you spent — sum of all qi.n be 10000.Then (S — n * E[q]) / sqrt(n * var(q)) follows standard normal distribution (0 mean, 1 variance).Probability[ (S — n * E[q]) / sqrt(n * var(q)) < 30 ] is almost one.Where 30 * sqrt(n * var(q)) + n * E[q] = 24987 ~ 25000So probability that total number of queries S is less than 25000 if almost one.