Randomized algorithms lecture, part 1 & 2

#	User	Rating
1	ecnerwala	3649
2	Benq	3581
3	orzdevinwang	3570
4	Geothermal	3569
4	cnnfls_csy	3569
6	tourist	3565
7	maroonrk	3531
8	Radewoosh	3521
9	Um_nik	3482
10	jiangly	3468

#	User	Contrib.
1	maomao90	174
2	awoo	164
3	adamant	161
4	TheScrasse	159
5	nor	158
6	maroonrk	156
7	-is-this-fft-	151
8	SecondThread	147
9	orz	146
10	pajenegod	145

You can watch the lecture on Youtube: https://youtu.be/0r2D32esF3Y. I will do a second part soon. Some problems are quite vague, it's a nature of this topic.

Warm-up: You toss a coin till you get tails. How many tosses there will be, on average?
X or smaller — There is a hidden number $$$X$$$. An interactor repeatedly gives you a number, either $$$X$$$ or something smaller than $$$X$$$. All numbers are positive integers. When can you stop and say that you are (almost) certain what is the value of $$$X$$$?
Line through N/4 points — Given $$$N \leq 10^5$$$ points, find a line that passes through the maximum number of points. It's guaranteed that the answer is at least $$$N / 4$$$.
GCD (364D - Ghd) — given a set of $$$N \leq 10^6$$$ numbers, each up to $$$10^{12}$$$, find the maximum possible number that is a divisor of at least half of given numbers.
ACTG prefix — Guess a hidden string $$$S$$$ with characters A, C, T, G. You can choose some string and ask if it's a prefix of $$$S$$$. The length of $$$S$$$ is at most $$$10\,000$$$ and you can ask up to $$$25\,000$$$ queries.
How many tails you will get after tossing a fair coin $$$10^6$$$ times? It should be around $$$500\,000$$$, but how far away from this number can you realistically/plausibly get?
Don't use a fixed seed in Codeforces or Topcoder because somebody can hack/challenge you.
RAND_MAX in Codefoces is around $$$30\,000$$$, so use your own rand() in C++, same for random_shuffle(). More here: https://codeforces.com/blog/entry/61587

More (and harder) problems can be found here: https://codeforces.com/blog/entry/51244

And if you want to learn more about expected value: https://codeforces.com/blog/entry/62690

Save trees: https://youtu.be/TnCVAEsYGKs #TeamTrees

EDIT, part 2: https://youtu.be/GS2MxmorEzc

Catch 'em all — When you encounter a Pokemon, it's random out of $$$N$$$ types. How many Pokemon will you encounter (on average) before seeing all $$$N$$$ types? Estimate this value.
Birthday paradox — How many encounters before we see the same type of Pokemon twice?
Hash collision — what is the probability of hash collision in problems like "given queries, check if this substring is equal to that substring" compared to "given a bunch of strings, find a pair of equal strings". The latter has much bigger probability of collision. Why?
How to randomly shuffle an array?
Repeated binary search — Find max element in a hidden sequence by asking queries "is $$$a_i$$$ greater than $$$x$$$?" faster than $$$\mathcal O(n \log n)$$$, https://codeforces.com/blog/entry/62602
Bonus: estimate $$$\pi$$$ using randomized algorithm.

for(;;){ graph g = tree random_shuffle(ALL(leaves)) for(i=0; i<size(leaves); i+=2) g.add_edge(leaves[i], leaves[i+1]) if(g.count_of_bridges() == 0){ //print answer break } }

Comments (26)

Show archived | Write comment?

adhoc_king

4 years ago, # |

3 and 4 can be solved deterministically using pigeonhole principle

→ Reply

Errichto

4 years ago, # ^ |

How? What complexity?

← Rev. 4 →

+10

Sorry I was wrong

It's $$$O(n^2)$$$ if you do that for every five consecutive points. And if you do it only for one five of points, you won't necessarily find the answer.

+27

Agree, I misunderstood

Mohammad_Yasser

+70

There's also a nice classical problem which is given matrices $$$A, B, C$$$, check if $$$A * B == C$$$.

bhishma

← Rev. 2 →

+30

When I first encountered this problem (Matrix God) in this contest I was puzzled by the solution of the editorial. Then I got to know that it's a named algorithm (Freivalds algorithm) and is a part of broader class of techniques called fingerprinting (polynomial identity testing, string equality testing and pattern matching) . I read about this in Randomized Algorithms by Motwani, Raghavan (check out page 162).

EDIT — Sorry I didn't realize that it was copyrighted. I just took the first link from the google search. I have removed the link now.

.______.

oversolver

-9

given a tree. find minimum count of edges need to add to graph to contains no bridges. and need to output this edges.

solution

300iq

+31

Ummm... are you sure about your certificate?

probably...

seriously, i dont have counterexamples for big number of iterations

ksun48

I think it should terminate in a constant number of iterations, at least intuitively. Each edge should have probability less than 1/(leaves) of being a bridge after the edges are connected (assume there are no degree two vertices)

MakArtKar

+13

test: n = 6, m = 5, edges = [(1, 2), (2, 3), (2, 4), (4, 5), (4, 6)]

We can add [(1, 3), (5, 6)] and have bridge (2, 4)

But we can add [(1, 5), (3, 6)] and have no bridges

Obviously random doesn't work :(

-8

seems like you dont understand how construction do while works :(

expected value of number of iterations on this (obvious) test is 1.5

+17

Well, maybe you mean

for (int i = 0; i < C; i++) {
    do{ connect leaves at random }while(graph contains bridges)
    ans = min(ans, added_edges);
}

I don't have another idea. Anyway there is a counterexample smth like this I think.

[(1, 2), (2, 3), (1, 4...k), (3, k+1...2k)]

By the way, this problem can be solved deterministically in O(|V| + |E|).

UPD. Oh, my counterexample is bad :) Bad I'll try to find another, I can't believe this solution will work

ans = min(ans, added_edges); is meaningless after i said count is (leaves+1)/2 (count is ans)

ok, more code of what i mean:

dacin21

Deterministic solution

kocko

Thanks for the lecture — it's very comprehensive and accessible even for not-really-clever-people like me. :)

Today I solved my first problem by randomization — Troublemakers. I couldn't believe my eyes when I got AC xD

I'm very much looking forward to the second part of the lecture.

This is actually very common that we do something randomly, show that the expected value is something, say $$$X$$$, and thus sometimes we must get value not worse than $$$X$$$. I think there are some well known $$$k$$$-approximations but I don't remember examples :D

cdkrot

+16

Well-known: given $$$3$$$-sat (each clause contains distinct variables), find in polynomial time variable assignment satisfying at least $$$\frac{7}{8}$$$ of clauses.

Just assigning each variable to true or false with probability 0.5 gives an expected $$$\frac{7}{8}-\text{approx}$$$ algorithm.

Suppose the input 3-SAT formula has $$$n$$$ variables and $$$m$$$ clauses let's call the random variable $$$W$$$ as the number of clauses which are satisfied in a random assignment. And let $$$X_i$$$ denote the indicator RV such that it is 1 iff $$$i^{th}$$$ clause is true which happens with probability $$$1 - \frac{1}{8}$$$. Now we can see that $$$W = \sum_{i=1}^{m} X_i$$$. We are interested in finding $$$E[W] = E[\sum_{i=1}^{m} X_i] = \sum_{i=1}^{m} E[X_i]$$$ (due to linearity of expectation)

Hence we get $$$E[W] = \frac{7}{8} \cdot m$$$

Let's call the maximum number of clauses which can be satisfied for the given instance as $$$OPT \leq m$$$ which gives us the following relation. $$$E[W] \geq \frac{7}{8} \cdot OPT$$$

In fact this can be derandomized using the method of conditional expectations. Check out page 108 in Design Of Approximation Algorithms by Williamson and Shmoys

Yes, you are correct. It's even possible to make estimation on the number of random iterations required.

You mean to run the randomized algorithm $$$t$$$ times and take the maximum value and try to bound the deviation of this modified algorithm from the mean ?

hari_431

Can someone explain why the probability of success of the chosen point is 1/16 in the 3rd problem of first lecture

jef

At least $$$N/4$$$ points are on the optimal line, so probability of one point on the line is at least $$$N/4$$$ and probability of both points on the line is at least $$$N/16$$$.

pink_roses

2 years ago, # |

Solution with proof for part 1, 5th problem.

Let's find letters one by one. Notice that in worst case you should spend 3 guesses to determine a letter, if first three guesses are wrong then the remaining letter is correct. If you pick ACTG letters randomly, probability that you spend one query is 1/4, two queries 3/4 * 1/3 = 1/4, and three queries is remaining 1/2 probability. Thus expected number of queries for one letter is 2.25, with variance 11/16 = 0.6875. (Computed using well known formulas).

Let q be a random variable — number of queries per determining one letter.

S be sum of queries you spent — sum of all qi.

n be 10000.

Then (S — n * E[q]) / sqrt(n * var(q)) follows standard normal distribution (0 mean, 1 variance).

Probability[ (S — n * E[q]) / sqrt(n * var(q)) < 30 ] is almost one.

Where 30 * sqrt(n * var(q)) + n * E[q] = 24987 ~ 25000

So probability that total number of queries S is less than 25000 if almost one.

Errichto's blog