Errichto's blog

By Errichto, 12 months ago. Watch my lecture-stream tomorrow (Thursday) at 14:00 CEST: https://www.youtube.com/watch?v=qdlPY37MBPo https://www.youtube.com/watch?v=U_h3IjreRek. I will go through theory and problems from this blog. The only prerequisite is knowing what probability is. The next (harder) part is on Monday.

The video will be available later, with timestamps for each problem — so you don't have to watch everything.

Let's say we bought a lottery ticket for 2$. We will win 10$ with probability 10%, and 20$ with p-bility 2%. On average, it gives us 0.1·10 + 0.02·20 = 1.4, so we are worse off after buying the ticket. The computed average is called the expected value.

The expected value (EV, expectation) is the average value of an event/experiment. For example, EV of the number of pips rolled on a 6-sided die is 3.5: (1 + 2 + 3 + 4 + 5 + 6) / 6 = 3.5.

Linearity of EV (super important theorem): E(X + Y) = E(X) + E(Y). It holds even if X and Y are not independent.

Technique "Contribution to the sum"

If we want to find the sum over many ways/possibilities, we should consider every element (maybe a number, or a pair or an edge) and count how many times it will be added to the answer.

EV easy problems

1. Aces We choose 10 cards at random from a standard deck of 52 cards. Find EV of the number of aces.

2. Inflation The price of a TV is 1000$. Each of the next N days, the price goes up by 5$ or 10$ (each with p-bility 50%). Find EV of the final price.
Bonus: What if it goes up by 1% or 2% each day?
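
Spoiler for Inflation: a small sketch that compares a closed form with brute force over all 2^N price histories. It assumes the daily changes are independent; the function names (`expected_price_additive`, `expected_price_percent`, `brute_force`, `add_step`, `percent_step`) are made up for this example. The additive version uses linearity of EV, the percent version uses the fact that the EV of a product of independent factors is the product of their EVs.

```python
from itertools import product

def expected_price_additive(n, start=1000.0):
    # Linearity of EV: each day adds (5 + 10) / 2 = 7.5 on average.
    return start + 7.5 * n

def expected_price_percent(n, start=1000.0):
    # Independent daily factors: each day multiplies by (1.01 + 1.02) / 2 on average.
    return start * 1.015 ** n

def add_step(price, up):
    # One day of the original problem: +10 or +5.
    return price + (10 if up else 5)

def percent_step(price, up):
    # One day of the bonus version: +2% or +1%.
    return price * (1.02 if up else 1.01)

def brute_force(n, start, step):
    # Average the final price over all 2^n equally likely histories.
    total = 0.0
    for choices in product((False, True), repeat=n):
        price = start
        for up in choices:
            price = step(price, up)
        total += price
    return total / 2 ** n

print(expected_price_additive(8), brute_force(8, 1000.0, add_step))
print(expected_price_percent(8), brute_force(8, 1000.0, percent_step))
```

The brute force is exponential in N, so it is only meant as a check for small N.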

3. Max of two You roll a 6-sided die twice. Find EV of the bigger of two scores.

4. Max of N You roll a 6-sided die N times. Find EV of the biggest score.
A few possible solutions.

5. Birthdays You teach informatics in a class with 20 students. When someone has a birthday, you must let the whole class play games instead of learning algorithms and using Excel. Maybe some students have their birthday on the same day, so there would be fewer than 20 wasted days during a year? Find EV of the number of days when at least one student has a birthday.
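
Spoiler for Birthdays: a sketch using linearity of EV over days instead of over students. It assumes 365 equally likely days and independent birthdays (the problem statement does not spell this out); `expected_birthday_days` and `brute_force` are names invented here. A fixed day is wasted unless all students miss it, so each day contributes 1 - (364/365)^20.

```python
from itertools import product

def expected_birthday_days(students, days=365):
    # A fixed day is nobody's birthday with p-bility ((days - 1) / days) ** students.
    # Sum the p-bility of "someone has a birthday" over all days.
    return days * (1 - ((days - 1) / days) ** students)

def brute_force(students, days):
    # Enumerate every assignment of birthdays (only feasible for tiny inputs).
    total = 0
    for assignment in product(range(days), repeat=students):
        total += len(set(assignment))
    return total / days ** students

print(expected_birthday_days(20))               # roughly 19.5 wasted days
print(expected_birthday_days(3, 4), brute_force(3, 4))
```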

6. First heads Find EV of the number of coin tosses until you get heads.
How to check your answer with a program?
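
One way to check an answer like this with a program is a quick simulation. A minimal sketch, assuming a fair coin and Python's `random` module; the function name `simulate_first_heads` and the fixed seed are just choices made for this example.

```python
import random

def simulate_first_heads(trials, seed=12345):
    # Repeat the experiment many times and average the number of tosses.
    rng = random.Random(seed)
    total_tosses = 0
    for _ in range(trials):
        tosses = 1
        while rng.random() < 0.5:  # tails with p-bility 50%, so toss again
            tosses += 1
        total_tosses += tosses
    return total_tosses / trials

print(simulate_first_heads(200_000))
```

The printed average should land close to your pen-and-paper answer; more trials give a tighter estimate.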

7. Two heads Find EV of the number of coin tosses until you get heads two times in total.

8. Two heads in a row Find EV of the number of coin tosses until you get heads two times in a row.

9. Volleyball 12 teams, including Poland, play in the volleyball tournament. Teams are divided into 4 groups, each with 3 teams. In each group, every team plays against every other team, and then the two best teams advance to the elimination stage. In case of a perfect tie, two random teams advance. The elimination stage has quarterfinals, semifinals and the final match. In every match, a random one of the two teams wins (50% each).
Find p-bility that Poland will win the whole tournament. Find EV of the number of matches won by Poland. Find EV of the number of matches won by Poland, assuming that they won the whole tournament (in other words, find EV of the number of matches won by the winner of the whole tournament).

Problems for "contribution" technique

1. Hills Given a sequence of length N (N ≤ 10^5), count triples of indices i < j < k such that a_i < a_j > a_k.
Bonus: Count zig-zags of length 10, i.e. i1 < i2 < ... < i10 such that a[i1] < a[i2] > a[i3] < a[i4] > ... < a[i10].
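
Spoiler for Hills (not the bonus): a sketch of the contribution idea, where every middle index j contributes (number of smaller elements to its left) · (number of smaller elements to its right). The Fenwick tree is one standard way to count those in O(N log N), not necessarily the intended one; the names `Fenwick`, `count_hills` and `count_hills_brute` are made up for this example.

```python
from bisect import bisect_left
from itertools import combinations

class Fenwick:
    """Counts inserted positions; positions are 1-indexed."""
    def __init__(self, size):
        self.tree = [0] * (size + 1)

    def add(self, pos):
        while pos < len(self.tree):
            self.tree[pos] += 1
            pos += pos & -pos

    def prefix(self, pos):
        # Number of inserted positions that are <= pos.
        count = 0
        while pos > 0:
            count += self.tree[pos]
            pos -= pos & -pos
        return count

def count_hills(a):
    values = sorted(set(a))
    rank = [bisect_left(values, x) + 1 for x in a]  # compress values to 1..len(values)
    n = len(a)
    left = [0] * n
    seen = Fenwick(len(values))
    for j in range(n):
        left[j] = seen.prefix(rank[j] - 1)   # strictly smaller elements before j
        seen.add(rank[j])
    seen = Fenwick(len(values))
    total = 0
    for j in range(n - 1, -1, -1):
        right = seen.prefix(rank[j] - 1)     # strictly smaller elements after j
        total += left[j] * right             # contribution of j as the middle
        seen.add(rank[j])
    return total

def count_hills_brute(a):
    # Direct O(N^3) definition, for checking small inputs.
    return sum(1 for i, j, k in combinations(range(len(a)), 3) if a[i] < a[j] > a[k])

print(count_hills([1, 3, 2]), count_hills_brute([1, 3, 2]))
```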

2. Paths in tree Given a tree with N vertices (N ≤ 10^5), find the sum of lengths of all paths. (Solve this without non-trivial dp or centroids.)
Bonus: Find the sum over squares of lengths of paths.

3. Sum over subsets There are N competitions, the i-th with prize ai. You're quite good and you will win each competition with p-bility 50%. Find EV of the total won prizes.
Equivalently: Find the sum over total prize over all 2^N possibilities, and print the answer modulo 10^9 + 7. (This answer divided by 2^N gives the answer from the first version.)
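
Spoiler for Sum over subsets: a sketch of the contribution count. Each prize a_i is won in exactly half of the 2^N possibilities, so it is added 2^(N-1) times. The names `sum_over_subsets` and `sum_over_subsets_brute` are invented for this example.

```python
MOD = 10**9 + 7

def sum_over_subsets(prizes):
    # Each prize appears in 2^(N-1) of the 2^N subsets.
    n = len(prizes)
    if n == 0:
        return 0
    return sum(prizes) % MOD * pow(2, n - 1, MOD) % MOD

def sum_over_subsets_brute(prizes):
    # Enumerate all 2^N win/lose outcomes as bitmasks.
    n = len(prizes)
    total = 0
    for mask in range(1 << n):
        total += sum(prizes[i] for i in range(n) if mask >> i & 1)
    return total % MOD

print(sum_over_subsets([1, 2, 3]), sum_over_subsets_brute([1, 2, 3]))
```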

4. Math encoder (Code Jam Kickstart 2017 round B) You are given a sequence of N numbers (N ≤ 2000 or N ≤ 10^5). We're to choose one of 2^N - 1 non-empty subsets, uniformly at random. Find EV of the difference between the maximum and minimum element in the subset.
Equivalently: Find the sum over the difference over all subsets, and print the answer modulo 10^9 + 7.
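
Spoiler for Math encoder: a sketch of one possible contribution argument. After sorting, the element at position i (0-indexed) is the maximum of 2^i subsets and the minimum of 2^(N-1-i) subsets, with ties broken by sorted position. The function names are made up here, and this is only one of several ways to solve it.

```python
from itertools import combinations

MOD = 10**9 + 7

def sum_of_ranges(a):
    b = sorted(a)
    n = len(b)
    pw = [1] * (n + 1)
    for i in range(1, n + 1):
        pw[i] = pw[i - 1] * 2 % MOD
    total = 0
    for i, x in enumerate(b):
        # x is added 2^i times as a maximum and subtracted 2^(n-1-i) times as a minimum.
        total += x * (pw[i] - pw[n - 1 - i])
    return total % MOD

def sum_of_ranges_brute(a):
    # Enumerate all non-empty subsets directly.
    total = 0
    for size in range(1, len(a) + 1):
        for subset in combinations(a, size):
            total += max(subset) - min(subset)
    return total % MOD

print(sum_of_ranges([1, 2, 3]), sum_of_ranges_brute([1, 2, 3]))
```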

5. Imbalanced array (CF Educational Round 23) Same as the previous problem, but choose a random of N·(N + 1) / 2 intervals.
Bonus: Do it in O(N) if the input contains a permutation of numbers 1 through N.
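
Spoiler for Imbalanced array: a sketch that computes the sum of (max - min) over all intervals; dividing by N·(N + 1) / 2 gives the EV. It counts, for every element, in how many intervals it is the maximum, using a monotonic stack (one standard tool, not necessarily the intended solution). Minimums are handled by negating the array. All function names are assumptions of this example.

```python
def sum_of_interval_max(a):
    n = len(a)
    left = [-1] * n   # index of the previous element >= a[i], or -1
    right = [n] * n   # index of the next element > a[i], or n
    stack = []
    for i in range(n):
        while stack and a[stack[-1]] < a[i]:
            stack.pop()
        left[i] = stack[-1] if stack else -1
        stack.append(i)
    stack = []
    for i in range(n - 1, -1, -1):
        while stack and a[stack[-1]] <= a[i]:
            stack.pop()
        right[i] = stack[-1] if stack else n
        stack.append(i)
    # a[i] is the leftmost maximum of (i - left[i]) * (right[i] - i) intervals.
    return sum(a[i] * (i - left[i]) * (right[i] - i) for i in range(n))

def imbalance(a):
    # sum of max minus sum of min; min(x) = -max(-x).
    return sum_of_interval_max(a) + sum_of_interval_max([-x for x in a])

def imbalance_brute(a):
    n = len(a)
    return sum(max(a[l:r + 1]) - min(a[l:r + 1]) for l in range(n) for r in range(l, n))

print(imbalance([1, 4, 1]), imbalance_brute([1, 4, 1]))
```

The strict comparison on one side and the non-strict one on the other make sure that every interval is counted for exactly one of its equal maximums.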

6. Randomizer (CF 313 D) You're given a convex polygon with N vertices (N ≤ 2000 or N ≤ 10^5). We choose a random subset of vertices, which gives us a new (small) convex polygon. Find EV of the perimeter of the new polygon.
Well, the CF problem was a bit harder, with computing area and using Pick's theorem.

7. Random CH You're given N points (N ≤ 200 or N ≤ 2000), no three are collinear. Each point disappears with p-bility 50%. Find EV of the size of the convex hull of remaining points.
(The size of CH is the number of its vertices.)

8. Eating ends You're given a sequence of length N (N ≤ 2000 or N ≤ 10^5). N - 1 times we will remove the first or the last element, each with p-bility 50%. Find EV of the last remaining number.
Well, the implementation is hard because of precision issues.
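
Spoiler for Eating ends: a sketch of one way to deal with precision. The element at 0-indexed position i survives exactly when i of the N - 1 removals take the first element, which happens with p-bility C(N-1, i) / 2^(N-1). Computing that ratio through logarithms avoids huge binomials and tiny powers of two; whether this is the precision issue meant above is my assumption. Function names are invented for the example.

```python
from itertools import product
from math import exp, lgamma, log

def expected_last_element(a):
    n = len(a)
    log_half = log(0.5)
    result = 0.0
    for i, x in enumerate(a):
        # log of C(n-1, i) / 2^(n-1), using lgamma(k + 1) = log(k!)
        log_p = lgamma(n) - lgamma(i + 1) - lgamma(n - i) + (n - 1) * log_half
        result += x * exp(log_p)
    return result

def expected_last_element_brute(a):
    # Try all 2^(n-1) sequences of "remove first" / "remove last".
    n = len(a)
    total = 0.0
    for choices in product((0, 1), repeat=n - 1):
        lo, hi = 0, n - 1
        for remove_first in choices:
            if remove_first:
                lo += 1
            else:
                hi -= 1
        total += a[lo]
    return total / 2 ** (n - 1)

data = [3, 1, 4, 1, 5]
print(expected_last_element(data), expected_last_element_brute(data))
```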

9. Sum-length Given a sequence of length N (N ≤ 10^5), find the sum over sum·len^3 over all intervals. Print the answer modulo 10^9 + 7.
There are a few possible solutions, including one that we will discuss in the next part.

Comments (75)
 » 12 months ago: Is this an easy problem about this topic?
•  » » Yes it is.
 » 12 months ago: For Paths in tree (2), can anyone (with a spoiler maybe) explain how to do this with the contribution method? For some reason centroid and dp solutions come to mind but not the contribution method. Unless this is what is meant by the contribution method in the context of the problem: My idea: compute the sum of paths to the subtree as dp1; compute the complement of paths to the subtree (everything above) as dp2; ans = sum(dp1 + dp2)/2.
•  » » Check the number of nodes on the left side of the edge. Check the number of nodes on the right side of the edge. The edge contributes (nodes on left side) * (nodes on right side) to the sum.
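
The edge-contribution idea from the reply above can be sketched as follows. The code assumes an unweighted tree with vertices numbered 0..N-1 given as an edge list, roots it at vertex 0, and uses an iterative DFS to avoid recursion limits; `sum_of_path_lengths` is a name chosen for this example.

```python
def sum_of_path_lengths(n, edges):
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    parent = [-1] * n
    seen = [False] * n
    seen[0] = True
    order = []              # vertices in DFS order: parents before children
    stack = [0]
    while stack:
        v = stack.pop()
        order.append(v)
        for w in adj[v]:
            if not seen[w]:
                seen[w] = True
                parent[w] = v
                stack.append(w)
    size = [1] * n
    total = 0
    for v in reversed(order):   # children are processed before their parents
        if parent[v] != -1:
            # The edge (parent[v], v) lies on every path with exactly one end
            # inside the subtree of v.
            total += size[v] * (n - size[v])
            size[parent[v]] += size[v]
    return total

print(sum_of_path_lengths(3, [(0, 1), (1, 2)]))  # path 0-1-2: 1 + 1 + 2
```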
 » Thank you.
 » Thanks for such an awesome blog. I tried to work on the above problems myself. For the problem Max of N under EV easy problems, I am not sure whether my approach is correct or not. So can someone please confirm whether it's correct, or let me know where I am going wrong. Spoiler: Say I want to find the EV for the biggest score being 5. So first, find the number of ways so that 5 is the maximum over all N rolls of the die. Number of ways: x1 + x2 + x3 + x4 + x5 = N, where x1 = number of times 1 occurs, x2 = number of times 2 occurs, and so on. Hence, x1 >= 0, x2 >= 0, ..., x4 >= 0. It's required that x5 >= 1. Hence, x1 + x2 + x3 + x4 + x5 - 1 = N, so x1 + x2 + x3 + x4 + x5 = N + 1. Total number of ways (by the stars and bars theorem) = (N+5)C(4). Total probability for the occurrence of all numbers is (1/6)^N. Hence, the expected value that 5 is the highest score is E(5) = (N+5)C(4) * (1/6)^N * 5. The overall expected value for the answer will be E(1) + E(2) + ..... + E(6). Is it the correct approach?
•  » » 12 months ago: One may use the basic definition of EV to find the answer. Consider: what's the probability that the maximum outcome equals x? That's not very easy to figure out. But instead, one thing is easy to figure out: the probability that the maximum outcome is less than or equal to x, which is just $(x/6)^N$. Let's denote it f(x); then the probability that the maximum outcome equals x is just f(x) - f(x - 1), which I consider one of the easiest ways to think of this problem. UPD: The answer is therefore $\sum_{x=1}^{6} x \cdot \left( (x/6)^N - ((x-1)/6)^N \right)$. You can then simplify it as you want.
•  » » » Thanks a lot for sharing your approach!
•  » » » very well , thanks!
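
The f(x) - f(x - 1) idea from this thread is easy to sanity-check with exact fractions. A small sketch; the names `expected_max` and `expected_max_brute` are made up for the example, and the brute force simply enumerates all 6^N outcomes.

```python
from fractions import Fraction
from itertools import product

def expected_max(n, sides=6):
    def f(x):
        # p-bility that all n rolls are <= x
        return Fraction(x, sides) ** n
    return sum(x * (f(x) - f(x - 1)) for x in range(1, sides + 1))

def expected_max_brute(n, sides=6):
    # Average the maximum over every possible sequence of rolls.
    outcomes = product(range(1, sides + 1), repeat=n)
    return Fraction(sum(max(rolls) for rolls in outcomes), sides ** n)

print(expected_max(2), expected_max_brute(2))
print(float(expected_max(10)))
```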
 » Start in 15 minutes.
•  » » Very much looking forward to it, especially because it's one of the new topics I am trying to be good at. :D
 » 12 months ago: Can somebody explain why the test example answers for this problem are 0.8 and 0.26? If you have one probability 0.1 and another 0.2, why isn't it 0.3?
•  » » Because the problem asked the probability of Andrey not getting upset, in other words, the probability of Andrey getting exactly one answer. If you ask both of them, the probability of getting one answer is (A) the first gets the answer and the second doesn't, which is 0.1*(1-0.2) = 0.08, or (B) the first doesn't get the answer and the second does, which is (1-0.1)*0.2 = 0.18. Adding them up you get 0.26 chance not getting upset.
 » 12 months ago: Errichto Thanks for the lecture. The first few minutes of the video seem to be missing. Any chance of re-uploading the missing part?
•  » » After the stream, Youtube takes some time to dump/process the video and make it visible for later. Every time for a while only the last 2:00:00 are available. The whole thing is accessible now. (so after each stream you will have to wait some time to see the beginning again)
 » Is the answer to Problem 8 in EV problems (which was left as homework) 8? My working: Let's call the expected value R. Base case: we can get HH in the first 2 tosses, with probability $\frac{1}{2} \cdot \frac{1}{2}$. (Not reduced purposefully.) Recurrence: if we don't get HH, then we used 2 tosses. The probability of not getting HH is $\frac{3}{4}$, so with probability $\frac{3}{4}$ we will get (2 + R). Final expression: $R = \frac{1}{4} \cdot 2 + \frac{3}{4} \cdot (2 + R)$. Is it correct? :)
•  » » Good approach, but you have a mistake. If the first two tosses are TH, then you can't say that you restart the process and that the EV of the remaining tosses is R. Instead, you should restart after the first T (after 1 toss in the case of "TH").
•  » » » Ouch! Yes, my bad! Thanks for feedback. I will try to get it right this time then :)
•  » » » » Did you solve it finally? Is it R = 5 ?
•  » » » » » Yes, I also got R=5. I think its time to request Errichto to reveal final answer now? :)
•  » » » » » » Answer is 6. I think you added expected tosses to get one H for TH. But it doesn't guarantee two consecutive H.
•  » » » » » » Ans is 6. 1/2 probability for T => 1/2(1+r); 1/4 probability for HH => 1/4(2); 1/4 probability for HT => 1/4(2+r). r = 1/2(1+r) + 1/4(2) + 1/4(2+r), so r = 6.
•  » » 12 months ago: You can assume two different states, the initial state and the state where you already tossed one head. List two equations and solve them. You can find out that they are exactly the states in the AC Automaton. :) Actually, there's a known formula first proposed by Solov'ev in 1966 for calculating the expected number of tosses needed until a certain pattern occurs in linear time, combined with the KMP algorithm. There's also an interesting game based on similar situations called Penney's game. I remember encountering one problem concerning Penney's Game in a CMU Contest in Ptzcamp. You can learn more about it if you are interested. :)
•  » » » Linear time? Don't we need Gaussian elimination there?
•  » » » » 12 months ago: Timus 1677. Doesn't seem to be solvable with Gaussian elimination. Not straightforward at least. Let's build the KMP automaton. Now let $E_k$ be the expected value of time you need to get from the k-th state to the last one. Then $E_k = 1 + \frac{1}{\sigma} \sum_{c} E_{\delta(k, c)}$, here σ is the size of the alphabet and $\delta(k, c)$ is the transition from k by letter c. Judging only by this formula, I can say that it's solvable at least in O(n^2) by Gaussian elimination because you have almost no elements above the main diagonal... You can also solve it by binary searching E_0 and checking whether E_n turns out lower or greater than 1. It doesn't seem good since the answer could be very large. So I'm curious how to solve it in linear time. UPD: I got it! You should consecutively express E_1, E_2, ... through E_0, this will solve the problem in O(nσ) at the end!
•  » » » » » 12 months ago: Nvm. You express E_{i+1} with the formula for E_i.
•  » » » » » » The thing is that k-th equation actually allow you to express (k + 1)-th term. Like, from you can say that , and so on you may keep it like .
•  » » » » 12 months ago: Here's the formula: Say the pattern is A with length m, then we say $A{:}A = \sum_{k=1}^{m} 2^{k-1} [A^{(k)} = A_{(k)}]$, where $A^{(k)}$ represents the suffix of length k of A while $A_{(k)}$ represents the prefix of length k of A. Then the expected number of tosses until the first occurrence of the pattern is just 2(A:A), which can obviously be computed in linear time, using, for example, rolling hashes. I found this formula in Concrete Mathematics, Chapter 8, btw.
•  » » » » » What about non-binary alphabet?
•  » » » » » » I guess it's just you replace the base by Σ where Σ is the alphabet size and at last multiply by Σ. Worked on a few examples, they all turned out to be right. :D I think it can be proved using generating functions, as it was done for the binary case in Concrete Mathematics.
•  » » » » » » » Oh, I got another way to get solution. You can rewrite my initial formula as where is prefix function of first k letters with , and . This will allow you to write that .
•  » » » Thank you. That gave me a new approach to try this- my previous approach got complicated because of TH case :(. Thanks for the resources :)
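
The pattern formula discussed in this thread can be tried out with a short sketch. `expected_tosses_formula` sums σ^k over every k for which the length-k suffix of the pattern equals its length-k prefix (for σ = 2 this is 2(A:A)); it compares strings directly, so it is quadratic rather than linear. `expected_tosses_numeric` is an independent check that walks the pattern automaton and sums P(pattern not seen after n tosses) for a fixed number of steps. Both names, the `steps` cutoff and the uniform-letter assumption are choices made for this example.

```python
def expected_tosses_formula(pattern, alphabet_size=2):
    m = len(pattern)
    total = 0
    for k in range(1, m + 1):
        if pattern[m - k:] == pattern[:k]:   # suffix of length k equals prefix of length k
            total += alphabet_size ** k
    return total

def expected_tosses_numeric(pattern, alphabet="HT", steps=5000):
    m = len(pattern)

    def next_state(state, ch):
        # Longest suffix of (matched prefix + ch) that is a prefix of the pattern.
        text = pattern[:state] + ch
        for length in range(min(m, len(text)), -1, -1):
            if text.endswith(pattern[:length]):
                return length

    step = [[next_state(s, ch) for ch in alphabet] for s in range(m)]
    prob = [0.0] * m          # distribution over unfinished states
    prob[0] = 1.0
    expected = 0.0
    for _ in range(steps):
        expected += sum(prob)  # E[T] = sum over n >= 0 of P(T > n)
        nxt = [0.0] * m
        for s in range(m):
            share = prob[s] / len(alphabet)
            for t in step[s]:
                if t < m:      # state m means the pattern has occurred
                    nxt[t] += share
        prob = nxt
    return expected

for p in ("HH", "HT", "HTH"):
    print(p, expected_tosses_formula(p), expected_tosses_numeric(p))
```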
•  » » 12 months ago: Problem 8: Let R be the EV. Case 1: In the first try I got T. I wasted 1 move, the probability of this event is $\frac{1}{2}$, and the total number of flips required is R+1. Case 2: In the first try I got H and in the second try I got T. So I wasted two moves, the probability of this event is $\frac{1}{4}$, and the total number of flips required is R+2. Case 3: I got HH, so I am done in 2 moves with probability $\frac{1}{4}$. Final expression: $R = \frac{1}{2}(R + 1) + \frac{1}{4}(R + 2) + \frac{1}{4} \cdot 2$. After solving this we get R = 6. Is this correct?
•  » » » Yes, it is — https://ideone.com/J3bAiY.
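
For readers who cannot open the link, here is a separate sketch (not the code behind that link) of the same check. `solve_restart_equation` solves an equation of the form R = Σ p·(t + R) over restarting cases plus Σ p·t over finishing cases, and `simulate_two_heads_in_row` estimates the same value by simulation. Both names, the case encoding and the seed are assumptions of this example.

```python
import random
from fractions import Fraction

def solve_restart_equation(cases):
    # cases: (probability, tosses used, restart?) with probabilities summing to 1.
    # R = sum p * (t + R if restart else t)  =>  R = sum(p * t) / (1 - restart mass)
    constant = sum(p * t for p, t, _ in cases)
    restart_mass = sum(p for p, _, restart in cases if restart)
    return constant / (1 - restart_mass)

half, quarter = Fraction(1, 2), Fraction(1, 4)
cases = [
    (half, 1, True),      # T: one toss wasted, start over
    (quarter, 2, True),   # HT: two tosses wasted, start over
    (quarter, 2, False),  # HH: done after two tosses
]
print(solve_restart_equation(cases))

def simulate_two_heads_in_row(trials, seed=2024):
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        streak = tosses = 0
        while streak < 2:
            tosses += 1
            streak = streak + 1 if rng.random() < 0.5 else 0
        total += tosses
    return total / trials

print(simulate_two_heads_in_row(200_000))
```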
 » 12 months ago: How to find the sum over squares of lengths of paths? I know how to do it only with FFT, but for sure there is some nice solution.
•  » » 12 months ago: He discussed this in the later part of his stream. The basic summary was that, for each edge, we count the number of paths it belongs to. (It might seem unintuitive at first, but take it like this: the approach needs all edge weights to be one, the length of a path is nothing but the number of edges in it, and we are summing up how many paths each edge belongs to.) So, he derived a formula: for each edge, if there are U nodes down the subtree, the answer for that edge will be E_i = U * (N - U) (number of nodes in the subtree * number of nodes outside it). Summing this over all edges gives the answer. You can root the tree at any arbitrary vertex.
•  » » » Thanks for the explanation! But I was asking about the sum over squares of lengths of paths. It is more complicated for me.
•  » » » » 12 months ago: Sorry! Misread your question. Too late to delete that comment now... will be more careful. I'm also stuck on that bonus one :(
•  » » Spoiler: The square of the length of a path is the number of ordered pairs of edges (possibly the same edge twice) both of which belong to the path. So fix a pair of edges and count the number of paths passing through both of them. The total sum should be computable in O(n).
•  » » 12 months ago: There is a nice direct trick for this, not related to expected values. First, calculate the squared distance from the root to all other vertices. Now perform dfs on the tree and when you enter vertex u you should add 1 on its subtree and subtract 1 for all other vertices. When you leave the vertex, you should do vice versa. Thus, at all times you'll be having the distance from the current vertex to all other vertices and you can easily do things like calculate the sum or even the sum of squares of distances to all other vertices. It works in O(n log n) and the only thing it requires is a segment tree + Euler tour to make a 1-1 correspondence between segments in the array and subtrees in the tree. I actually gave a problem on that in hackerrank week of code 23, link :)
•  » » » 12 months ago: I think this problem is solvable in O(nm) if we are required to compute the sum over m-th powers of lengths of paths. Let dp_v denote the answer for the subtree rooted at v, so we basically need to maintain a set of numbers with the following three operations: 1. merge two sets 2. add 1 to every number in the set 3. query the sum over m-th powers of the numbers in the set. If we maintain the sum of i-th powers of the numbers in the set for every $0 \le i \le m$ and compute the dp values using the formula $(x+1)^i = \sum_{j=0}^{i} \binom{i}{j} x^j$, it would result in an O(nm^2) solution. However, we can maintain the sums of i-th descending powers of the numbers in the set for every $0 \le i \le m$, and use the simple formula $(x+1)^{\underline{i}} = x^{\underline{i}} + i \cdot x^{\underline{i-1}}$; then the dp can be calculated in O(nm) time, and we can precompute Stirling numbers of the second kind in O(nm) and use the formula $x^m = \sum_{i=0}^{m} S(m, i) \, x^{\underline{i}}$ to recover the answer.
•  » » » » You need more than that from the structure. Please specify how exactly are you going to calculate answers?P.S. I think it won't solve problem from hackerrank
•  » » » » » OK. I was just talking about the problem that asks one to find the sum over m-th powers of lengths of paths for a tree. To be specific, we store dp_{v,i}, the sum over the i-th descending powers of the lengths between v and all vertices in the subtree of v.
•  » » » » » » But what about overtree of vertex? I guess, you're doing some new dfs, keeping it on the stack?..
•  » » » » » » » Yes. You need two dfs, one from top to bottom, and one from bottom to top, just as many tree dp problems require, and what you eventually get for each vertex v is the sum over mth power of lengths of paths with one end at v. Summing up and dividing by two will yield the answer.
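
The two-dfs idea from this thread, specialised to m = 2, can be sketched as follows. To keep it short the sketch tracks ordinary powers (count, sum of distances, sum of squared distances) instead of descending powers, using (d + 1)^2 = d^2 + 2d + 1 when distances shift by one. It assumes an unweighted tree with vertices 0..N-1 and uses an iterative DFS; the function name is invented here.

```python
def sum_of_squared_path_lengths(n, edges):
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    parent = [-1] * n
    seen = [False] * n
    seen[0] = True
    order, stack = [], [0]
    while stack:                       # DFS order: parents before children
        v = stack.pop()
        order.append(v)
        for w in adj[v]:
            if not seen[w]:
                seen[w] = True
                parent[w] = v
                stack.append(w)
    cnt = [1] * n                      # subtree sizes
    s1 = [0] * n                       # sum of distances from v into its subtree
    s2 = [0] * n                       # sum of squared distances from v into its subtree
    for v in reversed(order):          # bottom-up pass
        p = parent[v]
        if p != -1:
            cnt[p] += cnt[v]
            s1[p] += s1[v] + cnt[v]
            s2[p] += s2[v] + 2 * s1[v] + cnt[v]
    full1, full2 = [0] * n, [0] * n    # the same sums, but over all vertices
    full1[0], full2[0] = s1[0], s2[0]
    for v in order[1:]:                # top-down pass
        p = parent[v]
        out_cnt = n - cnt[v]
        # Sums from p over the vertices outside the subtree of v.
        out1 = full1[p] - (s1[v] + cnt[v])
        out2 = full2[p] - (s2[v] + 2 * s1[v] + cnt[v])
        full1[v] = s1[v] + out1 + out_cnt
        full2[v] = s2[v] + out2 + 2 * out1 + out_cnt
    return sum(full2) // 2             # every path was counted from both ends

print(sum_of_squared_path_lengths(3, [(0, 1), (1, 2)]))  # 1 + 1 + 4
```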
•  » » » » For the special case with unweighted edges, we can solve it in O(n * (log^2(n) + log(m))) using centroid decomposition and FFT. I have never managed to code it properly, though.
•  » » allllekssssa how to solve it with fft? Thanks.
•  » » » 5 months ago: Use centroid decomposition, encode every child-subtree of the node we are fixing in CD as a polynomial, and find the product of these polynomials; the i-th coefficient of the product is the number of paths that pass through the node we are fixing and have size i.
 » For Q1 P(getting an ace) = (C(4,1) + C(4,2) + C(4,3) + C(4,4))/C(52,10) Is this correct?
•  » » You can consider each of the cards individually and calculate their contribution. For eg the 1st one has a probability of 1/13 to be an ace. Similarly, you can carry out the process for 10 cards: (1/13)*10=10/13.
•  » » » Why should the probability of getting an ace by picking 10 random cards at once equal P(getting an ace) by picking cards one by one 10 times?
•  » » » » You are drawing cards from a shuffled deck so the order does not matter while considering 10 cards. You don't have to worry whether the first card is an ace or not. For eg: what is the probability of 2nd card to be an ace when you don't know what the 1st card is: 1/13. Similarly using linearity of EV you can cover all the cases.
•  » » » » » 4 months ago: I think that this solution is incorrect, because in your solution and Errichto's solution the probability of 5 aces in 10 randomly picked cards = $(1/13)^5$ is greater than zero, but it should be equal to zero. UPD. Buuut this is brute force for 5 randomly picked cards, and the answer is $(1/13) \cdot 5$, I don't understand why it works correctly.
•  » » » » » » 4 months ago: How does my solution say what the p-bility of 5 aces is? The solution is to use the linearity of expectation. The expected value of results of 13 events is equal to the sum of expected values of each of them, even if they are not independent.
•  » » » » » » » Do I understand correctly that you include all probabilities in a single event, then you know only the expected value for one event and nothing about the probabilities, and still get the correct answer?
•  » » » » » » » » I know a probability that the 7th card is an ace. It's $1/13$. This means that in $1000$ games, around $1000/13$ times (on average) I will get an ace from that card. Same for 1st card, 2nd card, and so on.Yes, I don't care about p-bilities of having exactly some number of cards. That would lead to more complicated computations.
•  » » » » » » » » » Ok, I understood, thanks for your answer
•  » » » » » » » 4 months ago: My bad, I misunderstood your solution and tried to convince dmkozyrev that there is a mistake in it :D Well, there is another way to find the correct answer. Let's mark all aces as ones and all other cards as zeros. Now, let's model the current situation. You have a sequence of 4 ones and 48 zeros. You want to know the EV of the number of ones among the first 10 numbers of a permutation. EV = P(x = 1) * 1 + P(x = 2) * 2 + P(x = 3) * 3 + P(x = 4) * 4. It is easy to prove that P(x) in this case is equal to C(10, x) * C(42, 4-x) / C(52, 4), where C(n, m) = n! / (n-m)! / m!. So... the answer is the same)
•  » » » » » » » » How dare you ;p Yeah, you can usually solve EV problems in two ways: by definition like you did, or by linearity.
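
Both ways from this thread can be compared exactly with fractions. A small sketch, assuming Python 3.8+ for `math.comb`; the function names are made up for the example. The "by definition" version uses the P(x) formula quoted above, the "by linearity" version adds 1/13 for each of the 10 cards.

```python
from fractions import Fraction
from math import comb

def expected_aces_by_definition(drawn=10, aces=4, deck=52):
    # P(x aces) = C(drawn, x) * C(deck - drawn, aces - x) / C(deck, aces)
    total = Fraction(0)
    for x in range(min(aces, drawn) + 1):
        p = Fraction(comb(drawn, x) * comb(deck - drawn, aces - x), comb(deck, aces))
        total += x * p
    return total

def expected_aces_by_linearity(drawn=10, aces=4, deck=52):
    # Every drawn card is an ace with p-bility aces / deck.
    return drawn * Fraction(aces, deck)

print(expected_aces_by_definition(), expected_aces_by_linearity())
```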
 » How to solve the last part of Question 9 Volleyball ? I feel like it is somehow related to Bayes' theorem, but i am unable to calculate the expected wins of the winner.
 » In 9th question on volleyball, expected number of total matches won by winner = 3 + expected number of matches won by winner in group stage. What is expected number of matches won by winner in group stage? Is it 11/8
 » https://codeforces.com/problemset/problem/601/B — Also a problem of Contribution technique (after being decomposed into simpler form). Try this out guys.
 » guys can anyone explain how to solve this question: link:https://www.codechef.com/AMR18ROL/problems/MARTING1
 » Can someone please explain the solution of Eating Ends? I can't understand the no. of ways for an element to remain till last.
 » For the 9th question of EV easy problems, is the answer to the problem EV of the number of matches won by the winner of the whole tournament 29/6 ?
•  » » There is a video with timestamps to each problem.
 » 5 months ago: Hi. Just like EV easy problems 6-8, how to solve something flipped (updated), like: calculate the expected value of the "number of consecutive triples of Heads" in $N$ coin tosses. My approach: let $R_n$ be the required result for $n$ coin tosses. Then we have: Case 1: if the $n$-th toss is Tails, then we need to add $R_{n-1}$ to the answer. Case 2: if the $n$-th toss is Heads, then we need to consider two cases again. If the $(n-1)$-th toss is Tails, we add $\frac{R_{n-2}}{2}$ to the answer; if the $(n-1)$-th toss is Heads, then again if the $(n-2)$-th toss is Tails we add $\frac{R_{n-3}}{4}$, else we add $\frac{1 + R_{n-3}}{4}$. So, $R_n = R_{n-1} + \frac{R_{n-2}+R_{n-3}}{2} + \frac{1}{4}$. It doesn't seem right to me. Also, since $R_0=R_1=R_2=0$, we get $R_3=\frac{1}{4}$ and $R_4=\frac{1}{2}$. So, basic examples do seem incorrect. Can someone verify this and/or give a different, perhaps easier approach? I like to ask for verification, because when I do so many cases, I always feel like I have done something wrong. Thanks a lot in advance. UPD: Obviously incorrect, because $R_3=\frac{1}{8}$. UPD2: I think I found the error, I missed the multiplier of $\frac{1}{2}$ for the $n$-th toss. So that makes it $R_n = \frac{R_{n-1}}{2} + \frac{R_{n-2} + R_{n-3}}{4} + \frac{1}{8}$. Now all I need for satisfaction is to solve this recurrence with $R_0=R_1=R_2=0$ and get the formula provided by Errichto. (Substituting the formula in place of $R_n$, for sufficiently large $n$ (such that we don't have to worry about the max function), does not satisfy the recurrence.) Can someone point out what's wrong with this approach? Thanks.
•  » » 5 months ago: The EV of the number of consecutive triples of heads in $N$ coin tosses is $\max(0, N - 2) \cdot \frac 18$: each of the $\max(0, N - 2)$ consecutive triples of tosses is all heads with p-bility $\frac 18$ and we use the linearity of expectation. But your formulas look strange. Problems 6-8 are about the EV of the number of tosses till something happens, not the EV of the number of heads/whatever.
•  » » » Yes, I know problems 6-8 are entirely different. In fact I wanted to flip it around. But how did you use linearity again? Did you break it linearly on the starting positions of the triple? That explains $\max(0,N-2)$. Cool. Thanks a lot.
•  » » » » Yes, you can think about it as the number of ways to choose the starting position of three consecutive tosses. But my definition was also enough — the number of ways to choose three consecutive tosses.
•  » » I think I figured out where it falls short, even after considering possibilities for last 3 tosses, we need to consider fourth last toss, so as to not forget counting triples formed from fourth last toss and some of last three except last as ending of the triples.
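
The linearity answer from this thread can be checked by brute force for small N. A sketch with exact fractions; `expected_triples` and `expected_triples_brute` are names invented for this example, and the brute force enumerates all 2^N toss sequences.

```python
from fractions import Fraction
from itertools import product

def expected_triples(n):
    # Each of the max(0, n - 2) windows of three tosses is HHH with p-bility 1/8.
    return Fraction(max(0, n - 2), 8)

def expected_triples_brute(n):
    total = 0
    for tosses in product("HT", repeat=n):
        total += sum(
            1 for i in range(n - 2)
            if tosses[i] == tosses[i + 1] == tosses[i + 2] == "H"
        )
    return Fraction(total, 2 ** n)

for n in range(7):
    print(n, expected_triples(n), expected_triples_brute(n))
```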
 » There is a training containing 13 problems on probability theory and expected value. 10 of them are published on codeforces and can be solved there; 3 problems (A, B, C) for beginners can be solved on e-olymp: A. Difficult path, B. A coincidence, C. A full set.
 » can someone give a problem related to 2.Path in trees
 » Game on Trees, a nice problem on this topic.Though I understood the solution better by thinking in way of contribution rather than Linearity of Expectation.
 » Could someone just go through some/all the bonus parts so that I can have something to check my ideas with?