Watch my lecture-stream tomorrow (Thursday) at 14:00 CEST — ~~https://www.youtube.com/watch?v=qdlPY37MBPo~~ https://www.youtube.com/watch?v=U_h3IjreRek. I will go through theory and problems from this blog. The only prerequisite is knowing what is probability. The next (harder) part on Monday.

The video will be available later, with timestamps for each problem — so you don't have to watch everything.

### Definition of EV

Let's say we bought a lottery ticket for 2$. We will win 10$ with probability 10%, and 20$ with p-bility 2%. On average, it gives us 0.1·10 + 0.02·20 = 1.4, so we are worse off after buying the ticket. The computed average is called the expected value.

The expected value (EV, expectation) is the average value of an event/experiment. For example, EV of the number of pips rolled on a 6-sided die is 3.5:

Linearity of EV (super important theorem):

*E*(

*X*+

*Y*) =

*E*(

*X*) +

*E*(

*Y*)

### Technique "Contribution to the sum"

If we want to find the sum over many ways/possibilities, we should consider every element (maybe a number, or a pair or an edge) and count how many times it will be added to the answer.

### EV easy problems

**Aces**We choose 10 cards at random from a standard deck of 52 cards. Find EV of the number of aces.**Inflation**The price of a tv is 1000$. Each of the next*N*days, the prices goes up by 5$ or 10$ (each with p-bility 50%). Find EV of the final price.

Bonus: What if it goes up by 1% or 2% each day?**Max of two**You roll a 6-sided die twice. Find EV of the bigger of two scores.**Max of**You roll a 6-sided die*N**N*times. Find EV of the biggest score.

A few possible solutions.**Birthdays**You teach informatics in a class with 20 students. When someone has a birthday, you must let the whole class to play games instead of learning algorithms and using Excel. Maybe some students have birthday on the same day, so there would be fewer than 20 wasted days during a year? Find EV of the number of days when at least one student has birthday.

Bonus/fact: Birthday paradox.**First heads**Find EV of the number of coin tosses until you get heads.

How to check your answer with a program?**Two heads**Find EV of the number of coin tosses until you get heads two times in total.**Two heads in a row**Find EV of the number of coin tosses until you get heads two times in a row.**Volleyball**12 teams, including Poland, play in the volleyball tournament. Teams are divided into 4 groups, each with 3 teams. In each group, every teams plays against every other team, and then two best teams advance to the elimination stage. In case of a perfect tie, two random teams advance. The elimination stage has quarterfinals, halffinals and the final match. In every match, a random of two teams wins (50% each).

Find p-bility that Poland will win the whole tournament. Find EV of the number of matches won by Poland. Find EV of the number of matches won by Poland, assuming that they won the whole tournament (in other words, find EV of the number of matches won by the winner of the whole tournament).

### Problems for "contribution" technique

**Hills**Given a sequence of length*N*(*N*≤ 10^{5}), count triples of indices*i*<*j*<*k*that*a*_{i}<*a*_{j}>*a*_{k}.

Bonus: Count zig-zags of length 10, i.e.*i*_{1}<*i*_{2}< ... <*i*_{10}that*a*[*i*_{1}] <*a*[*i*_{2}] >*a*[*i*_{3}] <*a*[*i*_{4}] > ... <*a*[*i*_{10}].**Paths in tree**Given a tree of length*N*(*N*≤ 10^{5}), find the sum of lengths of all paths. (Solve this without not-trivial dp or centroids.)

Bonus: Find the sum over squares of lengths of paths.**Sum over subsets**There are*N*competitions, the*i*-th with prize*a*_{i}. You're quite good and you will win each competition with p-bility 50%. Find EV of the total won prizes.

Equivalently: Find the sum over total prize over all 2^{N}possibilities, and print answer modulo 10^{9}+ 7. (This answer divided by 2^{N}gives the answer from the first version.).**Math encoder (Code Jam Kickstart 2017 round B)**You are given a sequence of*N*numbers (*N*≤ 2000 or*N*≤ 10^{5}). We're to choose one of 2^{N}- 1 non-empty subsets, uniformly at random. Find EV of the difference between the maximum and minimum element in the subset.

Equivalently: Find the sum over the difference over all subsets, and print the answer modulo 10^{9}+ 7.**Imbalanced array (CF Educational Round 23)**Same as the previous problem, but choose a random of*N*·(*N*+ 1) / 2 intervals.

Bonus: Do it in*O*(*N*) if the input contains a permutation of numbers 1 through*N*.**Randomizer (CF 313 D)**You're given a convex polygon with*N*vertices (*N*≤ 2000 or*N*≤ 10^{5}). We choose a random subset of vertices, what gives us a new (small) convex polygon. Find EV of the perimeter of the new polygon.

Well, the CF problem was a bit harder, with computing area and using Pick's theorem.**Random CH**You're given*N*points (*N*≤ 200 or*N*≤ 2000), no three are collinear. Each point disappears with p-bility 50%. Find EV of the size of the convex hull of remaining points.

(The size of CH is the number of its vertices.).**Eating ends**You're given a sequence of length*N*(*N*≤ 2000 or*N*≤ 10^{5}).*N*- 1 times we will remove the first or the last element, each with p-bility 50%. Find EV of the last remaining number.

Well, the implementation is hard because of precision issues.**Sum-length**Given a sequence of length*N*(*N*≤ 10^{5}), find the sum over*sum*·*len*^{3}over all intervals. Print the answer modulo 10^{9}+ 7.

There are a few possible solutions, including one that we will discuss in the next part.

Is this an easy problem about this topic?

Yes it is.

For

Paths in tree(2) can anyone (with spoiler maybe) explain how to do this with contribution method. For some reason centroid and dp solutions comes to mind but not contribution method.Unless this is what is meant by contribution method in the context of the problem:

My ideacompute sum of paths to subtree as dp1

compute the complement of paths to subtree (everything above) as dp2

ans = sum(dp1 + dp2)/2

check number of nodes on left side of edge

check number of nodes on right side of edge

edge contributes weight of (nodes on left side) * (nodes on right side) to sum

Thank you.

Thanks for such awesome blog.

I tried to work on above problems myself. For problem,

Max of NunderEV easy problem, I am not sure whether my approach is correct or not. So can someone please confirm, whether it's correct or let me know where I am going wrong.SpoilerSay, I want to find EV of biggest score to be 5. So first, finding out number of ways so that 5 is maximum for all N rolls of dice.

Number of ways :-

x1 + x2 + x3 + x4 + x5 = N, where x1 = Number of times 1 occurs, x2 = Number of times 2 occurs and so on. Hence, x1 >= 0, x2 >= 0, ..., x4 >= 0. It's required that x5 >= 1.

Hence, x1 + x2 + x3 + x4 + x5 — 1 = N

x1 + x2 + x3 + x4 + x5 = N + 1.

Total number of ways (by star bar theorem) = (N+5)C(4). Total Probablilty for occurence of all numbers is :- (1/6)^N.

Hence, Expected value that 5 is highest score is E(5) = (N+5)C(4) * (1/6)^N * 5

Overall Expected value for the answer will be E(1) + E(2) + ..... + E(6).

Is it correct approach ?

One may use the basic definition for EV to find out the answer. Consider, what's the probability that the maximum outcome equals to

x? That's not very easy to figure out. But instead, one thing is easy to figure out: the probability that the maximum outcome is less than or equal tox, which is just , let's denote itf(x), then the probability that the maximum outcome equal toxis justf(x) -f(x- 1), which I consider as one of the easiest ways to think of this problem.UPD: The answer is therefore . You can then simplify it as you want.

Thanks a lot for sharing your approach!

very well , thanks!

Start in 15 minutes.

Very much looking forward to it — especially because its one of the new topics I am trying to be good at.:D

Can somebody explain why test examples for this problem is 0.8 and 0.26?

If you have one probability 0.1 and other one 0.2 why it isn't 0.3?

Because the problem asked the probability of Andrey not getting upset, in other words, the probability of Andrey getting exactly one answer. If you ask both of them, the probability of getting one answer is (A) the first gets the answer and the second doesn't, which is 0.1*(1-0.2) = 0.08, or (B) the first doesn't get the answer and the second does, which is (1-0.1)*0.2 = 0.18. Adding them up you get 0.26 chance not getting upset.

Errichto Thanks for the lecture. The first few minutes of the video seems to be missing. Any chance of re-uploading the missing part.

After the stream, Youtube takes some time to dump/process the video and make it visible for later. Every time for a while only the last 2:00:00 are available. The whole thing is accessible now. (so after each stream you will have to wait some time to see the beginning again)

Is ans of Problem 8 in EV problem (which was left as homework) 8?

My working-

Lets call Expected value

R.Base case- We can get HH in first 2 tosses itself, with probability of .

(Not reduced purposefully).

Recurrence- If we dont get HH, then we used 2 tosses. The probability of not getting HH is , so with probability of we will get (2 +

R)Final Expression-Is it correct? :)

Good approach, but you have a mistake. If the first two tosses are TH, then you can't say that you restart the process the EV of the remaining tosses is

R. Instead, after the first T you should restart (after 1 toss in case of "TH").Ouch! Yes, my bad! Thanks for feedback. I will try to get it right this time then :)

Did you solve it finally? Is it R = 5 ?

Yes, I also got R=5. I think its time to request Errichto to reveal final answer now? :)

Answer is 6. I think you added expected tosses to get one H for TH. But it doesn't guarantee two consecutive H.

Ans is 6.

1/2 probability for T => 1/2(1+r)

1/4 probability for HH => 1/4(2)

1/4 probability for HT => 1/4(2+r)

r=1/2(1+r) + 1/4(2) +1/4(2+r)

so r=6

You can assume two different states, the initial state and the state where you already tossed one head. List two equations and solve them. You can find out that they are exactly the states in the AC Automaton. :)

Actually, there's a known formula first proposed by Solov'ev in 1966 for calculating the expected number of tosses needed until a certain pattern occurs in linear time combined with KMP algorithm. There's also a interesting game based on similar situations called Penney's game. I remember encountered one problem concerning Penney's Game in a CMU Contest in Ptzcamp. You can learn more about it if you are interested. :)

Linear time? Don't we need Gaussian elimination there?

Timus 1677. Doesn't seem to be solvable with gaussian elimination. Not straightforward at least.

Let's build KMP automaton. Now let to be expected value of time you need to get from -th state to the last one. Then , here σ is size of alphabet and is transition from

kby letterc. Judging only by this, formula, I can say that it's solvable at least inO(n^{2}) by Gaussian elimination because you have almost none elements above main diagonal...You can also solve it in by binary searching

E_{0}and checking whetherE_{n}turns out lower or greater than 1. It doesn't seem good since answer could be very large. So I'm curious how to solve it in linear time.UPD: I got it! You should consequently express

E_{1},E_{2}, ... throughE_{0}, this will solve the problem inO(nσ) at the end!Nvm. You express

E_{i + 1}with the formula forE_{i}.The thing is that

k-th equation actually allow you to express (k+ 1)-th term. Like, from you can say that , and so on you may keep it like .Here's the formula: Say the pattern is

Awith lengthm, then we say whereA^{(k)}represent the suffix of lengthkofAwhileA_{(k)}represent the prefix of lengthkofA. Then the expected number of tosses until the first occurence of the pattern is just 2(A:A), which can obviously be computed in linear time, using, for example rolling hashes. I found this formula inConcrete Mathematics,Chapter 8, btw.What about non-binary alphabet?

I guess it's just you replace the base by Σ where Σ is the alphabet size and at last multiply by Σ. Worked on a few examples, they all turned out to be right. :D

I think it can be proved using generating functions, as it was done for the binary case in

Concrete Mathematics.Oh, I got another way to get solution. You can rewrite my initial formula as where is prefix function of first

kletters with , and . This will allow you to write that .Thank you. That gave me a new approach to try this- my previous approach got complicated because of TH case :(. Thanks for the resources :)

Problem 8:

Let's

Rbe the EV.Case 1: In first try, I got

T, I wasted1move and the probability of this event is and the total number of flips required isR+1.Case 2: In first try I got

Hand in second try I gotT,So I wasted two moves,now the probability of this event is and the total number of flips required isR+2.Case 3: I got

HH, I am done in2moves with probability .So, Final Expression:

So after solving this we can get R = 6.

Is this correct ??

Yes, it is — https://ideone.com/J3bAiY.

How to find the sum over squares of lengths of paths?

I know only with FFT, but for sure there is some nice solution.

He discussed this in the later part of his stream. The basic summary was, that, for each edge, we count number of path it belongs to. (It might seem unintuitive at first, but take it like this- The approach needs all edge weights one, length of path is nothing but number of edges in it, and we are summing up how many paths this edge belongs to.)

So, he derived a formula that-

For each edge, if there are U nodes down the subtree, answer for that edge will be -

E_{i}=U* (N-U) (Number of nodes in sub-tree *Number of nodes outside it)Summing this for all edges gave the answer. You can root tree at any arbitrary vertex.

Thanks for explanation!

But, I thought about sum over squares of lengths of paths. It is more complicate for me.

Sorry! Misread your question. Too late to delete that comment now... will be more careful. I m also stuck on that bonus one :(

SpoilerThe square of the length of a path is the number of pairs of edges both of which belong to the path. So fix a pair of edges and count the number of paths passing through both of them. The total sum should be computable in O(n).

There is nice direct trick for this, not related with expected values.

First, calculate squared distance from the root to all other vertices. Now perform dfs on the tree and when you enter vertex

uyou should add 1 on its subtree and subtract 1 for all other vertices. When you leave the vertex, you should do vice versa. Thus, at all times you'll be having distance from current vertex to all other vertices and you can easily do thingls like calculate sum or even sum of squares of distances to all other vertices.It works in and only thing it requires is segment tree + Euler tour to make 1-1 correspondence between segments in array and subtrees in tree.

I actually gave problem on that in hackerrank week of code 23, link :)

I think this problem is solvable in

O(nm) if we are required to compute the sum overmth power of lengths of paths. Letdp_{v}denote the answer for subtree rooted atv, so we basically need maintain a set of numbers with the following three operations: 1. merge two sets 2. add 1 to every number in the set 3. query the sum overmth power of the numbers in the set. If we maintain the sum ofith power for the numbers in the set for every and compute the dp values using the formula , it would result in aO(nm^{2}) solution. However, we can maintain theith descending powers power for the numbers in the set for every , and use the simple formula , the dp can be calculated inO(nm) time, and we can precompute Stirling numbers of the second kind inO(nm) and use the formula to recover the answer.You need more than that from the structure. Please specify how exactly are you going to calculate answers?

P.S. I think it won't solve problem from hackerrank

OK. I was just talking about the problem that asks one to find out the sum over

mth power of lengths of paths for a tree. To be specific, we storedp_{v, i}the sum over theith descending power of the length bewtween all vertices in the subtree ofvandv.But what about overtree of vertex? I guess, you're doing some new dfs, keeping it on the stack?..

Yes. You need two dfs, one from top to bottom, and one from bottom to top, just as many tree dp problems require, and what you eventually get for each vertex

vis the sum overmth power of lengths of paths with one end atv. Summing up and dividing by two will yield the answer.For the special case with unweighted edges, we can solve it in

O(n* (log^{2}(n) +log(m))) using centroid decomposition and FFT.I have never managed to code it properly, though.

allllekssssa how to solve it with fft? Thanks.

Use centroid decomposition, encode every child-subtree of the node we are fixing in CD as a polynomial, find the product of this polynomials, the i-th coefficient of the product is the amount of paths that pass through the node we are fixing and have size i.

For

Q1P(getting an ace) = (C(4,1) + C(4,2) + C(4,3) + C(4,4))/C(52,10) Is this correct?You can consider each of the cards individually and calculate their contribution. For eg the 1st one has a probability of 1/13 to be an ace. Similarly, you can carry out the process for 10 cards: (1/13)*10=10/13.

Why is probability of getting an ace by picking 10 random cards at once should equal to P(geeting an ace) by picking one by one cards 10 times?

You are drawing cards from a shuffled deck so the order does not matter while considering 10 cards. You don't have to worry whether the first card is an ace or not. For eg: what is the probability of 2nd card to be an ace when you don't know what the 1st card is: 1/13. Similarly using linearity of EV you can cover all the cases.

I think, that this solution is incorrect, because in your solution and Errichto solution probability of 5 aces in 10 randomly picked cards = $$$(1/13)^5$$$ greater than zero, but should be equal to zero.

UPD.Buuut this is brute force for 5 randomly picked cards, and answer is $$$(1/13) \cdot 5$$$, I don't understand why it works correctlyHow does my solution say what is p-bility of 5 aces?

The solution is to use the linearity of expectation. The expected value of results of 13 events is equal to the sum of expected values of each of them, even if they are not independent.

Am I correctly understand, that you are include all probabilities in single event, then you know only expected value for one event and nothing about probabilities, and successfully get correct answer?

I know a probability that the 7th card is an ace. It's $$$1/13$$$. This means that in $$$1000$$$ games, around $$$1000/13$$$ times (on average) I will get an ace from that card. Same for 1st card, 2nd card, and so on.

Yes, I don't care about p-bilities of having exactly some number of cards. That would lead to more complicated computations.

Ok, I understood, thanks for your answer

My bad — I misunderstood your solution and tried convince dmkozyrev that mistake in it :D

Well, there is another way to find correct answer. Let's mark all aces like a units, all other cards like a zeros.

Now, let's modulate current situation. You have a sequence of 4 units and 48 zeros. You wanna know — EV of quantity of units in first 10th numbers of permutation.

EV = P(x = 1) * 1 + P(x = 2) * 2 + P(x = 3) * 3 + P(x = 4) * 4

So it easy to proof, that P(x) in this case is equal to C(10, x) * C(42, 4-x) / C(52, 4), where C(n, m) = n! / (n-m)! / m!

So... the answers is the same)

How dare you ;p

Yeah, you can solve EV problems two ways usually: by definition like you did, or by linearity.

How to solve the last part of Question 9 Volleyball ? I feel like it is somehow related to Bayes' theorem, but i am unable to calculate the expected wins of the winner.

In 9th question on volleyball, expected number of total matches won by winner = 3 + expected number of matches won by winner in group stage. What is expected number of matches won by winner in group stage? Is it 11/8

https://codeforces.com/problemset/problem/601/B — Also a problem of Contribution technique (after being decomposed into simpler form). Try this out guys.

guys can anyone explain how to solve this question: link:https://www.codechef.com/AMR18ROL/problems/MARTING1

Can someone please explain the solution of

Eating Ends? I can't understand the no. of ways for an element to remain till last.For the 9th question of EV easy problems, is the answer to the problem EV of the number of matches won by the winner of the whole tournament 29/6 ?

There is a video with timestamps to each problem.

Hi. Just like EV easy problems 6-8, how to solve something

flipped(updated) like Calculate Expected Value of "Number of consecutive triples of Heads" in $$$N$$$ coin tosses.My approach, Let $$$R_n$$$ be required result for $$$n$$$ coin tosses. Then, we have that Case 1: if $$$nth$$$ toss is Tails, then we need to add $$$R_{n-1}$$$ to answer.

Case 2: if $$$nth$$$ toss is Heads, then we need to consider two cases again, if $$$n-1th$$$ toss is Tails, we add $$$\frac{ R_{n-2} }{2}$$$ to answer, if $$$n-1th$$$ toss is Heads then again if $$$n-2th$$$ toss is Tails then we add $$$\frac{R_{n-3}}{4}$$$, else we add $$$\frac{1 + R_{n-3}}{4}$$$.

So, $$$R_{n}$$$ = $$$R_{n-1} + \frac{R_{n-2}+R_{n-3}}{2}+ \frac{1}{4}$$$. It doesn't seem right to me. Also, since $$$R_0=R_1=R_2=0$$$, we get $$$R_3=\frac{1}{4}$$$, and $$$R_4=\frac{1}{2}$$$. So, basic examples do seem

~~correct~~incorrect. Can someone verify this and/or give a different, perhaps easier approach. I like to ask for verification, because when I do so many cases, I always feel like I have done something wrong. Thanks a lot in advance.UPD: Obviously incorrect, because $$$R_3=\frac{1}{8}$$$.

UPD2 : I think I found the error, I missed the multiplier of $$$\frac{1}{2}$$$ for $$$nth$$$ toss. So that makes it $$$R_n = \frac{R_{n-1}}{2} + \frac{R_{n-2} + R_{n-3}}{4} + \frac{1}{8}$$$. Now all I need for satisfaction, is to solve this recurrence with $$$R_0=R_1=R_2=0$$$ and get the formula provided by Errichto. ( Substitution of formula in place of $$$R_n$$$, for sufficiently large $$$n$$$ (such that we don't have to worry about max function )

does notsatisfy the recurrence. ) Can someone point out what's wrong with this approach, thanks.The EV of the number of consecutive triples of heads in $$$N$$$ coin tosses is $$$max(0, N - 2) \cdot \frac 18$$$ — each of $$$max(0, N - 2)$$$ consecutive triples of tosses has pbility $$$\frac 18$$$ and we use the linearity of expectation. But your formulas look strange.

Problems 6-8 are about the EV of the number of tosses till something happens, not the EV of the number of heads/whatever.

Yes, I know problems 6-8 are entirely different. Infact I wanted to flip it around. But how did you use linearity again?

Did you break it linearly on the starting positions of the triple? That explains $$$max(0,N-2)$$$. Cool. Thanks a lot.

Yes, you can think about it as the number of ways to choose the starting position of three consecutive tosses. But my definition was also enough — the number of ways to choose three consecutive tosses.

I think I figured out where it falls short, even after considering possibilities for last 3 tosses, we need to consider fourth last toss, so as to not forget counting triples formed from fourth last toss and some of last three except last as ending of the triples.

There is a training, contains 13 problems on probability theory and expected value, 10 of them published on codeforces and can be solved, 3 problems (A,B,C) for beginers can be solved on e-olymp: A. Difficult path, B. A coincidence, C. A full set.

can someone give a problem related to 2.Path in trees

Game on Trees, a nice problem on this topic.

Though I understood the solution better by thinking in way of contribution rather than Linearity of Expectation.

Could someone just go through some/all the bonus parts so that I can have something to check my ideas with?