Google Code Jam 2015 Round 1C Problem B

#	User	Rating
1	ecnerwala	3648
2	Benq	3580
3	orzdevinwang	3570
4	cnnfls_csy	3569
5	Geothermal	3568
6	tourist	3565
7	maroonrk	3530
8	Radewoosh	3520
9	Um_nik	3481
10	jiangly	3467

#	User	Contrib.
1	maomao90	174
2	adamant	164
2	awoo	164
4	TheScrasse	160
5	nor	159
6	maroonrk	156
7	-is-this-fft-	150
7	SecondThread	150
9	orz	146
10	pajenegod	145

After reading the analysis of this problem, I once again think that my concept in probability / excepted value is so bad that I should kill myself.

Here is the link of the problem: Typewriter Monkey

Anyway my problem is about the part of calculating the Expected Number of Bananas (# Occurrence of pattern in the string)

Originally, my thought is the most basic one about Expected Value (which I think it must be correct?):

E(X) = P(Exact 1 banana)*1 + P(Exact 2 bananas)*2 + ...

Then I stop right here and think that, it is too hard for me to find the probability...I cannot count it directly as it will double count a lot, I know no good way to use Inclusion-Exclusion Principle here either...I have no idea.

Then I read the solution and it said:

By linearity of expectation, the expected number of copies is then just P multiplied by the number of places the string can occur, which is S-L+1.

where P is the probability of occurrence of the pattern at a fixed point

Now P is easy to calculate as it does not need to consider any repeating / overlapped cases.

However I cannot understand that why simply P*(S-L+1) would be the answer?

Precisely, how can I deduce that E(X) = P(Exact 1 banana)*1 + P(Exact 2 bananas)*2 + ... = P*(S-L+1)

PS: If anyone can provide any source / tutorial for probability (Programming Contest Oriented is better!), I would be very appreciate that I will send a million "Thank You" messages, separately, to your inbox :)

Comments (5)

Write comment?

Swistakk

9 years ago, # |

Linearity of expectation is a keyphrase here! E(X + Y) = EX + EY ! If you want to count expected sum of results of rolling two dices you can sum expected result of rolling first dice and expected result of rolling second dice, it's that simple! You don't have to count probabilities of obtaining a specific sum, that will be much more involved and unnecessary.

→ Reply

kirakira

9 years ago, # ^ |

I knew this property very briefly, still, I cannot figure out why

E(X) = P*(S-L+1)

Does that mean somehow

E(X) = E(Z1 + Z2 + Z3 + ...) = E(Z1)+E(Z2)+... = P*(S-L+1)

Where Zn is the random variable for Pattern starts at possible position 1, 2 ... n

There are (S-L+1) of them with each of them having probability P??

As I have no / weak concept of random variable, I could not tell how to come up with that conclusion E(X) = P*(S-L+1) step by step... :(

That is exactly the correct argument. You don't have to have a good grasp of random variable concept here. Very often if we want to count sth people react "OK, I will try this counting" and when we want to compute exactly the same thing, but in the end divide by number of possible events which will result in a probability people will yell "Oh no! That cursed probability, I will never learn that!"

Think about it in terms of "Consider all keyboard_sizeⁿ possible sequences of keystrokes and count how many times we will have occurence of our pattern with a fixed starting position". That is exactly what you need to solve this question if probability is not that intuitive for you while problem formulated as above can be given to primary school children.

ahmad_mamdouh

Check TopCoder tutorial for probabilities and CodeChef tutorial about expectations.

Thanks so much!

After reading the CodeChef tutorial, I have following understanding (which I tell myself it is correct)

Let Z be the total # of occurrence, k_1, K_2....k_n be the random variable that be the number of pattern appears at position n

k_i is 1 with probability P, 0 otherwise

Also Z = k_1 + k_2 + ..... + k_n

So E(Z) = E(k_1 + k_2 + ... k_n)

By linearity, E(Z) = E(k_1) + ... + E(k_n) = P * (S-L+1)

And the power of linearity is shown here, as actually k_i are not independent events, but linearity still works

(By wikipedia: Note that the second result is valid even if X is not statistically independent of Y.)

I do not know is this correct way of thinking, but at least I think it gives me a correct direction, thanks a lot :)

kirakira's blog