anusatya.choudhary's blog

By anusatya.choudhary, history, 3 years ago, In English

Find the expected value of the number of segments in a string of length A in a language having alphabet size B.

A segment is defined as the maximum contiguous substring containing the same character. Eg. In string 10011. The segments are 1, 00 and 11. The number of segments will be 3.

Input format: The first argument is A and second argument is B.

Output format: Return the expected value of the number of segments. This can be represented in the form of x/y. Return x.y^(-1)(mid 10^9 + 7).

Constraints : 1<=A,B<=10^9

Example : A=1,B=2. Output is 1. A=2,B=2. Output is 500000005.

| Write comment?
»
3 years ago, # |
  Vote: I like it +14 Vote: I do not like it

let $$$p_i$$$ be probability of answer being i after some stage.

After some number of stages answer is $$$\sum_{p_i \neq 0}(i * p_i)$$$

Then we add character to the end of the string, there are to types — it's the same as last (probability of that = 1/B), it starts new segment (probability = $$$(B - 1) / B$$$)

So now answer becomes : $$$\sum_{p_i \neq 0}(i * p_i / B + (i + 1) * p_i * (B - 1) / B) $$$

= $$$\sum_{p_i \neq 0}(i * p_i + (i * p_i + p_i) * (B - 1)) / B$$$

= $$$\sum_{p_i \neq 0}(B * i * p_i + B * p_i - p_i) / B$$$

= $$$\sum_{p_i \neq 0}(i * p_i) + \sum_{p_i \neq 0}(p_i) * (B - 1) / B$$$

= "previous answer" $$$+ (B - 1) / B$$$

answer for string length 1 is 1

so final answer is get by adding A — 1 characters, so answer := $$$1 + (A - 1)* (B - 1) / B$$$

  • »
    »
    3 years ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    Thank you!

    • »
      »
      »
      3 years ago, # ^ |
        Vote: I like it 0 Vote: I do not like it

      You can actually avoid most of the math that you did using the linearity of expectation. Using your observation at the beginning (that a new segment starts with probability $$$(B-1)/B$$$), you can notice that your answer increases with probability $$$(B-1)/B$$$, because the number of segments is the number of times a segment starts. This basically means you can skip all of the math until the line "previous answer"+$$$(B-1)/B$$$, reaching the same answer in a much more intuitive way.