DP with probabilities and cycles: theory behind solution and similar problems?

Revision en2, by pabloskimg, 2020-11-10 20:45:09

I was trying to solve this problem, which requires computing probabilities of winning/losing in a turn-based two-player game. It's not hard to come up with a recurrent solution that pretty much resembles a DP, except for the fact that the states can repeat down the recurrence, producing cycles. Not knowing how to solve it, I resorted to this comment which describes the expected solution. In short, the idea is to have two arrays lo[x][y][z] and hi[x][y][z] that will maintain the lower and upper bounds respectively of the actual probability dp[x][y][z], and basically by initializing lo[x][y][z] to 0, hi[x][y][z] to 1.0 and then updating the arrays in function of each other in a smart way upon multiple iterations, the values "magically" converge to the actual probabilities with enough accuracy and fast enough given the time constraint. I just confirmed this approach works by getting accepted in the problem. However, I still don't understand why it works. So these are my questions:

1. Why does this algorithm work? Is there a broader theory that explains it or is this just an adhoc solution?
2. Why does this converge fast enough?
3. Are there similar problems out there to practice this approach?

Thank you very much.

#### History

Revisions

Rev. Lang. By When Δ Comment
en2 pabloskimg 2020-11-10 20:45:09 9 Tiny change: 'ach works actually by gettin' -> 'ach works by gettin'
en1 pabloskimg 2020-11-10 20:43:21 1447 Initial revision (published)