HELP!!! A little question about Monte Carlo Tree Search?

Revision en1, by duckladydinh, 2018-05-23 10:20:30

Hello everyone,

I am sure many of you already know this algorithm. As far as I know, it is one of the most popular techniques in AI. I am learning it and have several questions about the intuition behind it, so I am writing today in the hope that some of you can share your experience with me.

To summarize, an MCTS algorithm consists of 4 phases: Select, Expand, Simulate and Propagate, and it uses a formula called UCB1 to balance exploitation and exploration. My first concern is the UCB1 formula: UCB1 = (total wins / total visits) + C * sqrt(ln(total visits in parent node) / total visits in current node). My question is: what changes if, instead of "total number of wins", I use something like "total reward" (which may be HP loss, energy increase and so on)? In that case, will the term sqrt(ln(total visits in parent node) / total visits in current node) stay the same?
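To make the question concrete, here is a minimal Python sketch of the two variants as I understand them. The function names (`ucb1`, `normalize`) and the default C = sqrt(2) are my own choices for illustration, not taken from any library. The sketch assumes the raw reward is first rescaled to [0, 1], because as far as I know the usual UCB1 analysis assumes bounded rewards; whether that rescaling is the right thing to do is part of what I am asking.

```python
import math


def ucb1(total_value, visits, parent_visits, c=math.sqrt(2)):
    """UCB1 score of a child node.

    total_value: sum of wins (each 0 or 1), or sum of rewards that were
                 already rescaled to [0, 1] (an assumption of this sketch).
    visits: number of times this child was visited.
    parent_visits: number of times the parent was visited (>= 1).
    """
    if visits == 0:
        # Common convention: try every unvisited child once first.
        return math.inf
    exploit = total_value / visits  # average win rate or average reward
    explore = c * math.sqrt(math.log(parent_visits) / visits)
    return exploit + explore


def normalize(reward, lo, hi):
    """Map a raw reward (e.g. an HP difference) from [lo, hi] to [0, 1]."""
    return (reward - lo) / (hi - lo)


# Win/loss version: 5 wins out of 10 visits, parent visited 100 times.
score_wins = ucb1(5, 10, 100)

# Reward version: 10 simulations, each with a raw HP difference of -20,
# where the raw reward is known to lie in [-100, 100].
total_reward = sum(normalize(-20, -100, 100) for _ in range(10))
score_reward = ucb1(total_reward, 10, 100)
```

In this sketch the exploration term is identical in both calls; only the first term changes from an average win rate to an average normalized reward. The win/loss case is then just the special case where every reward is 0 or 1.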

Thank you for your time and consideration. I am looking forward to hearing from you.

Revisions

Rev. | Lang.   | By           | When                | Δ    | Comment
en2  | English | duckladydinh | 2018-05-28 03:49:51 | 1316 |
en1  | English | duckladydinh | 2018-05-23 10:20:30 | 1107 | Initial revision (published)