Please subscribe to the official Codeforces channel in Telegram via the link https://t.me/codeforces_official. ×

HELP!!! A little question about Monte Carlo Tree Search?

Revision en2, by duckladydinh, 2018-05-28 03:49:51

Updated: Thank to mnaeraxr, my old question has found its answer, but a new one's just arrived. I hope you all can continue helping me in understanding this.

Hello everyone,

I am sure that many of you have known this algorithm. As far as I am concerned, it is one of the most popular techniques in AI. I am learning it and I have a lot of questions regarding the intuition behind it, and hence I am writing today, with hope that some of you can share your experience with me.

To summarize a bit, a MCTS algorithm is consisted of 4 phases — Select, Expand, Simulate and Propagate, and is based on a formula called UCB1 to balance between exploitation and exploration.

[NEW QUESTION] My new question is: does MCTS have memory, by which I mean if it does learn to form its experience, save that experience somewhere and retrieve it for query? I asked this because even after understanding how the tree is formed, I find no information for how to query the tree. The only example with clear code and real serious problem I have found (which confused me and made me doubt MCTS) is after analyzing this (the main body) and this (the tree node). In this case, for every time we want to make an action (in the process()), they "create a completely new tree" and only take action after that by considering actions from "only the root" node ("all" other codes in that website use the same idea). In other words, each action is somewhat "independent" of the previous one. Is this the true nature of MCTS? If not (hopefully), could you please give me some references for how I can efficiently use all MCTS past experience and make a query? Thank you so much.

[ANSWER FOUND] My first concern is about the UCB1 formula, that is UCB1 = (total wins / total visits) + C * sqrt(ln(total visits in parent node) / total visits in current node). My question is what will change if I, instead of using "total number of wins", use something like "total reward" (which may be HP loss, energy increase and so on)? In such case, will the term sqrt(ln(total visits in parent node) / total visits in current node) stay the same?

Thank you for your time and consideration. I am looking forward to hearing from you.

#### History

Revisions

Rev. Lang. By When Δ Comment