HELP!!! A little question about Monte Carlo Tree Search?

#	User	Rating
1	ecnerwala	3649
2	Benq	3581
3	jiangly	3578
4	orzdevinwang	3570
5	Geothermal	3569
5	cnnfls_csy	3569
7	tourist	3565
8	maroonrk	3531
9	Radewoosh	3521
10	Um_nik	3482

#	User	Contrib.
1	maomao90	174
2	awoo	164
3	adamant	161
4	TheScrasse	159
5	nor	158
6	maroonrk	156
7	-is-this-fft-	152
8	SecondThread	146
8	orz	146
10	pajenegod	145

Updated: Thank to mnaeraxr, my old question has found its answer, but a new one's just arrived. I hope you all can continue helping me in understanding this.

Hello everyone,

I am sure that many of you have known this algorithm. As far as I am concerned, it is one of the most popular techniques in AI. I am learning it and I have a lot of questions regarding the intuition behind it, and hence I am writing today, with hope that some of you can share your experience with me.

To summarize a bit, a MCTS algorithm is consisted of 4 phases — Select, Expand, Simulate and Propagate, and is based on a formula called UCB1 to balance between exploitation and exploration.

[NEW QUESTION] My new question is: does MCTS have memory, by which I mean if it does learn to form its experience, save that experience somewhere and retrieve it for query? I asked this because even after understanding how the tree is formed, I find no information for how to query the tree. The only example with clear code and real serious problem I have found (which confused me and made me doubt MCTS) is after analyzing this (the main body) and this (the tree node). In this case, for every time we want to make an action (in the process()), they "create a completely new tree" and only take action after that by considering actions from "only the root" node ("all" other codes in that website use the same idea). In other words, each action is somewhat "independent" of the previous one. Is this the true nature of MCTS? If not (hopefully), could you please give me some references for how I can efficiently use all MCTS past experience and make a query? Thank you so much.

[ANSWER FOUND] My first concern is about the UCB1 formula, that is UCB1 = (total wins / total visits) + C * sqrt(ln(total visits in parent node) / total visits in current node). My question is what will change if I, instead of using "total number of wins", use something like "total reward" (which may be HP loss, energy increase and so on)? In such case, will the term sqrt(ln(total visits in parent node) / total visits in current node) stay the same?

Thank you for your time and consideration. I am looking forward to hearing from you.

Comments (11)

Show archived | Write comment?

duckladydinh

6 years ago, # |

+20

Heyy!! Come on, Codeforcers. I know that this may not be the right place to ask about AI, but Codeforces is full of great programmers who know far beyond the scope of CP :( , can someone please help me? :(

→ Reply

xiaoxiaobaozi

Specific analysis, UCB refers to The Upper Confidence BoundAlgorithm, and the confidence in your formula is measured in (total wins / total visits), because your actual problem is a chess problem, the final result of the chess problem is always judged by win or lose, it doesn't care how many chess you lose. when the actual problem changes, you should combine each parameter that affects the result, then recalculate the formula like multi-armed gambling machines problem.

mnaeraxr

+18

This UCB formula you state before is widely used, but MCTS is not restricted to it. With this formula you are seeking to balance exploitation and exploration while choosing at each step which node to expand.

There is no problem if you change total_wins with some reward objective (actually this is extensively done in the reinforcement learning field to train agents in different environments).

Notice that C factor is sensitive to your problem, it will keep the magnitudes of both terms (exploration and exploitation) close to each other.

You can always change the formula completely, just keeping in mind your goal. When you have little information explore "uniform" and while you are getting more insights explore those transitions that give you more "reward".

One thing you can do is augment this formula using some "intuition". This is mostly done using heuristics,2018-05-24 deep learning. Check how this people modified UCB formula and improve MCTS performance.

6 years ago, # ^ |

+15

Thank you, there is a lot of useful information in your answer.

Hi, mnaeraxr, thank you for your informative answer, but if possible, can you also help me one more time? Do you know how to query the MCTS tree efficiently so that we do not need to reconstruct the tree every time we want to query? Thank you very much.

Klein

Exactly what is the problem? When a move is chosen, you move down from the root to the corresponding child node. If you then want to keep the tree, you just make that child node your new root. It doesn't necessarily mean that the results of the algorithm will improve though (you would need to run some tests to find out), as MCTS-trees can tend to be biased in some direction (can obviously be worked around by e.g. tuning the C parameter).

The problem is I just want to confirm if disposing of the old tree is the standard way, by that I mean it will not affect the performance of the algorithm. As you know, the time allowed between 2 consecutive actions is very little, so if we just dispose the old information, we have very little time to train, in my example above, it is 1/60 of a second, hence I fail to see how it can do well given so little time. And unfortunately, Google does not seem to help much for my question.

So both approaches seem to be employed: https://stackoverflow.com/questions/47389700/why-does-monte-carlo-tree-search-reset-tree

It all boils down to what works best for your domain and constraints. If you only have 1/60 sec between moves, then you probably want to store the tree yes. But with such a low time, you probably want to have a quite exploitative search (i.e. low C) in which case the tree will be biased, which means you might not wanna store it. ^^

But with a mere 1/60 sec, you're probably better off using some more efficient heuristics (running MCTS simulations is quite expensive). Or rather, cleverly incorporating heuristics in the MCTS would be more important than the MCTS itself (whether you store the tree or not between moves).

My (not so exciting) research was focused on MCTS in single player games, in which case there's no element of "forcing a move" and you're free to keep the whole tree from the root during the whole allotted time.

Thanks for your reply. Actually, in the page where I got the example code, it seems that the MCTS of the participants are not very good. They win mostly because of using heuristics and that's why I am very doubtful about this. Anyway, thanks. Do you maybe have any references about this? I mean about the difference between storing and not storing.

Keeping the information from one node to another is up to you. You can drop the part of the tree which became useless and start working from the new root/state.

Notice that the information you will keep is much smaller than the information you are going to generate in the new step, empirically its like 1 / BRANCHING_FACTOR. So if you have big BRANCHING_FACTOR the efforts to keep the precalc subtree become useless.

Regarding quality of the new tree keeping the old tree information, it will never affect you, so you can use it.

I implemented MCTS in a rather simple game you can check it here (without storing the tree).

Auto comment: topic has been updated by duckladydinh (previous revision, new revision, compare).

duckladydinh's blog