Monte Carlo Tree Search

Hi các bạn, ở bài viết này mình sẽ giới thiệu và giải thích về Monte Carlo Tree Search, một kĩ thuật được sử dụng trong Alpha Go, Alpha Zero.

Monte Carlo Tree Search (MCTS) dùng để dự đoán được lượt di chuyển tốt nhất dựa trên simulation test results.

Ý tưởng chỉnh của MCTS là tìm kiếm (search) giống như các thuật toán khác như Minimax, Alpha-beta Prunning.

Node being not-fully expanded means at least one of its children is unvisited, not explored. Once not fully expanded node is encountered, one of its unvisited children is chosen to become a root node for a single playout/simulation. The result of the simulation is then propagated back up to the current tree root updating game tree nodes statistics. Once the search (constrained by time or computational power) terminates, the move is chosen based on the gathered statistics.

Simulation/Playout

Game tree node expansion – fully expanded and visited nodes

Backpropagation – propagating back the simulation result

Nodes’ statistics

Game tree traversal

Upper Confidence Bound applied to trees

Terminating Monte Carlo Tree Search

Summary

pseudo-code: def monte_carlo_tree_search(root): while resources_left(time, computational power): leaf = traverse(root) # leaf = unvisited node simulation_result = rollout(leaf) backpropagate(leaf, simulation_result) return best_child(root)

def traverse(node): while fully_expanded(node): node = best_uct(node) return pick_univisted(node.children) or node # in case no children are present / node is terminal

def rollout(node): while non_terminal(node): node = rollout_policy(node) return result(node)

def rollout_policy(node): return pick_random(node.children)

def backpropagate(node, result): if is_root(node) return node.stats = update_stats(node, result) backpropagate(node.parent)

def best_child(node): pick child with highest number of visits