I've spent the last few days working on an MCTS-based AI. I tried applying it to tic-tac-toe, the simplest game I could think of, but for some reason my AI keeps making bad decisions. I've tried changing the UCB1 exploration constant, the number of iterations per search, and even the scores assigned to wins, losses and draws (trying to make draws more attractive, since this AI only ever plays second and its goal is to draw at worst, otherwise win). The code currently looks like this:
```python
import random
import math
import copy


class tree:
    def __init__(self, board):
        self.board = board
        self.visits = 0
        self.score = 0
        self.children = []


class mcts:
    def search(self, mx, player):
        root = tree(mx)
        for i in range(1200):
            leaf = mcts.expand(self, root.board, player, root)
            result = mcts.rollout(self, leaf)
            mcts.backpropagate(self, leaf, root, result)
        return mcts.best_child(self, root).board

    def expand(self, mx, player, root):
        plays = mcts.generate_states(self, mx, player)  # all possible plays
        if root.visits == 0:
            for j in plays:
                root.children.append(j)  # create child_nodes in case they havent been created yet
        for j in root.children:
            if j.visits == 0:
                return j  # first iterations of the loop
        for j in plays:
            if mcts.final(self, j.board, player):
                return j
        return mcts.best_child(self, root)  # choose the one with most potential

    def rollout(self, leaf):
        mx = leaf.board
        aux = 1
        while mcts.final(self, mx, "O") != True:
            if aux == 1:  # "X" playing
                possible_states = []
                possible_nodes = mcts.generate_states(self, mx, "X")
                for i in possible_nodes:
                    possible_states.append(i.board)
                if len(possible_states) == 1:
                    mx = possible_states[0]
                else:
                    choice = random.randrange(0, len(possible_states) - 1)
                    mx = possible_states[choice]
                if mcts.final(self, mx, "X"):  # The play by "X" finished the game
                    break
            elif aux == 0:  # "O" playing
                possible_states = []
                possible_nodes = mcts.generate_states(self, mx, "O")
                for i in possible_nodes:
                    possible_states.append(i.board)
                if len(possible_states) == 1:
                    mx = possible_states[0]
                else:
                    choice = random.randrange(0, len(possible_states) - 1)
                    mx = possible_states[choice]
            aux += 1
            aux = aux % 2
        if mcts.final(self, mx, "X"):
            for i in range(len(mx)):
                for k in range(len(mx[i])):
                    if mx[i][k] == "-":
                        return -1  # loss
            return 0  # tie
        elif mcts.final(self, mx, "O"):
            for i in range(len(mx)):
                for k in range(len(mx[i])):
                    if mx[i][k] == "-":
                        return 1  # win

    def backpropagate(self, leaf, root, result):  # updating our prospects stats
        leaf.score += result
        leaf.visits += 1
        root.visits += 1

    def generate_states(self, mx, player):
        possible_states = []  # generate child_nodes
        for i in range(len(mx)):
            for k in range(len(mx[i])):
                if mx[i][k] == "-":
                    option = copy.deepcopy(mx)
                    option[i][k] = player
                    child_node = tree(option)
                    possible_states.append(child_node)
        return possible_states

    def final(self, mx, player):  # check if game is won
        possible_draw = True
        win = False
        for i in mx:  # lines
            if i == [player, player, player]:
                win = True
                possible_draw = False
        if mx[0][0] == player:  # diagonals
            if mx[1][1] == player:
                if mx[2][2] == player:
                    win = True
                    possible_draw = False
        if mx[0][2] == player:
            if mx[1][1] == player:
                if mx[2][0] == player:
                    win = True
                    possible_draw = False
        for i in range(3):  # columns
            if mx[0][i] == player and mx[1][i] == player and mx[2][i] == player:
                win = True
                possible_draw = False
        for i in range(3):
            for k in range(3):
                if mx[i][k] == "-":
                    possible_draw = False
        if possible_draw:
            return possible_draw
        return win

    def calculate_score(self, score, child_visits, parent_visits, c):  # UCB1
        return score / child_visits + c * math.sqrt(math.log(parent_visits) / child_visits)

    def best_child(self, root):  # returns most promising node
        treshold = -1 * 10**6
        for j in root.children:
            potential = mcts.calculate_score(self, j.score, j.visits, root.visits, 2)
            if potential > treshold:
                win_choice = j
                treshold = potential
        return win_choice

# todo: the AI takes too long for each play, optimize that by finding the optimal approach in the rollout phase
```
First of all, the purpose of this AI is to return a modified matrix representing the best decision it can make in the current situation. I'm starting to suspect that the MCTS algorithm itself is behind these bad decisions, because there may be a mistake somewhere in my implementation. As far as I understand it, the code performs the following steps (a compact sketch of the same structure follows the list):
- Check whether the root node already has children; if it does, pick the most promising child.
- Run a random simulation and save the result.
- Update the leaf's score and visit count, as well as the root's visit count.
- Repeat, in my case for 1200 iterations.
- Return the best possible move (a matrix, i.e. a child node).
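To make the question concrete, here is the structure I have in mind for those steps, written as a self-contained sketch. The names `Node`, `winner`, `moves` and `random_playout` are just illustrative stand-ins for my `tree` class, `final` and `generate_states`, and, like my code, the tree is only one level deep, so the children are just the AI's own immediate moves:

```python
import math
import random
import copy

class Node:
    def __init__(self, board):
        self.board = board   # 3x3 list of "X"/"O"/"-" strings, same as above
        self.visits = 0
        self.score = 0

def winner(board):
    # Return "X" or "O" if someone has three in a row, otherwise None
    lines = board + [list(c) for c in zip(*board)]
    lines += [[board[i][i] for i in range(3)], [board[i][2 - i] for i in range(3)]]
    for line in lines:
        if line[0] != "-" and line.count(line[0]) == 3:
            return line[0]
    return None

def moves(board, player):
    # All boards reachable by placing one mark of `player`
    out = []
    for i in range(3):
        for k in range(3):
            if board[i][k] == "-":
                nxt = copy.deepcopy(board)
                nxt[i][k] = player
                out.append(nxt)
    return out

def random_playout(board, to_move):
    # Play uniformly random moves until someone wins or the board is full
    while winner(board) is None:
        options = moves(board, to_move)
        if not options:
            return 0                              # board full, no winner: draw
        board = random.choice(options)
        to_move = "X" if to_move == "O" else "O"
    return 1 if winner(board) == "O" else -1      # +1: the AI ("O") won, -1: it lost

def search(board, iterations=1200, c=1.41):
    root = Node(board)
    children = [Node(b) for b in moves(board, "O")]   # tree is one level deep, as in my code
    for _ in range(iterations):
        # Selection: try every child once, then take the one with the best UCB1 value
        untried = [ch for ch in children if ch.visits == 0]
        if untried:
            child = untried[0]
        else:
            child = max(children, key=lambda ch: ch.score / ch.visits
                        + c * math.sqrt(math.log(root.visits) / ch.visits))
        # Simulation: random playout, with "X" to move after the AI's move
        result = random_playout(child.board, "X")
        # Backpropagation: update the chosen child and the root
        child.visits += 1
        child.score += result
        root.visits += 1
    # The move actually played: the child with the best average score
    return max(children, key=lambda ch: ch.score / ch.visits).board
```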
Why doesn't it work? Why does it pick bad moves instead of the optimal ones? Is the algorithm implemented incorrectly?
Answer:
My mistake was that during the expansion phase I was selecting the node with the most visits, when I should have been selecting the most promising node according to the UCB1 formula. I also made some mistakes in a few of the if statements, which meant that losses were never being counted correctly.
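In terms of the code above, the selection part of `expand` ends up looking roughly like this (a sketch of the idea, meant to replace the method in the `mcts` class, not the exact final code):

```python
def expand(self, mx, player, root):
    # Create the root's children once
    if not root.children:
        root.children = mcts.generate_states(self, mx, player)
    # Simulate every child at least once before trusting its statistics
    for child in root.children:
        if child.visits == 0:
            return child
    # Then select by UCB1 (calculate_score), not by raw visit count
    return max(root.children,
               key=lambda ch: mcts.calculate_score(self, ch.score, ch.visits, root.visits, 2))
```

Separating the "who won" check from the "is the board full" check in the result evaluation also makes it much harder for a loss to accidentally be scored as a tie.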