MCTS智能体在井字游戏中做出错误决策

我已经花了几天时间开发一个基于MCTS的AI。我尝试将其应用于井字游戏,这是我能想到的最简单的游戏,但不知为何,我的AI总是做出错误的决策。我尝试更改UCB1的探索常数、每次搜索的迭代次数,甚至是赢、输和平局的得分(试图让平局更有吸引力,因为这个AI只在第二位出场,目标是尽量打成平局,否则就赢)。目前,代码如下所示:

import randomimport mathimport copyclass tree:    def __init__(self, board):        self.board = board        self.visits = 0        self.score = 0        self.children = []class mcts:    def search(self, mx, player,):        root = tree(mx)        for i in range(1200):            leaf = mcts.expand(self, root.board, player, root)            result = mcts.rollout(self, leaf)            mcts.backpropagate(self, leaf, root, result)        return mcts.best_child(self, root).board    def expand(self, mx, player, root):        plays = mcts.generate_states(self, mx, player) #all possible plays        if root.visits == 0:            for j in plays:                root.children.append(j) #create child_nodes in case they havent been created yet        for j in root.children:            if j.visits == 0:                return j #first iterations of the loop        for j in plays:            if mcts.final(self, j.board, player):                return j        return mcts.best_child(self, root) #choose the one with most potential    def rollout(self, leaf):        mx = leaf.board        aux = 1        while mcts.final(self, mx, "O") != True:            if aux == 1: # "X" playing                possible_states = []                possible_nodes = mcts.generate_states(self, mx, "X")                for i in possible_nodes:                    possible_states.append(i.board)                if len(possible_states) == 1: mx =  possible_states[0]                else:                    choice = random.randrange(0, len(possible_states) - 1)                    mx = possible_states[choice]                if mcts.final(self, mx, "X"): #The play by "X" finished the game                    break            elif aux == 0: # "O" playing                possible_states = []                possible_nodes = mcts.generate_states(self, mx, "O")                for i in possible_nodes:                    possible_states.append(i.board)                if len(possible_states) == 1: mx =  possible_states[0]                else:                    choice = random.randrange(0, len(possible_states) - 1)                    mx = possible_states[choice]            aux += 1            aux = aux%2        if mcts.final(self, mx, "X"):            for i in range(len(mx)):                for k in range(len(mx[i])):                    if mx[i][k] == "-":                        return -1 #loss            return 0 #tie        elif mcts.final(self, mx, "O"):            for i in range(len(mx)):                for k in range(len(mx[i])):                    if mx[i][k] == "-":                        return 1 #win    def backpropagate(self, leaf, root, result): # updating our prospects stats        leaf.score += result        leaf.visits += 1        root.visits += 1    def generate_states(self, mx, player):        possible_states = [] #generate child_nodes        for i in range(len(mx)):            for k in range(len(mx[i])):                if mx[i][k] == "-":                    option = copy.deepcopy(mx)                    option[i][k] = player                    child_node = tree(option)                    possible_states.append(child_node)        return possible_states    def final(self,mx, player): #check if game is won        possible_draw = True        win = False        for i in mx: #lines            if i == [player, player, player]:                win = True                possible_draw = False        if mx[0][0] == player: #diagonals            if mx[1][1] == player:                if mx[2][2] == player:                    win = True                    possible_draw = False        if mx[0][2] == player:            if mx[1][1] == player:                if mx[2][0] == player:                    win = True                    possible_draw = False        for i in range(3): #columns            if mx[0][i] == player and mx[1][i] == player and mx[2][i] == player:                win = True                possible_draw = False        for i in range(3):            for k in range(3):                if mx[i][k] == "-":                    possible_draw = False        if possible_draw:            return possible_draw        return win    def calculate_score(self, score, child_visits, parent_visits, c): #UCB1        return score / child_visits + c * math.sqrt(math.log(parent_visits) / child_visits)    def best_child(self, root): #returns most promising node        treshold = -1*10**6        for j in root.children:            potential = mcts.calculate_score(self, j.score, j.visits, root.visits, 2)            if potential > treshold:                win_choice = j                treshold = potential        return win_choice#todo the AI takes too long for each play, optimize that by finding the optimal approach in the rollout phase

首先,这个AI的目的是返回一个经过修改的矩阵,代表在当前情况下它能做出的最佳决策。我开始怀疑MCTS算法是否是这些错误决策背后的原因,因为可能在实现中存在一些错误。根据我的理解,代码执行了以下步骤:

  1. 检查根节点是否已经有子节点,如果有,则选择最有前景的子节点。
  2. 进行一次随机模拟并保存结果。
  3. 更新叶节点的得分、访问次数以及根节点的访问次数。
  4. 重复进行1200次迭代,在我的例子中是这样设置的。
  5. 返回可能的最佳移动(矩阵,子节点)。

为什么它不起作用?为什么它选择了错误的移动而不是最优的?算法是否错误实现了?


回答:

我的错误在于在扩展阶段选择了访问次数最多的节点,而应该根据UCB1公式选择最有潜力的节点。我在实现一些if语句时也犯了一些错误,导致所有输局都没有被正确计算。

Related Posts

使用LSTM在Python中预测未来值

这段代码可以预测指定股票的当前日期之前的值,但不能预测…

如何在gensim的word2vec模型中查找双词组的相似性

我有一个word2vec模型,假设我使用的是googl…

dask_xgboost.predict 可以工作但无法显示 – 数据必须是一维的

我试图使用 XGBoost 创建模型。 看起来我成功地…

ML Tuning – Cross Validation in Spark

我在https://spark.apache.org/…

如何在React JS中使用fetch从REST API获取预测

我正在开发一个应用程序,其中Flask REST AP…

如何分析ML.NET中多类分类预测得分数组?

我在ML.NET中创建了一个多类分类项目。该项目可以对…

发表回复

您的邮箱地址不会被公开。 必填项已用 * 标注