MCTS智能体在井字游戏中做出错误决策

我已经花了几天时间开发一个基于MCTS的AI。我尝试将其应用于井字游戏,这是我能想到的最简单的游戏,但不知为何,我的AI总是做出错误的决策。我尝试更改UCB1的探索常数、每次搜索的迭代次数,甚至是赢、输和平局的得分(试图让平局更有吸引力,因为这个AI只在第二位出场,目标是尽量打成平局,否则就赢)。目前,代码如下所示:

import randomimport mathimport copyclass tree:    def __init__(self, board):        self.board = board        self.visits = 0        self.score = 0        self.children = []class mcts:    def search(self, mx, player,):        root = tree(mx)        for i in range(1200):            leaf = mcts.expand(self, root.board, player, root)            result = mcts.rollout(self, leaf)            mcts.backpropagate(self, leaf, root, result)        return mcts.best_child(self, root).board    def expand(self, mx, player, root):        plays = mcts.generate_states(self, mx, player) #all possible plays        if root.visits == 0:            for j in plays:                root.children.append(j) #create child_nodes in case they havent been created yet        for j in root.children:            if j.visits == 0:                return j #first iterations of the loop        for j in plays:            if mcts.final(self, j.board, player):                return j        return mcts.best_child(self, root) #choose the one with most potential    def rollout(self, leaf):        mx = leaf.board        aux = 1        while mcts.final(self, mx, "O") != True:            if aux == 1: # "X" playing                possible_states = []                possible_nodes = mcts.generate_states(self, mx, "X")                for i in possible_nodes:                    possible_states.append(i.board)                if len(possible_states) == 1: mx =  possible_states[0]                else:                    choice = random.randrange(0, len(possible_states) - 1)                    mx = possible_states[choice]                if mcts.final(self, mx, "X"): #The play by "X" finished the game                    break            elif aux == 0: # "O" playing                possible_states = []                possible_nodes = mcts.generate_states(self, mx, "O")                for i in possible_nodes:                    possible_states.append(i.board)                if len(possible_states) == 1: mx =  possible_states[0]                else:                    choice = random.randrange(0, len(possible_states) - 1)                    mx = possible_states[choice]            aux += 1            aux = aux%2        if mcts.final(self, mx, "X"):            for i in range(len(mx)):                for k in range(len(mx[i])):                    if mx[i][k] == "-":                        return -1 #loss            return 0 #tie        elif mcts.final(self, mx, "O"):            for i in range(len(mx)):                for k in range(len(mx[i])):                    if mx[i][k] == "-":                        return 1 #win    def backpropagate(self, leaf, root, result): # updating our prospects stats        leaf.score += result        leaf.visits += 1        root.visits += 1    def generate_states(self, mx, player):        possible_states = [] #generate child_nodes        for i in range(len(mx)):            for k in range(len(mx[i])):                if mx[i][k] == "-":                    option = copy.deepcopy(mx)                    option[i][k] = player                    child_node = tree(option)                    possible_states.append(child_node)        return possible_states    def final(self,mx, player): #check if game is won        possible_draw = True        win = False        for i in mx: #lines            if i == [player, player, player]:                win = True                possible_draw = False        if mx[0][0] == player: #diagonals            if mx[1][1] == player:                if mx[2][2] == player:                    win = True                    possible_draw = False        if mx[0][2] == player:            if mx[1][1] == player:                if mx[2][0] == player:                    win = True                    possible_draw = False        for i in range(3): #columns            if mx[0][i] == player and mx[1][i] == player and mx[2][i] == player:                win = True                possible_draw = False        for i in range(3):            for k in range(3):                if mx[i][k] == "-":                    possible_draw = False        if possible_draw:            return possible_draw        return win    def calculate_score(self, score, child_visits, parent_visits, c): #UCB1        return score / child_visits + c * math.sqrt(math.log(parent_visits) / child_visits)    def best_child(self, root): #returns most promising node        treshold = -1*10**6        for j in root.children:            potential = mcts.calculate_score(self, j.score, j.visits, root.visits, 2)            if potential > treshold:                win_choice = j                treshold = potential        return win_choice#todo the AI takes too long for each play, optimize that by finding the optimal approach in the rollout phase

首先,这个AI的目的是返回一个经过修改的矩阵,代表在当前情况下它能做出的最佳决策。我开始怀疑MCTS算法是否是这些错误决策背后的原因,因为可能在实现中存在一些错误。根据我的理解,代码执行了以下步骤:

  1. 检查根节点是否已经有子节点,如果有,则选择最有前景的子节点。
  2. 进行一次随机模拟并保存结果。
  3. 更新叶节点的得分、访问次数以及根节点的访问次数。
  4. 重复进行1200次迭代,在我的例子中是这样设置的。
  5. 返回可能的最佳移动(矩阵,子节点)。

为什么它不起作用?为什么它选择了错误的移动而不是最优的?算法是否错误实现了?


回答:

我的错误在于在扩展阶段选择了访问次数最多的节点,而应该根据UCB1公式选择最有潜力的节点。我在实现一些if语句时也犯了一些错误,导致所有输局都没有被正确计算。

Related Posts

L1-L2正则化的不同系数

我想对网络的权重同时应用L1和L2正则化。然而,我找不…

使用scikit-learn的无监督方法将列表分类成不同组别,有没有办法?

我有一系列实例,每个实例都有一份列表,代表它所遵循的不…

f1_score metric in lightgbm

我想使用自定义指标f1_score来训练一个lgb模型…

通过相关系数矩阵进行特征选择

我在测试不同的算法时,如逻辑回归、高斯朴素贝叶斯、随机…

可以将机器学习库用于流式输入和输出吗?

已关闭。此问题需要更加聚焦。目前不接受回答。 想要改进…

在TensorFlow中,queue.dequeue_up_to()方法的用途是什么?

我对这个方法感到非常困惑,特别是当我发现这个令人费解的…

发表回复

您的邮箱地址不会被公开。 必填项已用 * 标注